faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

Illumiprocessor - naming issue #208

Closed MaxTBC closed 3 years ago

MaxTBC commented 4 years ago

This has been discussed previously and I've read those posts, but I am still getting the following error:

2020-11-19 12:36:17,403 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2020-11-19 12:36:17,404 - illumiprocessor - INFO - Version: 2.0.9
2020-11-19 12:36:17,404 - illumiprocessor - INFO - Argument --config: illumiprocessor_HS.conf
2020-11-19 12:36:17,404 - illumiprocessor - INFO - Argument --cores: 12
2020-11-19 12:36:17,404 - illumiprocessor - INFO - Argument --input: /home/tangled/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/raw-fastq
2020-11-19 12:36:17,405 - illumiprocessor - INFO - Argument --log_path: /home/tangled/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/log
2020-11-19 12:36:17,405 - illumiprocessor - INFO - Argument --min_len: 40
2020-11-19 12:36:17,405 - illumiprocessor - INFO - Argument --no_merge: False
2020-11-19 12:36:17,406 - illumiprocessor - INFO - Argument --output: /home/tangled/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/clean-fastq
2020-11-19 12:36:17,406 - illumiprocessor - INFO - Argument --phred: phred33
2020-11-19 12:36:17,406 - illumiprocessor - INFO - Argument --r1_pattern: None
2020-11-19 12:36:17,406 - illumiprocessor - INFO - Argument --r2_pattern: None
2020-11-19 12:36:17,407 - illumiprocessor - INFO - Argument --se: False
2020-11-19 12:36:17,407 - illumiprocessor - INFO - Argument --trimmomatic: /home/tangled/miniconda2/envs/phyluce/bin/trimmomatic
2020-11-19 12:36:17,407 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
  File "/home/tangled/miniconda2/envs/phyluce/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/cli/main.py", line 121, in main
    main(args)
  File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/main.py", line 34, in main
    reads.append(core.SequenceData(args, conf, start_name, end_name))
  File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/core.py", line 86, in __init__
    self._get_read_data()
  File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/core.py", line 104, in _get_read_data
    "errors in your conf file.".format(self.start_name))
IOError: There is a problem with the read names for HS001. Ensure you do not have spelling/capitalization errors in your conf file.

My files are named as follows: HS001_R1_001.fastq.gz, HS001_R2_001.fastq.gz, etc.

The .conf file:

[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT*GTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i5-001_A:GTCAACAG
i5-001_B:GTCAACAG
i7-110_01:GCACAACT
i7-110_02:TTCTCTCG
...

[tag map]
HS001:i5-001_A,i7-110_01
HS002:i5-001_B,i7-110_02
...

[names]
HS001:HS001
HS002:HS002
...

Based on the other posts (#90, #96, and #74), I've tried modifying the "{}(?:.*)(R1|READ1|Read1|read1)_\d+.fastq(?:.gz)*" pattern from #90 as follows:

illumiprocessor \
    --input raw-fastq/ \
    --r1-pattern _R1_001 \
    --r2-pattern _R2_001 \
    --output clean-fastq \
    --config illumiprocessor_HS.conf \
    --cores 12 \
    --log-path log

The pattern pairs I have tried:

--r1-pattern _R1
--r2-pattern _R2

--r1-pattern _R1_001
--r2-pattern _R2_001

--r1-pattern "{}_R1_\d+.fastq(?:.gz)*"
--r2-pattern "{}_R2_\d+.fastq(?:.gz)*"

--r1-pattern "{}R1\d+.fastq(?:.gz)*"
--r2-pattern "{}R1\d+.fastq(?:.gz)*"

I also tried renaming my files to 'HS001_S1_L001_R1_001.fastq.gz' and 'HS001_S1_L001_R2_001.fastq.gz' to mimic a previous analysis that worked under the default settings.

Probably a simple fix, but I can't get it to work.

Thanks in advance, Max

brantfaircloth commented 4 years ago

It's hard to say without knowing what ALL of your files look like, but if your files are named like:

HS001_R1_001.fastq.gz
HS001_R2_001.fastq.gz

You are definitely going to need something like:

illumiprocessor \
    --input raw-fastq/ \
    --output clean-fastq \
    --config illumiprocessor_HS.conf \
    --cores 12 \
    --r1-pattern "{}_R1_\d+.fastq.gz" \
    --r2-pattern "{}_R2_\d+.fastq.gz" \
    --log-path log

MaxTBC commented 4 years ago

Thanks for the quick reply, Brant.

The command you suggested did not work; it resulted in the same error.

Here is the working directory:

clean-fastq
illumiprocessor_HS.conf
log
raw-data

Here are all the file names inside the 'raw-fastq' folder:

HS001_R1_001.fastq.gz  HS024_R1_001.fastq.gz  HS054_R1_001.fastq.gz
HS001_R2_001.fastq.gz  HS024_R2_001.fastq.gz  HS054_R2_001.fastq.gz
HS002_R1_001.fastq.gz  HS025_R1_001.fastq.gz  HS055_R1_001.fastq.gz
HS002_R2_001.fastq.gz  HS025_R2_001.fastq.gz  HS055_R2_001.fastq.gz
HS003_R1_001.fastq.gz  HS026_R1_001.fastq.gz  HS056_R1_001.fastq.gz
HS003_R2_001.fastq.gz  HS026_R2_001.fastq.gz  HS056_R2_001.fastq.gz
HS004_R1_001.fastq.gz  HS027_R1_001.fastq.gz  HS058_R1_001.fastq.gz
HS004_R2_001.fastq.gz  HS027_R2_001.fastq.gz  HS058_R2_001.fastq.gz
HS006_R1_001.fastq.gz  HS028_R1_001.fastq.gz  HS059_R1_001.fastq.gz
HS006_R2_001.fastq.gz  HS028_R2_001.fastq.gz  HS059_R2_001.fastq.gz
HS007_R1_001.fastq.gz  HS029_R1_001.fastq.gz  HS060_R1_001.fastq.gz
HS007_R2_001.fastq.gz  HS029_R2_001.fastq.gz  HS060_R2_001.fastq.gz
HS008_R1_001.fastq.gz  HS030_R1_001.fastq.gz  HS061_R1_001.fastq.gz
HS008_R2_001.fastq.gz  HS030_R2_001.fastq.gz  HS061_R2_001.fastq.gz
HS009_R1_001.fastq.gz  HS031_R1_001.fastq.gz  HS075_R1_001.fastq.gz
HS009_R2_001.fastq.gz  HS031_R2_001.fastq.gz  HS075_R2_001.fastq.gz
HS010_R1_001.fastq.gz  HS032_R1_001.fastq.gz  HS076_R1_001.fastq.gz
HS010_R2_001.fastq.gz  HS032_R2_001.fastq.gz  HS076_R2_001.fastq.gz
HS011_R1_001.fastq.gz  HS033_R1_001.fastq.gz  HS077_R1_001.fastq.gz
HS011_R2_001.fastq.gz  HS033_R2_001.fastq.gz  HS077_R2_001.fastq.gz
HS012_R1_001.fastq.gz  HS034_R1_001.fastq.gz  HS078_R1_001.fastq.gz
HS012_R2_001.fastq.gz  HS034_R2_001.fastq.gz  HS078_R2_001.fastq.gz
HS013_R1_001.fastq.gz  HS035_R1_001.fastq.gz  HS080_R1_001.fastq.gz
HS013_R2_001.fastq.gz  HS035_R2_001.fastq.gz  HS080_R2_001.fastq.gz
HS014_R1_001.fastq.gz  HS036_R1_001.fastq.gz  HS081_R1_001.fastq.gz
HS014_R2_001.fastq.gz  HS036_R2_001.fastq.gz  HS081_R2_001.fastq.gz
HS015_R1_001.fastq.gz  HS047_R1_001.fastq.gz  HS090_R1_001.fastq.gz
HS015_R2_001.fastq.gz  HS047_R2_001.fastq.gz  HS090_R2_001.fastq.gz
HS017_R1_001.fastq.gz  HS048_R1_001.fastq.gz  HS093_R1_001.fastq.gz
HS017_R2_001.fastq.gz  HS048_R2_001.fastq.gz  HS093_R2_001.fastq.gz
HS018_R1_001.fastq.gz  HS049_R1_001.fastq.gz  HS097_R1_001.fastq.gz
HS018_R2_001.fastq.gz  HS049_R2_001.fastq.gz  HS097_R2_001.fastq.gz
HS019_R1_001.fastq.gz  HS050_R1_001.fastq.gz  HS100_R1_001.fastq.gz
HS019_R2_001.fastq.gz  HS050_R2_001.fastq.gz  HS100_R2_001.fastq.gz
HS020_R1_001.fastq.gz  HS051_R1_001.fastq.gz  HS101_R1_001.fastq.gz
HS020_R2_001.fastq.gz  HS051_R2_001.fastq.gz  HS101_R2_001.fastq.gz
HS022_R1_001.fastq.gz  HS052_R1_001.fastq.gz  HS102_R1_001.fastq.gz
HS022_R2_001.fastq.gz  HS052_R2_001.fastq.gz  HS102_R2_001.fastq.gz
HS023_R1_001.fastq.gz  HS053_R1_001.fastq.gz  HS103_R1_001.fastq.gz
HS023_R2_001.fastq.gz  HS053_R2_001.fastq.gz  HS103_R2_001.fastq.gz

And for any other issues, here is the full .conf file:

[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT*GTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i5-001_A:GTCAACAG
i5-001_B:GTCAACAG
i5-001_C:GTCAACAG
i5-001_D:GTCAACAG
i5-001_E:GTCAACAG
i5-001_F:GTCAACAG
i5-001_G:GTCAACAG
i5-001_H:GTCAACAG
i5-002_A:GTCAACAG
i5-002_B:GTCAACAG
i5-002_C:GTCAACAG
i5-002_D:GTCAACAG
i5-002_E:CCAGTATC
i5-002_F:CCAGTATC
i5-002_G:CCAGTATC
i5-002_H:CCAGTATC
i5-003_A:CCAGTATC
i5-003_B:CCAGTATC
i5-003_C:CCAGTATC
i5-003_D:CCAGTATC
i5-003_E:CCAGTATC
i5-003_F:CCAGTATC
i5-003_G:CCAGTATC
i5-003_H:CCAGTATC
i5-004_A:TCAGTAGG
i5-004_B:TCAGTAGG
i5-004_C:TCAGTAGG
i5-004_D:TCAGTAGG
i5-004_E:TCAGTAGG
i5-004_F:TCAGTAGG
i5-004_G:TCAGTAGG
i5-004_H:TCAGTAGG
i5-005_A:TCAGTAGG
i5-005_B:TCAGTAGG
i5-005_C:TCAGTAGG
i5-005_D:TCAGTAGG
i5-005_E:TTGCAACG
i5-005_F:TTGCAACG
i5-005_G:TTGCAACG
i5-005_H:TTGCAACG
i5-006_A:TTGCAACG
i5-006_B:TTGCAACG
i5-006_C:TTGCAACG
i5-006_D:TTGCAACG
i5-006_E:TTGCAACG
i5-006_F:TTGCAACG
i5-006_G:TTGCAACG
i5-006_H:TTGCAACG
i5-007_A:AGTCAGGT
i5-007_B:AGTCAGGT
i5-007_C:AGTCAGGT
i5-007_D:AGTCAGGT
i5-007_E:AGTCAGGT
i5-007_F:AGTCAGGT
i5-007_G:AGTCAGGT
i5-007_H:AGTCAGGT
i5-008_A:AGTCAGGT
i5-008_B:AGTCAGGT
i5-008_C:AGTCAGGT
i5-008_D:AGTCAGGT
i7-110_01:GCACAACT
i7-110_02:TTCTCTCG
i7-110_03:AACGGTCA
i7-110_04:ACAGACCT
i7-110_05:TCTCTTCC
i7-110_06:AGTGTTGG
i7-110_07:TGGCATGT
i7-110_08:AGAAGCGT
i7-110_09:AGCGGAAT
i7-110_10:TAACCGGT
i7-110_11:CATGGAAC
i7-110_12:ATGGTCCA
i7-111_01:CTTCTGAG
i7-111_02:AACCGAAG
i7-111_03:TTCGTACC
i7-111_04:CTGTTAGG
i7-111_05:CACAAGTC
i7-111_06:TCTTGACG
i7-111_07:CGTCTTGT
i7-111_08:CGTGATCA
i7-111_09:CCAAGTTG
i7-111_10:GTACCTTG
i7-111_11:GACTATGC
i7-111_12:TGGATCAC
i7-112_01:CTCTGGTT
i7-112_02:GTTCATGG
i7-112_03:GCTGTAAG
i7-112_04:GTCGAAGA
i7-112_05:GAGCTCAA
i7-112_06:TGAACCTG
i7-112_07:CCGACTAT
i7-112_08:AGCTAACC
i7-112_09:GCCTTGTT
i7-112_10:AACTTGCC
i7-112_11:CAATGTGG
i7-112_12:AAGGCTGA
i7-113_01:TTACCGAG
i7-113_02:GTCCTAAG
i7-113_03:GAAGGTTC
i7-113_04:GAAGAGGT
i7-113_05:TCTGAGAG
i7-113_06:ACCGCATA
i7-113_07:GAAGTACC
i7-113_08:CAGGTATC
i7-113_09:TCTCTAGG
i7-113_10:AAGCACTG
i7-113_11:CCAAGCAA
i7-113_12:TGTTCGAG
i7-114_01:GCACAACT
i7-114_02:TTCTCTCG
i7-114_03:AACGGTCA
i7-114_04:ACAGACCT
i7-114_05:TCTCTTCC
i7-114_06:AGTGTTGG
i7-114_07:TGGCATGT
i7-114_08:AGAAGCGT
i7-114_09:AGCGGAAT
i7-114_10:TAACCGGT
i7-114_11:CATGGAAC
i7-114_12:ATGGTCCA

[tag map]
HS001:i5-001_A,i7-110_01
HS002:i5-001_B,i7-110_02
HS003:i5-001_C,i7-110_03
HS004:i5-001_D,i7-110_04
HS006:i5-001_E,i7-110_05
HS007:i5-001_F,i7-110_06
HS008:i5-001_G,i7-110_07
HS009:i5-001_H,i7-110_08
HS010:i5-002_A,i7-110_09
HS011:i5-002_B,i7-110_10
HS012:i5-002_C,i7-110_11
HS013:i5-002_D,i7-110_12
HS014:i5-002_E,i7-111_01
HS015:i5-002_F,i7-111_02
HS017:i5-002_G,i7-111_03
HS018:i5-002_H,i7-111_04
HS019:i5-003_A,i7-111_05
HS020:i5-003_B,i7-111_06
HS022:i5-003_C,i7-111_07
HS023:i5-003_D,i7-111_08
HS024:i5-003_E,i7-111_09
HS025:i5-003_F,i7-111_10
HS026:i5-003_G,i7-111_11
HS027:i5-003_H,i7-111_12
HS028:i5-004_A,i7-112_01
HS029:i5-004_B,i7-112_02
HS030:i5-004_C,i7-112_03
HS031:i5-004_D,i7-112_04
HS032:i5-004_E,i7-112_05
HS033:i5-004_F,i7-112_06
HS034:i5-004_G,i7-112_07
HS035:i5-004_H,i7-112_08
HS036:i5-005_A,i7-112_09
HS047:i5-005_B,i7-112_10
HS048:i5-005_C,i7-112_11
HS049:i5-005_D,i7-112_12
HS050:i5-005_E,i7-113_01
HS051:i5-005_F,i7-113_02
HS052:i5-005_G,i7-113_03
HS053:i5-005_H,i7-113_04
HS054:i5-006_A,i7-113_05
HS055:i5-006_B,i7-113_06
HS056:i5-006_C,i7-113_07
HS058:i5-006_D,i7-113_08
HS059:i5-006_E,i7-113_09
HS060:i5-006_F,i7-113_10
HS061:i5-006_G,i7-113_11
HS075:i5-006_H,i7-113_12
HS076:i5-007_A,i7-114_01
HS077:i5-007_B,i7-114_02
HS078:i5-007_C,i7-114_03
HS080:i5-007_D,i7-114_04
HS081:i5-007_E,i7-114_05
HS090:i5-007_F,i7-114_06
HS093:i5-007_G,i7-114_07
HS097:i5-007_H,i7-114_08
HS100:i5-008_A,i7-114_09
HS101:i5-008_B,i7-114_10
HS102:i5-008_C,i7-114_11
HS103:i5-008_D,i7-114_12

[names]
HS001:HS001
HS002:HS002
HS003:HS003
HS004:HS004
HS006:HS006
HS007:HS007
HS008:HS008
HS009:HS009
HS010:HS010
HS011:HS011
HS012:HS012
HS013:HS013
HS014:HS014
HS015:HS015
HS017:HS017
HS018:HS018
HS019:HS019
HS020:HS020
HS022:HS022
HS023:HS023
HS024:HS024
HS025:HS025
HS026:HS026
HS027:HS027
HS028:HS028
HS029:HS029
HS030:HS030
HS031:HS031
HS032:HS032
HS033:HS033
HS034:HS034
HS035:HS035
HS036:HS036
HS047:HS047
HS048:HS048
HS049:HS049
HS050:HS050
HS051:HS051
HS052:HS052
HS053:HS053
HS054:HS054
HS055:HS055
HS056:HS056
HS058:HS058
HS059:HS059
HS060:HS060
HS061:HS061
HS075:HS075
HS076:HS076
HS077:HS077
HS078:HS078
HS080:HS080
HS081:HS081
HS090:HS090
HS093:HS093
HS097:HS097
HS100:HS100
HS101:HS101
HS102:HS102
HS103:HS103

brantfaircloth commented 4 years ago

If I reduce your config file to:

[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT*GTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i5-001_A:GTCAACAG
i5-001_B:GTCAACAG
i7-110_01:GCACAACT
i7-110_02:TTCTCTCG

[tag map]
HS001:i5-001_A,i7-110_01
HS002:i5-001_B,i7-110_02

[names]
HS001:HS001
HS002:HS002

And run illumiprocessor against 4 files in raw-fastq named (not your files - just some test data):

(phyluce) ~/T/illumiprocessor-test > ll raw-fastq
total 1792
-rw-r--r--  1 bcf  staff   221K Nov 19 13:49 HS001_R1_001.fastq.gz
-rw-r--r--  1 bcf  staff   220K Nov 19 13:49 HS001_R2_001.fastq.gz
-rw-r--r--  1 bcf  staff   221K Nov 19 13:50 HS002_R1_001.fastq.gz
-rw-r--r--  1 bcf  staff   220K Nov 19 13:50 HS002_R2_001.fastq.gz

With

illumiprocessor \
    --input raw-fastq \
    --output clean-fastq \
    --config test.conf \
    --cores 1 \
    --r1-pattern "{}_R1_\d+.fastq.gz" \
    --r2-pattern "{}_R2_\d+.fastq.gz" \
    --log-path log

The code runs as expected with no errors. Maybe split things up into smaller batches and see if you can localize the problem by batch - I checked obvious stuff, but can't immediately find what might be wrong.

MaxTBC commented 4 years ago

Hmm, I did exactly what you have above, but it still errors the same way. I get the same result when trying it with just the last two samples, HS102 and HS103.

Is it a naming error specifically, or could it also be an issue with the .fastq.gz files themselves?

(phyluce) tangled@tbc-comp1:~/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data$ illumiprocessor \
>     --input raw-fastq/1st_2 \
>     --output clean-fastq \
>     --config illumiprocessor_HS_1st2.conf \
>     --cores 12 \
>     --r1-pattern "{}_R1_\d+.fastq.gz" \
>     --r2-pattern "{}_R2_\d+.fastq.gz" \
>     --log-path log
[WARNING] Output directory exists, REMOVE [Y/n]? Y
2020-11-19 15:19:06,892 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2020-11-19 15:19:06,892 - illumiprocessor - INFO - Version: 2.0.9
2020-11-19 15:19:06,893 - illumiprocessor - INFO - Argument --config: illumiprocessor_HS_1st2.conf
2020-11-19 15:19:06,893 - illumiprocessor - INFO - Argument --cores: 12
2020-11-19 15:19:06,893 - illumiprocessor - INFO - Argument --input: /home/tangled/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/raw-fastq/1st_2
2020-11-19 15:19:06,893 - illumiprocessor - INFO - Argument --log_path: /home/tangled/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/log
2020-11-19 15:19:06,894 - illumiprocessor - INFO - Argument --min_len: 40
2020-11-19 15:19:06,894 - illumiprocessor - INFO - Argument --no_merge: False
2020-11-19 15:19:06,894 - illumiprocessor - INFO - Argument --output: /home/tangled/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/clean-fastq
2020-11-19 15:19:06,894 - illumiprocessor - INFO - Argument --phred: phred33
2020-11-19 15:19:06,895 - illumiprocessor - INFO - Argument --r1_pattern: {}_R1_\d+.fastq.gz
2020-11-19 15:19:06,895 - illumiprocessor - INFO - Argument --r2_pattern: {}_R2_\d+.fastq.gz
2020-11-19 15:19:06,895 - illumiprocessor - INFO - Argument --se: False
2020-11-19 15:19:06,896 - illumiprocessor - INFO - Argument --trimmomatic: /home/tangled/miniconda2/envs/phyluce/bin/trimmomatic
2020-11-19 15:19:06,896 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
  File "/home/tangled/miniconda2/envs/phyluce/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/cli/main.py", line 121, in main
    main(args)
  File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/main.py", line 34, in main
    reads.append(core.SequenceData(args, conf, start_name, end_name))
  File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/core.py", line 86, in __init__
    self._get_read_data()
  File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/core.py", line 104, in _get_read_data
    "errors in your conf file.".format(self.start_name))
IOError: There is a problem with the read names for HS001. Ensure you do not have spelling/capitalization errors in your conf file.

brantfaircloth commented 4 years ago

It seems to be having problems finding one of your two read files (I can't tell why). Basically, this error happens when it cannot find R1 and/or R2. Then it fires off the error message you see. Maybe check that all your filenames match exactly between your config file and what's in the 1st_2 directory, and that there are definitely both R1 and R2 files for the HS001 sample in the directory 1st_2.
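
The check described above can be sketched in Python. This is a rough diagnostic, not illumiprocessor's own code: it assumes the pattern is filled in with the sample name via `str.format` and then applied to each file name as a regular expression, and the helper names `match_reads` and `check_input_dir` are made up for illustration. Because the directory listing posted earlier in this thread shows `raw-data` while the commands pass `raw-fastq`, the sketch also checks that the input directory exists; the thread does not confirm whether that mismatch is the cause.

```python
import os
import re


def match_reads(filenames, sample, r1_pattern, r2_pattern):
    """Return (r1_hits, r2_hits): file names matching each filled-in pattern.

    Assumption: "{}" in the pattern is replaced by the sample name and the
    result is used as a regular expression (hypothetical helper).
    """
    r1_regex = re.compile(r1_pattern.format(sample))
    r2_regex = re.compile(r2_pattern.format(sample))
    r1_hits = [name for name in filenames if r1_regex.search(name)]
    r2_hits = [name for name in filenames if r2_regex.search(name)]
    return r1_hits, r2_hits


def check_input_dir(input_dir, samples, r1_pattern, r2_pattern):
    """Print, for each sample, which files in input_dir match R1 and R2."""
    if not os.path.isdir(input_dir):
        print("input directory does not exist: {}".format(input_dir))
        return
    filenames = sorted(os.listdir(input_dir))
    for sample in samples:
        r1_hits, r2_hits = match_reads(filenames, sample, r1_pattern, r2_pattern)
        print("{}: R1={} R2={}".format(sample, r1_hits, r2_hits))


if __name__ == "__main__":
    # Directory, sample names, and patterns taken from the commands in this thread.
    check_input_dir(
        "raw-fastq/1st_2",
        ["HS001", "HS002"],
        r"{}_R1_\d+.fastq.gz",
        r"{}_R2_\d+.fastq.gz",
    )
```

An empty R1 or R2 list for a sample, or a missing directory, would point to the kind of mismatch that produces the "problem with the read names" error.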

MaxTBC commented 4 years ago

That's what I was thinking, that it is not able to 'find' the files, but I've double-checked.

illumiprocessor \
    --input raw-fastq/1st_2 \
    --output clean-fastq \
    --config illumiprocessor_HS_1st2.conf \
    --cores 12 \
    --r1-pattern "{}_R1_\d+.fastq.gz" \
    --r2-pattern "{}_R2_\d+.fastq.gz" \
    --log-path log
(phyluce) tangled@tbc-comp1:~/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data$ head -n20 illumiprocessor_HS_1st2.conf 
[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT*GTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i5-001_A:GTCAACAG
i5-001_B:GTCAACAG
i7-110_01:GCACAACT
i7-110_02:TTCTCTCG

[tag map]
HS001:i5-001_A,i7-110_01
HS002:i5-001_B,i7-110_02

[names]
HS001:HS001
HS002:HS002
(phyluce) tangled@tbc-comp1:~/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/raw-data/1st_2$ ls
HS001_R1_001.fastq.gz  HS002_R1_001.fastq.gz
HS001_R2_001.fastq.gz  HS002_R2_001.fastq.gz

MAUlyssea commented 1 year ago

MaxTBC, I'm having the same problem. How did you solve it? I appreciate any comments on that.

brantfaircloth commented 1 year ago

@MAUlyssea What does the config file that you are trying to use look like and what are the file names you are trying to process?

MAUlyssea commented 1 year ago

Hi! I'm trying to run illumiprocessor with just a few fastq files first. My config file looks like this:

[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i7-iTru7_210_08:GCTGTAAG
i7-iTru7_210_09:CGAATACG
i7-iTru7_210_10:GCCAGAAT
i7-iTru7_210_11:TCCTGGTA
i5-iTru5_106_E:ACTCCTAC
i5-iTru5_106_F:GTCTGAGT
i5-iTru5_106_G:ACTGCACT
i5-iTru5_106_H:GGAATGTC

[tag map]
00101:i7-iTru7_210_08,i5-iTru5_106_E
00201:i7-iTru7_210_09,i5-iTru5_106_F
00301:i7-iTru7_210_10,i5-iTru5_106_G
00401:i7-iTru7_210_11,i5-iTru5_106_H

[names]
00101:Hylomyrma_balzani_00101
00201:Hylomyrma_balzani_00201
00301:Hylomyrma_balzani_00301
00401:Hylomyrma_balzani_00401

and the raw-fastq files I'm trying to process are these:

001_01_R1_001.fastq.gz
001_01_R2_001.fastq.gz
002_01_R1_001.fastq.gz
002_01_R2_001.fastq.gz
003_01_R1_001.fastq.gz
003_01_R2_001.fastq.gz
004_01_R1_001.fastq.gz
004_01_R2_001.fastq.gz

brantfaircloth commented 1 year ago

You will need to adjust the regular expression to match your read names. That will look something like:

illumiprocessor \
    --input <path-to-files> \
    --output clean-fastq \
    --config <config file> \
    --cores 12 \
    --r1-pattern "{}R1_\d+.fastq.gz" \
    --r2-pattern "{}R2_\d+.fastq.gz" \
    --log-path log

MAUlyssea commented 1 year ago

For specifying the r1 & r2 patterns I've tried:

--r1-pattern "{}_R1\d+.fastq.gz" \
--r2-pattern "{}_R2\d+.fastq.gz" \

--r1-pattern "{}_R1_\d+.fastq.gz" \
--r2-pattern "{}_R2_\d+.fastq.gz" \

--r1-pattern "{}_R1_001\d+.fastq.gz" \
--r2-pattern "{}_R2_001\d+.fastq.gz" \

and all failed. I just tried the way you said and it worked!! But I didn't include the log specification. I ran:

illumiprocessor \
    --input \
    --output clean-fastq \
    --config \
    --cores 12 \
    --r1-pattern "{}R1\d+.fastq.gz" \
    --r2-pattern "{}R2\d+.fastq.gz"

Thanks!!

brantfaircloth commented 1 year ago

You needed to structure it like I suggested because you included the second _ in your sample names (e.g. 001_01_), and the sample name is what gets substituted into the squiggly brackets ({}) in the pattern option. Glad you got it working.
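
The substitution described above can be illustrated with a short Python sketch. It assumes the `{}` placeholder is filled with `str.format` and the result is searched for in the file name as a regular expression; `expand_pattern` and `matches` are hypothetical helpers written for this example, not functions from illumiprocessor.

```python
import re


def expand_pattern(pattern, sample):
    """Fill the {} placeholder with the sample name from the conf file."""
    return pattern.format(sample)


def matches(pattern, sample, filename):
    """True if the filled-in pattern is found in the file name."""
    return re.search(expand_pattern(pattern, sample), filename) is not None


filename = "001_01_R1_001.fastq.gz"

# The sample name already ends in "_", so the pattern must not add another.
print(expand_pattern(r"{}R1_\d+.fastq.gz", "001_01_"))      # 001_01_R1_\d+.fastq.gz
print(matches(r"{}R1_\d+.fastq.gz", "001_01_", filename))   # True

# An extra "_" after {} expands to "001_01__R1_..." and no longer matches.
print(matches(r"{}_R1_\d+.fastq.gz", "001_01_", filename))  # False

# A sample name without a trailing "_" needs the "_" in the pattern instead.
print(matches(r"{}_R1_\d+.fastq.gz", "HS001", "HS001_R1_001.fastq.gz"))  # True
```

Note that the unescaped `.` in these patterns matches any character; writing `\.` would make the match stricter.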

NYX-PLUTO commented 1 year ago

Hello,

I have been dealing with a naming error that is similar to those above. The suggested solutions do not resolve this issue.

I have 40*2 = 80 total fastq.gz files located in the directory "working" that follow this structure:

S703_L003_R1_001.fastq.gz
S703_L003_R2_001.fastq.gz

My configuration file is structured as:

[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AATGATACGGCGACCACCGAGATCTACAC*ACACTCTTTCCCTACACGACGCTCTTCCGATCT

[tag sequences]
i7-128:TTCGAAGC
i5-534:CGACGTTA

[tag map]
S703:i7-128,i5-534

[names]
S703:BME101020_Atorridus_KernCo_Caliente

My .sh file:

illumiprocessor \
    --input working \
    --output clean-fastq \
    --config illumiprocessor_rev.conf \
    --cores 20 \
    --r1-pattern "{}R1_\d+.fastq.gz" \
    --r2-pattern "{}R2_\d+.fastq.gz"

(I have tried with {}_R1_\d+.fastq.gz and without the r1/r2 pattern flags as well)

The exact error I get:

  File "/home/hays/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 106, in _get_read_data
    "errors in your conf file.".format(self.start_name)
OSError: There is a problem with the read names for S703. Ensure you do not have spelling/capitalization errors in your conf file.

Thank you for your help.