Closed MaxTBC closed 3 years ago
It's hard to say without knowing what ALL of your files look like, but if your files are named like:
HS001_R1_001.fastq.gz
HS001_R2_001.fastq.gz
You are definitely going to need something like:
illumiprocessor \
--input raw-fastq/ \
--output clean-fastq \
--config illumiprocessor_HS.conf \
--cores 12 \
--r1-pattern "{}_R1_\d+.fastq.gz" \
--r2-pattern "{}_R2_\d+.fastq.gz" \
--log-path log
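To make the two pattern options easier to read, here is a minimal Python sketch of how such a pattern could be checked against a file name. It assumes the sample name is substituted into the `{}` placeholder and the result is used as a regular expression matched from the start of the file name; `expand` and `matches` are hypothetical helper names for illustration, not part of illumiprocessor.

```python
import re


def expand(pattern, sample):
    """Substitute the sample name into the {} placeholder of a read pattern."""
    return pattern.format(sample)


def matches(pattern, sample, filename):
    """True if filename matches the pattern once the sample name is filled in.

    Assumption: matching starts at the beginning of the file name (re.match).
    """
    return re.match(expand(pattern, sample), filename) is not None


r1 = r"{}_R1_\d+.fastq.gz"
r2 = r"{}_R2_\d+.fastq.gz"

print(expand(r1, "HS001"))                            # HS001_R1_\d+.fastq.gz
print(matches(r1, "HS001", "HS001_R1_001.fastq.gz"))  # True
print(matches(r2, "HS001", "HS001_R2_001.fastq.gz"))  # True
print(matches(r1, "HS001", "HS001_R2_001.fastq.gz"))  # False

# An unescaped "." matches any character, so the pattern is slightly loose.
print(matches(r1, "HS001", "HS001_R1_001xfastqxgz"))  # True
strict_r1 = r"{}_R1_\d+\.fastq\.gz"
print(matches(strict_r1, "HS001", "HS001_R1_001xfastqxgz"))  # False
```

In the patterns, `\d+` stands for one or more digits (the `001` in the file names), and everything else has to appear literally after the sample name.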
Thanks for the quick reply, Brant.
The command you suggested did not work; it resulted in the same error.
Here is the working directory:
clean-fastq
illumiprocessor_HS.conf
log
raw-data
Here are all the file names inside the 'raw-fastq' folder:
HS001_R1_001.fastq.gz HS024_R1_001.fastq.gz HS054_R1_001.fastq.gz
HS001_R2_001.fastq.gz HS024_R2_001.fastq.gz HS054_R2_001.fastq.gz
HS002_R1_001.fastq.gz HS025_R1_001.fastq.gz HS055_R1_001.fastq.gz
HS002_R2_001.fastq.gz HS025_R2_001.fastq.gz HS055_R2_001.fastq.gz
HS003_R1_001.fastq.gz HS026_R1_001.fastq.gz HS056_R1_001.fastq.gz
HS003_R2_001.fastq.gz HS026_R2_001.fastq.gz HS056_R2_001.fastq.gz
HS004_R1_001.fastq.gz HS027_R1_001.fastq.gz HS058_R1_001.fastq.gz
HS004_R2_001.fastq.gz HS027_R2_001.fastq.gz HS058_R2_001.fastq.gz
HS006_R1_001.fastq.gz HS028_R1_001.fastq.gz HS059_R1_001.fastq.gz
HS006_R2_001.fastq.gz HS028_R2_001.fastq.gz HS059_R2_001.fastq.gz
HS007_R1_001.fastq.gz HS029_R1_001.fastq.gz HS060_R1_001.fastq.gz
HS007_R2_001.fastq.gz HS029_R2_001.fastq.gz HS060_R2_001.fastq.gz
HS008_R1_001.fastq.gz HS030_R1_001.fastq.gz HS061_R1_001.fastq.gz
HS008_R2_001.fastq.gz HS030_R2_001.fastq.gz HS061_R2_001.fastq.gz
HS009_R1_001.fastq.gz HS031_R1_001.fastq.gz HS075_R1_001.fastq.gz
HS009_R2_001.fastq.gz HS031_R2_001.fastq.gz HS075_R2_001.fastq.gz
HS010_R1_001.fastq.gz HS032_R1_001.fastq.gz HS076_R1_001.fastq.gz
HS010_R2_001.fastq.gz HS032_R2_001.fastq.gz HS076_R2_001.fastq.gz
HS011_R1_001.fastq.gz HS033_R1_001.fastq.gz HS077_R1_001.fastq.gz
HS011_R2_001.fastq.gz HS033_R2_001.fastq.gz HS077_R2_001.fastq.gz
HS012_R1_001.fastq.gz HS034_R1_001.fastq.gz HS078_R1_001.fastq.gz
HS012_R2_001.fastq.gz HS034_R2_001.fastq.gz HS078_R2_001.fastq.gz
HS013_R1_001.fastq.gz HS035_R1_001.fastq.gz HS080_R1_001.fastq.gz
HS013_R2_001.fastq.gz HS035_R2_001.fastq.gz HS080_R2_001.fastq.gz
HS014_R1_001.fastq.gz HS036_R1_001.fastq.gz HS081_R1_001.fastq.gz
HS014_R2_001.fastq.gz HS036_R2_001.fastq.gz HS081_R2_001.fastq.gz
HS015_R1_001.fastq.gz HS047_R1_001.fastq.gz HS090_R1_001.fastq.gz
HS015_R2_001.fastq.gz HS047_R2_001.fastq.gz HS090_R2_001.fastq.gz
HS017_R1_001.fastq.gz HS048_R1_001.fastq.gz HS093_R1_001.fastq.gz
HS017_R2_001.fastq.gz HS048_R2_001.fastq.gz HS093_R2_001.fastq.gz
HS018_R1_001.fastq.gz HS049_R1_001.fastq.gz HS097_R1_001.fastq.gz
HS018_R2_001.fastq.gz HS049_R2_001.fastq.gz HS097_R2_001.fastq.gz
HS019_R1_001.fastq.gz HS050_R1_001.fastq.gz HS100_R1_001.fastq.gz
HS019_R2_001.fastq.gz HS050_R2_001.fastq.gz HS100_R2_001.fastq.gz
HS020_R1_001.fastq.gz HS051_R1_001.fastq.gz HS101_R1_001.fastq.gz
HS020_R2_001.fastq.gz HS051_R2_001.fastq.gz HS101_R2_001.fastq.gz
HS022_R1_001.fastq.gz HS052_R1_001.fastq.gz HS102_R1_001.fastq.gz
HS022_R2_001.fastq.gz HS052_R2_001.fastq.gz HS102_R2_001.fastq.gz
HS023_R1_001.fastq.gz HS053_R1_001.fastq.gz HS103_R1_001.fastq.gz
HS023_R2_001.fastq.gz HS053_R2_001.fastq.gz HS103_R2_001.fastq.gz
And for any other issues, here is the full .conf file:
[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT*GTGTAGATCTCGGTGGTCGCCGTATCATT
[tag sequences]
i5-001_A:GTCAACAG
i5-001_B:GTCAACAG
i5-001_C:GTCAACAG
i5-001_D:GTCAACAG
i5-001_E:GTCAACAG
i5-001_F:GTCAACAG
i5-001_G:GTCAACAG
i5-001_H:GTCAACAG
i5-002_A:GTCAACAG
i5-002_B:GTCAACAG
i5-002_C:GTCAACAG
i5-002_D:GTCAACAG
i5-002_E:CCAGTATC
i5-002_F:CCAGTATC
i5-002_G:CCAGTATC
i5-002_H:CCAGTATC
i5-003_A:CCAGTATC
i5-003_B:CCAGTATC
i5-003_C:CCAGTATC
i5-003_D:CCAGTATC
i5-003_E:CCAGTATC
i5-003_F:CCAGTATC
i5-003_G:CCAGTATC
i5-003_H:CCAGTATC
i5-004_A:TCAGTAGG
i5-004_B:TCAGTAGG
i5-004_C:TCAGTAGG
i5-004_D:TCAGTAGG
i5-004_E:TCAGTAGG
i5-004_F:TCAGTAGG
i5-004_G:TCAGTAGG
i5-004_H:TCAGTAGG
i5-005_A:TCAGTAGG
i5-005_B:TCAGTAGG
i5-005_C:TCAGTAGG
i5-005_D:TCAGTAGG
i5-005_E:TTGCAACG
i5-005_F:TTGCAACG
i5-005_G:TTGCAACG
i5-005_H:TTGCAACG
i5-006_A:TTGCAACG
i5-006_B:TTGCAACG
i5-006_C:TTGCAACG
i5-006_D:TTGCAACG
i5-006_E:TTGCAACG
i5-006_F:TTGCAACG
i5-006_G:TTGCAACG
i5-006_H:TTGCAACG
i5-007_A:AGTCAGGT
i5-007_B:AGTCAGGT
i5-007_C:AGTCAGGT
i5-007_D:AGTCAGGT
i5-007_E:AGTCAGGT
i5-007_F:AGTCAGGT
i5-007_G:AGTCAGGT
i5-007_H:AGTCAGGT
i5-008_A:AGTCAGGT
i5-008_B:AGTCAGGT
i5-008_C:AGTCAGGT
i5-008_D:AGTCAGGT
i7-110_01:GCACAACT
i7-110_02:TTCTCTCG
i7-110_03:AACGGTCA
i7-110_04:ACAGACCT
i7-110_05:TCTCTTCC
i7-110_06:AGTGTTGG
i7-110_07:TGGCATGT
i7-110_08:AGAAGCGT
i7-110_09:AGCGGAAT
i7-110_10:TAACCGGT
i7-110_11:CATGGAAC
i7-110_12:ATGGTCCA
i7-111_01:CTTCTGAG
i7-111_02:AACCGAAG
i7-111_03:TTCGTACC
i7-111_04:CTGTTAGG
i7-111_05:CACAAGTC
i7-111_06:TCTTGACG
i7-111_07:CGTCTTGT
i7-111_08:CGTGATCA
i7-111_09:CCAAGTTG
i7-111_10:GTACCTTG
i7-111_11:GACTATGC
i7-111_12:TGGATCAC
i7-112_01:CTCTGGTT
i7-112_02:GTTCATGG
i7-112_03:GCTGTAAG
i7-112_04:GTCGAAGA
i7-112_05:GAGCTCAA
i7-112_06:TGAACCTG
i7-112_07:CCGACTAT
i7-112_08:AGCTAACC
i7-112_09:GCCTTGTT
i7-112_10:AACTTGCC
i7-112_11:CAATGTGG
i7-112_12:AAGGCTGA
i7-113_01:TTACCGAG
i7-113_02:GTCCTAAG
i7-113_03:GAAGGTTC
i7-113_04:GAAGAGGT
i7-113_05:TCTGAGAG
i7-113_06:ACCGCATA
i7-113_07:GAAGTACC
i7-113_08:CAGGTATC
i7-113_09:TCTCTAGG
i7-113_10:AAGCACTG
i7-113_11:CCAAGCAA
i7-113_12:TGTTCGAG
i7-114_01:GCACAACT
i7-114_02:TTCTCTCG
i7-114_03:AACGGTCA
i7-114_04:ACAGACCT
i7-114_05:TCTCTTCC
i7-114_06:AGTGTTGG
i7-114_07:TGGCATGT
i7-114_08:AGAAGCGT
i7-114_09:AGCGGAAT
i7-114_10:TAACCGGT
i7-114_11:CATGGAAC
i7-114_12:ATGGTCCA
[tag map]
HS001:i5-001_A,i7-110_01
HS002:i5-001_B,i7-110_02
HS003:i5-001_C,i7-110_03
HS004:i5-001_D,i7-110_04
HS006:i5-001_E,i7-110_05
HS007:i5-001_F,i7-110_06
HS008:i5-001_G,i7-110_07
HS009:i5-001_H,i7-110_08
HS010:i5-002_A,i7-110_09
HS011:i5-002_B,i7-110_10
HS012:i5-002_C,i7-110_11
HS013:i5-002_D,i7-110_12
HS014:i5-002_E,i7-111_01
HS015:i5-002_F,i7-111_02
HS017:i5-002_G,i7-111_03
HS018:i5-002_H,i7-111_04
HS019:i5-003_A,i7-111_05
HS020:i5-003_B,i7-111_06
HS022:i5-003_C,i7-111_07
HS023:i5-003_D,i7-111_08
HS024:i5-003_E,i7-111_09
HS025:i5-003_F,i7-111_10
HS026:i5-003_G,i7-111_11
HS027:i5-003_H,i7-111_12
HS028:i5-004_A,i7-112_01
HS029:i5-004_B,i7-112_02
HS030:i5-004_C,i7-112_03
HS031:i5-004_D,i7-112_04
HS032:i5-004_E,i7-112_05
HS033:i5-004_F,i7-112_06
HS034:i5-004_G,i7-112_07
HS035:i5-004_H,i7-112_08
HS036:i5-005_A,i7-112_09
HS047:i5-005_B,i7-112_10
HS048:i5-005_C,i7-112_11
HS049:i5-005_D,i7-112_12
HS050:i5-005_E,i7-113_01
HS051:i5-005_F,i7-113_02
HS052:i5-005_G,i7-113_03
HS053:i5-005_H,i7-113_04
HS054:i5-006_A,i7-113_05
HS055:i5-006_B,i7-113_06
HS056:i5-006_C,i7-113_07
HS058:i5-006_D,i7-113_08
HS059:i5-006_E,i7-113_09
HS060:i5-006_F,i7-113_10
HS061:i5-006_G,i7-113_11
HS075:i5-006_H,i7-113_12
HS076:i5-007_A,i7-114_01
HS077:i5-007_B,i7-114_02
HS078:i5-007_C,i7-114_03
HS080:i5-007_D,i7-114_04
HS081:i5-007_E,i7-114_05
HS090:i5-007_F,i7-114_06
HS093:i5-007_G,i7-114_07
HS097:i5-007_H,i7-114_08
HS100:i5-008_A,i7-114_09
HS101:i5-008_B,i7-114_10
HS102:i5-008_C,i7-114_11
HS103:i5-008_D,i7-114_12
[names]
HS001:HS001
HS002:HS002
HS003:HS003
HS004:HS004
HS006:HS006
HS007:HS007
HS008:HS008
HS009:HS009
HS010:HS010
HS011:HS011
HS012:HS012
HS013:HS013
HS014:HS014
HS015:HS015
HS017:HS017
HS018:HS018
HS019:HS019
HS020:HS020
HS022:HS022
HS023:HS023
HS024:HS024
HS025:HS025
HS026:HS026
HS027:HS027
HS028:HS028
HS029:HS029
HS030:HS030
HS031:HS031
HS032:HS032
HS033:HS033
HS034:HS034
HS035:HS035
HS036:HS036
HS047:HS047
HS048:HS048
HS049:HS049
HS050:HS050
HS051:HS051
HS052:HS052
HS053:HS053
HS054:HS054
HS055:HS055
HS056:HS056
HS058:HS058
HS059:HS059
HS060:HS060
HS061:HS061
HS075:HS075
HS076:HS076
HS077:HS077
HS078:HS078
HS080:HS080
HS081:HS081
HS090:HS090
HS093:HS093
HS097:HS097
HS100:HS100
HS101:HS101
HS102:HS102
HS103:HS103
If I reduce your config file to:
[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT*GTGTAGATCTCGGTGGTCGCCGTATCATT
[tag sequences]
i5-001_A:GTCAACAG
i5-001_B:GTCAACAG
i7-110_01:GCACAACT
i7-110_02:TTCTCTCG
[tag map]
HS001:i5-001_A,i7-110_01
HS002:i5-001_B,i7-110_02
[names]
HS001:HS001
HS002:HS002
And run illumiprocessor against 4 files in raw-fastq named (not your files - just some test data):
(phyluce) ~/T/illumiprocessor-test > ll raw-fastq
total 1792
-rw-r--r-- 1 bcf staff 221K Nov 19 13:49 HS001_R1_001.fastq.gz
-rw-r--r-- 1 bcf staff 220K Nov 19 13:49 HS001_R2_001.fastq.gz
-rw-r--r-- 1 bcf staff 221K Nov 19 13:50 HS002_R1_001.fastq.gz
-rw-r--r-- 1 bcf staff 220K Nov 19 13:50 HS002_R2_001.fastq.gz
With
illumiprocessor \
--input raw-fastq \
--output clean-fastq \
--config test.conf \
--cores 1 \
--r1-pattern "{}_R1_\d+.fastq.gz" \
--r2-pattern "{}_R2_\d+.fastq.gz" \
--log-path log
The code runs as expected with no errors. Maybe split things up into smaller batches and see if you can localize the problem by batch - I checked obvious stuff, but can't immediately find what might be wrong.
Hmmm... I did exactly what you have above, but it is still erroring the same way. I get the same result when trying it with just the last two samples, HS102 and HS103.
Is it a naming error specifically, or could it also be an issue with the .fastq.gz files themselves?
(phyluce) tangled@tbc-comp1:~/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data$ illumiprocessor \
> --input raw-fastq/1st_2 \
> --output clean-fastq \
> --config illumiprocessor_HS_1st2.conf \
> --cores 12 \
> --r1-pattern "{}_R1_\d+.fastq.gz" \
> --r2-pattern "{}_R2_\d+.fastq.gz" \
> --log-path log
[WARNING] Output directory exists, REMOVE [Y/n]? Y
2020-11-19 15:19:06,892 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2020-11-19 15:19:06,892 - illumiprocessor - INFO - Version: 2.0.9
2020-11-19 15:19:06,893 - illumiprocessor - INFO - Argument --config: illumiprocessor_HS_1st2.conf
2020-11-19 15:19:06,893 - illumiprocessor - INFO - Argument --cores: 12
2020-11-19 15:19:06,893 - illumiprocessor - INFO - Argument --input: /home/tangled/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/raw-fastq/1st_2
2020-11-19 15:19:06,893 - illumiprocessor - INFO - Argument --log_path: /home/tangled/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/log
2020-11-19 15:19:06,894 - illumiprocessor - INFO - Argument --min_len: 40
2020-11-19 15:19:06,894 - illumiprocessor - INFO - Argument --no_merge: False
2020-11-19 15:19:06,894 - illumiprocessor - INFO - Argument --output: /home/tangled/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/clean-fastq
2020-11-19 15:19:06,894 - illumiprocessor - INFO - Argument --phred: phred33
2020-11-19 15:19:06,895 - illumiprocessor - INFO - Argument --r1_pattern: {}_R1_\d+.fastq.gz
2020-11-19 15:19:06,895 - illumiprocessor - INFO - Argument --r2_pattern: {}_R2_\d+.fastq.gz
2020-11-19 15:19:06,895 - illumiprocessor - INFO - Argument --se: False
2020-11-19 15:19:06,896 - illumiprocessor - INFO - Argument --trimmomatic: /home/tangled/miniconda2/envs/phyluce/bin/trimmomatic
2020-11-19 15:19:06,896 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
File "/home/tangled/miniconda2/envs/phyluce/bin/illumiprocessor", line 17, in <module>
sys.exit(main())
File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/cli/main.py", line 121, in main
main(args)
File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/main.py", line 34, in main
reads.append(core.SequenceData(args, conf, start_name, end_name))
File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/core.py", line 86, in __init__
self._get_read_data()
File "/home/tangled/miniconda2/envs/phyluce/lib/python2.7/site-packages/illumiprocessor/core.py", line 104, in _get_read_data
"errors in your conf file.".format(self.start_name))
IOError: There is a problem with the read names for HS001. Ensure you do not have spelling/capitalization errors in your conf file.
It seems to be having problems finding one of your two read files (I can't tell why). Basically, this error happens when it cannot find R1 and/or R2. Then it fires off the error message you see. Maybe check that all your filenames are absolutely correct between your config file and what's in the 1st_2 directory and that there are definitely both R1 and R2 files for the HS001 sample in the directory 1st_2.
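One way to run that check outside of illumiprocessor is sketched below in Python. It is only an approximation under stated assumptions: it reads the sample names from the `[tag map]` section with the standard-library `configparser`, fills each name into the `{}` placeholder, and matches the result against the file names in the input directory with `re.match`. The function names (`sample_names`, `find_reads`, `report`) are hypothetical, and illumiprocessor's own lookup may differ in detail. The sketch also reports a missing input directory. Note that the working-directory listing earlier in the thread shows `raw-data` while the command passes `raw-fastq`; whether that is related to the error is not confirmed here.

```python
import configparser
import os
import re


def sample_names(conf_text):
    """Return the sample names listed under [tag map] in the conf text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep capitalization (HS001, not hs001)
    parser.read_string(conf_text)
    return list(parser["tag map"].keys())


def find_reads(input_dir, sample, r1_pattern, r2_pattern):
    """Return (r1_files, r2_files) in input_dir matching one sample."""
    files = sorted(os.listdir(input_dir))
    found = []
    for pattern in (r1_pattern, r2_pattern):
        # Assumption: the sample name replaces {} and the result is a regex.
        regex = re.compile(pattern.format(sample))
        found.append([name for name in files if regex.match(name)])
    return found[0], found[1]


def report(input_dir, conf_text, r1_pattern, r2_pattern):
    """Return one status line per sample, or a single line if the dir is missing."""
    if not os.path.isdir(input_dir):
        return ["input directory not found: {}".format(input_dir)]
    lines = []
    for sample in sample_names(conf_text):
        r1, r2 = find_reads(input_dir, sample, r1_pattern, r2_pattern)
        status = "ok" if len(r1) == 1 and len(r2) == 1 else "PROBLEM"
        lines.append("{}: R1={} R2={} {}".format(sample, r1, r2, status))
    return lines
```

For the run above, this would be called as `report("raw-fastq/1st_2", open("illumiprocessor_HS_1st2.conf").read(), r"{}_R1_\d+.fastq.gz", r"{}_R2_\d+.fastq.gz")`, printing each returned line. Any sample flagged `PROBLEM` has zero or more than one match for R1 or R2.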
That's what I was thinking, that it isn't able to 'find' the files, but I've double-checked.
illumiprocessor \
--input raw-fastq/1st_2 \
--output clean-fastq \
--config illumiprocessor_HS_1st2.conf \
--cores 12 \
--r1-pattern "{}_R1_\d+.fastq.gz" \
--r2-pattern "{}_R2_\d+.fastq.gz" \
--log-path log
(phyluce) tangled@tbc-comp1:~/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data$ head -n20 illumiprocessor_HS_1st2.conf
[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT*GTGTAGATCTCGGTGGTCGCCGTATCATT
[tag sequences]
i5-001_A:GTCAACAG
i5-001_B:GTCAACAG
i7-110_01:GCACAACT
i7-110_02:TTCTCTCG
[tag map]
HS001:i5-001_A,i7-110_01
HS002:i5-001_B,i7-110_02
[names]
HS001:HS001
HS002:HS002
(phyluce) tangled@tbc-comp1:~/tbc/compute/UCE/HS/genewiz/30-412675317-101820/data/raw-data/1st_2$ ls
HS001_R1_001.fastq.gz HS002_R1_001.fastq.gz
HS001_R2_001.fastq.gz HS002_R2_001.fastq.gz
MaxTBC, I'm having the same problem. How did you solve it? I appreciate any comments on that.
@MAUlyssea What does the config file that you are trying to use look like and what are the file names you are trying to process?
Hi! I'm trying to run illumiprocessor with just a few fastq files first. My config file looks like this:
[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGTGTAGATCTCGGTGGTCGCCGTATCATT
[tag sequences]
i7-iTru7_210_08:GCTGTAAG
i7-iTru7_210_09:CGAATACG
i7-iTru7_210_10:GCCAGAAT
i7-iTru7_210_11:TCCTGGTA
i5-iTru5_106_E:ACTCCTAC
i5-iTru5_106_F:GTCTGAGT
i5-iTru5_106_G:ACTGCACT
i5-iTru5_106_H:GGAATGTC
[tag map]
00101:i7-iTru7_210_08,i5-iTru5_106_E
00201:i7-iTru7_210_09,i5-iTru5_106_F
00301:i7-iTru7_210_10,i5-iTru5_106_G
00401:i7-iTru7_210_11,i5-iTru5_106_H
[names]
00101:Hylomyrma_balzani_00101
00201:Hylomyrma_balzani_00201
00301:Hylomyrma_balzani_00301
00401:Hylomyrma_balzani_00401
and the raw-fastq files I'm trying to process are these:
001_01_R1_001.fastq.gz
001_01_R2_001.fastq.gz
002_01_R1_001.fastq.gz
002_01_R2_001.fastq.gz
003_01_R1_001.fastq.gz
003_01_R2_001.fastq.gz
004_01_R1_001.fastq.gz
004_01_R2_001.fastq.gz
You will need to adjust the regular expression to match your read names. That will look something like:
illumiprocessor \
--input <path-to-files> \
--output clean-fastq \
--config <config file> \
--cores 12 \
--r1-pattern "{}R1_\d+.fastq.gz" \
--r2-pattern "{}R2_\d+.fastq.gz" \
--log-path log
For specifying the r1 & r2 patterns I've tried:
--r1-pattern "{}_R1\d+.fastq.gz" \
--r2-pattern "{}_R2\d+.fastq.gz" \
--r1-pattern "{}_R1_\d+.fastq.gz" \
--r2-pattern "{}_R2_\d+.fastq.gz" \
--r1-pattern "{}_R1_001\d+.fastq.gz" \
--r2-pattern "{}_R2_001\d+.fastq.gz" \
and all failed. I just tried the way you said and it worked!! But I didn't put the log specification.
I ran:
illumiprocessor \
--input
Thanks!!
You needed to structure it like I suggested because you included the second _ in your sample names (e.g. 001_01_), and the sample name is what gets substituted into the squiggly brackets ({}) in the pattern option. Glad you got it working.
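The effect of that trailing underscore can be illustrated with a short Python sketch. It assumes, as described above, that the sample name replaces `{}` and the result is treated as a regular expression; `matching_files` is a hypothetical helper and the use of `re.match` is an assumption, not necessarily what illumiprocessor does internally.

```python
import re


def matching_files(pattern, sample, filenames):
    """Return the file names that match the pattern for one sample."""
    regex = re.compile(pattern.format(sample))
    return [name for name in filenames if regex.match(name)]


files = ["001_01_R1_001.fastq.gz", "001_01_R2_001.fastq.gz"]
sample = "001_01_"  # the trailing underscore is part of the sample name here

with_underscore = r"{}_R1_\d+.fastq.gz"
without_underscore = r"{}R1_\d+.fastq.gz"

# The sample name already ends in "_", so this expands to a double underscore.
print(with_underscore.format(sample))                     # 001_01__R1_\d+.fastq.gz
print(matching_files(with_underscore, sample, files))     # []

# Dropping the underscore from the pattern gives a single "_" before R1.
print(without_underscore.format(sample))                  # 001_01_R1_\d+.fastq.gz
print(matching_files(without_underscore, sample, files))  # ['001_01_R1_001.fastq.gz']
```

In other words, the underscore before R1 has to come from either the sample name or the pattern, but not from both.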
Hello,
I have been dealing with a naming error that is similar to those above. The suggested solutions do not resolve this issue.
I have 40*2 = 80 total fastq.gz files located in the directory "working" that follow this structure:
S703_L003_R1_001.fastq.gz
S703_L003_R2_001.fastq.gz
My configuration file is structured as:
[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AATGATACGGCGACCACCGAGATCTACAC*ACACTCTTTCCCTACACGACGCTCTTCCGATCT
[tag sequences]
i7-128:TTCGAAGC
i5-534:CGACGTTA
[tag map]
S703:i7-128,i5-534
[names]
S703:BME101020_Atorridus_KernCo_Caliente
My .sh file:
illumiprocessor \
--input working \
--output clean-fastq \
--config illumiprocessor_rev.conf \
--cores 20 \
--r1-pattern "{}R1_\d+.fastq.gz" \
--r2-pattern "{}R2_\d+.fastq.gz"
(I have tried with {}_R1_\d+.fastq.gz and without the r1/r2 pattern flags as well)
The exact error I get:
File "/home/hays/miniconda3/envs/phyluce-1.7.1/lib/python3.6/site-packages/illumiprocessor/core.py", line 106, in _get_read_data
"errors in your conf file.".format(self.start_name)
OSError: There is a problem with the read names for S703. Ensure you do not have spelling/capitalization errors in your conf file.
Thank you for your help.
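As a rough check of the two explicit patterns mentioned in this post, the Python sketch below expands each one with the sample name S703 and tests it against the file name as a plain regular expression. It assumes the sample name replaces `{}` and that matching starts at the beginning of the file name. The third pattern, which accounts for the `_L003` lane token, is hypothetical and was not suggested anywhere in this thread. The sketch also does not explain why running without the pattern flags failed.

```python
import re

filename = "S703_L003_R1_001.fastq.gz"
sample = "S703"

candidates = [
    r"{}R1_\d+.fastq.gz",         # as in the .sh file above
    r"{}_R1_\d+.fastq.gz",        # the other pattern that was tried
    r"{}_L\d+_R1_\d+.fastq.gz",   # hypothetical: allows for the lane token
]

# Expand each pattern with the sample name and test it against the file name.
results = {}
for pattern in candidates:
    expanded = pattern.format(sample)
    results[pattern] = re.match(expanded, filename) is not None
    print("{} -> {} -> {}".format(pattern, expanded, results[pattern]))
```

Under these assumptions, the first two patterns do not match because neither accounts for the `_L003` between the sample name and `R1`, while the hypothetical third one does.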
This has been discussed previously and I've read those posts, but am still having the following error:
My files are named as such: HS001_R1_001.fastq.gz HS001_R2_001.fastq.gz etc.
The .conf file:
I've tried modifying the "{}(?:.*)(R1|READ1|Read1|read1)_\d+.fastq(?:.gz)*" in #90 to the following based on the other posts (#90, #96, and #74).
Probably a simple fix, but I can't get it to work.
Thanks in advance, Max