faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

IOError: There is a problem with the read names #74

Closed JennaMcCullough closed 7 years ago

JennaMcCullough commented 7 years ago

Hello,

I'm having a similar problem to a closed post from April (https://github.com/faircloth-lab/phyluce/issues/65), except that this is a problem running my own data rather than the tutorial. The data have already been de-multiplexed.

My simplified illumiprocessor.conf file looks like this:

[adapters]
i7:AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i5_15_B:AACCGTGT
i7_105_12:ATGGTCCA

[tag map]
Actenoides_bougainvilleiAMNHSKIN640159:i5_15_B,i7_105_12

[names]
Actenoides_bougainvilleiAMNHSKIN640159:actenoides_bougainvilleiamnhskin640159

I used this command:

illumiprocessor --input raw_data --output cleaned_reads --config illumiprocessor.conf --trimmomatic /home/vosea/anaconda2/jar/trimmomatic.jar --log-path logs --cores 12

I got this error message:

IOError: There is a problem with the read names for Actenoides_bougainvilleiAMNHSKIN640159. Ensure you do not have spelling/capitalization errors in your conf file.

I have remade my conf file by copying and pasting the file names (to make sure I don't have spelling/capitalization errors), and I get the same error message. The names match the files, so I'm lost as to why the error message points to a spelling or capitalization error.

Any help is appreciated, thank you

mateusf commented 7 years ago

Hi Jenna,

How are your reads named? For example, my raw reads are named:

S1027_R1.fastq.gz
S1027_R2.fastq.gz
S1028_R1.fastq.gz
S1028_R2.fastq.gz

so my conf file looks like this:

[adapters]
i7:GATCGGAAGAGCACACGTCTGAACTCCAGTCACATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTGTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i7_N1:AACGTGAT
i7_N2:AAACATCG
...
i5_513:CTCCTTAC
i5_514:TATGCAGT

[tag map]
S1027:i7_N53,i5_503
S1028:i7_N46,i5_503

[names]
S1027:ANSP_19305_Tro_mel
S1028:LSU_B66106_Tro_cal

and then I'd use the same command line as you, but with these options added: --r1-pattern _R1 --r2-pattern _R2

This way illumiprocessor knows the pattern that identifies each raw read file. Usually this does the trick, but let us know if you still get the same error. Also, your i5 adapter sequence is missing the *; I'm not sure whether that can cause problems too.
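To illustrate why the patterns matter, here is a minimal Python sketch of pairing read files with a sample name from the conf file. It is not illumiprocessor's actual code: the function name `split_reads` and the simple "prefix plus substring" matching rule are assumptions made for illustration only.

```python
def split_reads(filenames, sample, r1_pattern="_R1", r2_pattern="_R2"):
    """Return (r1_files, r2_files) for the files whose names start with `sample`.

    Hypothetical helper: matching is case-sensitive, so a capitalization
    mismatch between the conf file and the file names finds nothing.
    """
    # Keep only the files that begin with the sample name from the conf file.
    candidates = sorted(f for f in filenames if f.startswith(sample))
    # Look for the read patterns in the part of the name after the sample name.
    r1 = [f for f in candidates if r1_pattern in f[len(sample):]]
    r2 = [f for f in candidates if r2_pattern in f[len(sample):]]
    return r1, r2


files = [
    "S1027_R1.fastq.gz",
    "S1027_R2.fastq.gz",
    "S1028_R1.fastq.gz",
    "S1028_R2.fastq.gz",
]

r1, r2 = split_reads(files, "S1027")
print(r1, r2)  # ['S1027_R1.fastq.gz'] ['S1027_R2.fastq.gz']
```

If both lists come back empty for a sample, either the sample name in the conf file does not match the start of the file names or the patterns do not occur in them.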

Best,

JennaMcCullough commented 7 years ago

Oh, that worked. Thank you very much!

JennaMcCullough commented 7 years ago

Hi,

I have the same issue with a slightly different dataset. For my outgroup, I was given data from another working group, who sequenced through Rapid Genomics. These data are single-end rather than the paired-end data that I've processed before. I'm getting the same error message as above, but I'm unsure how to specify the read pattern (which solved the original problem).

C5C6YACXX_s2_1_RapidGenomics_15_SL70678.fastq.bz2

[adapters]
i7:AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
M89:AACGCTTA

[tag map]
C5C6YACXX_s2_1_RapidGenomics_15_SL70678:M89

[names]
C5C6YACXX_s2_1_RapidGenomics_15_SL70678:picoides_pubescens_ku7425

Thank you so much!

JennaMcCullough commented 7 years ago

I figured this out on my own. I didn't realize I needed to use the "--se" flag for single-end reads. Since I had one sample, I just used the whole name to specify R1.
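For anyone following along, the fix described above can be sketched as a small Python helper that assembles the command. This is only an illustration under stated assumptions: `build_single_end_command` is a hypothetical name, the `raw_data` and `cleaned_reads` directories are taken from the first post, and options such as --trimmomatic, --log-path and --cores are left out and would need to be added as in the original command.

```python
import shlex


def build_single_end_command(read_prefix, conf="illumiprocessor.conf"):
    """Assemble an illumiprocessor call for one single-end sample (hypothetical helper)."""
    return [
        "illumiprocessor",
        "--input", "raw_data",          # directory names as in the first post
        "--output", "cleaned_reads",
        "--config", conf,
        "--se",                         # single-end reads
        "--r1-pattern", read_prefix,    # the whole file-name stem, as described above
    ]


cmd = build_single_end_command("C5C6YACXX_s2_1_RapidGenomics_15_SL70678")
# Print a shell-safe version of the command for copying into a terminal.
print(" ".join(shlex.quote(arg) for arg in cmd))
```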

luishdez35 commented 3 years ago

Hello, I have the same problem. I am new to bioinformatic analysis. The error message is:

2021-11-26 10:15:14,380 - illumiprocessor - INFO - ==================== Starting illumiprocessor ===================
2021-11-26 10:15:14,381 - illumiprocessor - INFO - Version: 2.0.9
2021-11-26 10:15:14,381 - illumiprocessor - INFO - Argument --config: illumiprocessor.conf
2021-11-26 10:15:14,381 - illumiprocessor - INFO - Argument --cores: 2
2021-11-26 10:15:14,381 - illumiprocessor - INFO - Argument --input: /media/luishdez/UCEs/PruebasNov/uce-tutorial/raw-fastq
2021-11-26 10:15:14,381 - illumiprocessor - INFO - Argument --log_path: None
2021-11-26 10:15:14,382 - illumiprocessor - INFO - Argument --min_len: 40
2021-11-26 10:15:14,382 - illumiprocessor - INFO - Argument --no_merge: False
2021-11-26 10:15:14,382 - illumiprocessor - INFO - Argument --output: /media/luishdez/UCEs/PruebasNov/uce-tutorial/clean-fastq
2021-11-26 10:15:14,382 - illumiprocessor - INFO - Argument --phred: phred33
2021-11-26 10:15:14,382 - illumiprocessor - INFO - Argument --r1_pattern: None
2021-11-26 10:15:14,382 - illumiprocessor - INFO - Argument --r2_pattern: None
2021-11-26 10:15:14,382 - illumiprocessor - INFO - Argument --se: False
2021-11-26 10:15:14,383 - illumiprocessor - INFO - Argument --trimmomatic: /home/luishdez/miniconda3/envs/py2/bin/trimmomatic
2021-11-26 10:15:14,383 - illumiprocessor - INFO - Argument --verbosity: INFO
Traceback (most recent call last):
  File "/home/luishdez/miniconda3/envs/py2/bin/illumiprocessor", line 17, in <module>
    sys.exit(main())
  File "/home/luishdez/miniconda3/envs/py2/lib/python2.7/site-packages/illumiprocessor/cli/main.py", line 121, in main
    main(args)
  File "/home/luishdez/miniconda3/envs/py2/lib/python2.7/site-packages/illumiprocessor/main.py", line 34, in main
    reads.append(core.SequenceData(args, conf, start_name, end_name))
  File "/home/luishdez/miniconda3/envs/py2/lib/python2.7/site-packages/illumiprocessor/core.py", line 86, in __init__
    self._get_read_data()
  File "/home/luishdez/miniconda3/envs/py2/lib/python2.7/site-packages/illumiprocessor/core.py", line 104, in _get_read_data
    "errors in your conf file.".format(self.start_name))
IOError: There is a problem with the read names for DGF0448. Ensure you do not have spelling/capitalization errors in your conf file.

And my conf file looks like this:

# this is the section where you list the adapters you used. the asterisk
# will be replaced with the appropriate index for the sample.
[adapters]
i7:AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

# this is the list of indexes we used
[tag sequences]
DGF0448:CGAAGAACGGTTGTCA
DGF0499:CGAAGAACACGGAACA
DGF0555:TGTGACTGACAGCTCA
DGF0812:AGGTTCGAGTGGTGTT
DGF0944:AAGAAGGC*ACGGAACA

# this is how each index maps to each set of reads
[tag map]
DGF0448:DGF0448
DGF0499:DGF0499
DGF0555:DGF0555
DGF0812:DGF0812
DGF0944:DGF0944

# we want to rename our read files something a bit more nice - so we will
# rename Alligator_mississippiensis_GGAGCTATGG to alligator_mississippiensis
[names]
DGF0448:Paraphidippus aurantius
DGF0499:Phidippus cruentus
DGF0555:Phidippus cuentus sp2
DGF0812:Phidippus bidentatus
DGF0944:Phidippus zethus

Thanks a lot for your help

Best
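Since the conf files in this thread all share the same four sections, some cross-references between them can be checked mechanically before running illumiprocessor. The following is a rough Python sketch using the standard-library `configparser`. It does not reproduce illumiprocessor's own validation: `check_conf` is a hypothetical helper, and treating any character other than A, C, G or T in a tag sequence as suspicious is an assumption. The example conf is the one from the first post.

```python
import configparser

EXAMPLE_CONF = """\
[adapters]
i7:AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
i5_15_B:AACCGTGT
i7_105_12:ATGGTCCA

[tag map]
Actenoides_bougainvilleiAMNHSKIN640159:i5_15_B,i7_105_12

[names]
Actenoides_bougainvilleiAMNHSKIN640159:actenoides_bougainvilleiamnhskin640159
"""


def check_conf(text):
    """Return a list of human-readable problems found in a conf file's text."""
    parser = configparser.ConfigParser(delimiters=(":",), interpolation=None)
    parser.optionxform = str  # keep keys case-sensitive (default lowercases them)
    parser.read_string(text)

    problems = []
    tags = parser["tag sequences"]

    # Assumption: index sequences should contain only A, C, G and T.
    for tag, seq in tags.items():
        if set(seq.upper()) - set("ACGT"):
            problems.append("tag %s has non-ACGT characters: %s" % (tag, seq))

    for sample, tag_list in parser["tag map"].items():
        # Every tag named in [tag map] should be defined in [tag sequences].
        for tag in tag_list.split(","):
            if tag.strip() not in tags:
                problems.append("sample %s uses undefined tag %s" % (sample, tag.strip()))
        # Every sample in [tag map] should also be listed in [names].
        if sample not in parser["names"]:
            problems.append("sample %s is missing from [names]" % sample)

    return problems


print(check_conf(EXAMPLE_CONF))  # [] (no cross-reference problems found)
```

A clean result here only means the sections agree with each other. It says nothing about whether the sample names match the read files on disk, which is what the IOError in this thread is about.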