dereneaton / ipyrad

Interactive assembly and analysis of RAD-seq data sets
http://ipyrad.readthedocs.io
GNU General Public License v3.0

Error in FASTQ file at line 1: Line expected to start with '@', but found 'C' #470

Closed amesclir closed 2 years ago

amesclir commented 2 years ago

I have the same error for all the samples:

IPyradError( error in cutadapt -a AGATCGGAAGAGC --quality-base 33 -q 20 -u 5 --minimum-length 35 --max-n 4 --trim-n --output /home/avaldes/floragenex2020/plateA_edits/PlateA_A11.trimmed_R1_.fastq.gz /home/avaldes/floragenex2020/plateA_fastqs/PlateA_A11_R1_.fastq.gz b'This is cutadapt 3.2 with Python 3.8.6\nCommand line parameters: -a AGATCGGAAGAGC --quality-base 33 -q 20 -u 5 --minimum-length 35 --max-n 4 --trim-n --output /home/avaldes/floragenex2020/plateA_edits/PlateA_A11.trimmed_R1_.fastq.gz /home/avaldes/floragenex2020/plateA_fastqs/PlateA_A11_R1_.fastq.gz\nRun "cutadapt --help" to see command-line options.\nSee https://cutadapt.readthedocs.io/ for full documentation.\n\ncutadapt: error: Error in FASTQ file at line 1: Line expected to start with \'@\', but found \'C\'\n')

It looks like something has changed in cutadapt but not in ipyrad? Thanks!!!
isaacovercast commented 2 years ago

Can you show me the first handful of lines from one of your fastq files? e.g.

zcat /home/avaldes/floragenex2020/plateA_edits/PlateA_A11.trimmed_R1_.fastq.gz | head -n 8

amesclir commented 2 years ago

First of all, thank you very much, Isaac Overcast! There is nothing when I do this:

[avaldes@pool plateA_edits]$ zcat /home/avaldes/floragenex2020/plateA_edits/PlateA_A11.trimmed_R1_.fastq.gz | head -n 8
[avaldes@pool plateA_edits]$

Those are empty. But this is from a fastq file from step 1:

[avaldes@pool plateA_edits]$ zcat /home/avaldes/floragenex2020/plateA_fastqs/PlateA_A01_R1_.fastq.gz | head -n 8
TTCTTCAGACTGCAGTTTCCACCCCTGCTAATAGTTTAGCACATTCGTTGTTGTTACTGTGGGGTCCTGAAGCACAAGGGGATTTTACTCGTTGGTGTCAATTAGGCGGTCTGTGGACTTTTGTTGCTCTCCATGGCGA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:24424:30702 1:N:0:1
AAGGGGGAAGGGATCGCGCCTTCCCTATTCCGGGCCATCCCCGAATCCAAGATCTACATCCCTATTTTAACTAACCAGTATGCTTCCAGCAAATGGTGCCTTCAAGAATTGGCTCAAATGGTGA:2FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:24442:30702 1:N:0:1
CTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACTAAAGCCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTTA:2FFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F,FFFF:F:F1F:FCFFFFFF:F105F FFFFF241:24460:30702 1:N:0:1
AGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGAA:2FFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:F,:F,FFF,F::FFFFF:FFFFFFFFFFFFF:,FFFFFF:::,FFFF:FFF,FFFFFF,,FFFF,,F:FFFF,FFF:FF::,FFF::FFF,,FFFF:F:FF,FF:F1F:FCFFFFFF:F105F FFFFF241:24514:30702 1:N:0:1
GCACGTCTCCTGCAGTGCATATTGTCCTTGGAGCGGTAAAATTGCAGAATTAACCAGGATAATGTCACCTGTCATTAGCATCTTGAGCTTGCCTATTGATAATTAAAAGTTTCTGAAATTGATTTCCCTTCTTTTCTTA:2F::FFFFFFFF:,FFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:,F,FFFFFFFFFFFFFFF,FFFF:F1F:FCFFFFFF:F105F FFFFF241:26250:30702 1:N:0:1
CAGAAAGCTAACCTGCTGCATGTGTCAAAGGCTAGCTAAGGAACTAGGAAGGAGACCTAGCTAGCTTGCCCAACACTCCTCACAGAATTTAAACAAGAAACATATACATATCAGTACAAAAAAA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:26268:30702 1:N:0:1
TTCTGGAATATGCAGCACCTCCAGAGCCGTTCCATTTACCGGAACCCTAACTTGATTGTTGCATCCTCCCAACCAGTTCCTAACTGCGGGTTCATATTGGCCGAGTATGCTCTCCGTCATCGGCAGAAAGAATCCCCTA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFF,FFFFF:FFFFFFFFFFFFFF:F,FFF:FFFFFFFFFFF:F,F::FFFFFF::F::FFFFFFFFFFF::FFFF:FFFFFFFFFFFF,F:F1F:FCFFFFFF:F105F FFFFF241:26286:30702 1:N:0:1
AAATTGAAAACAGTGGAGCAAATGGATTTTTTGATCTGCCTGAAGATGATCCGCTTGCATTCCTTACAAGTTTTGGAGCATCCGAGTTCATTAGTCTGCCAATGTCCGAGTGTAATGAAGACTA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:26304:30702 1:N:0:1
[avaldes@pool plateA_edits]$

Thanks!

Marcial

isaacovercast commented 2 years ago

Yes, well it seems the fastq file format has been messed up. It's supposed to look like this:

@lane1_locus0_2G_0_R1_0 1:N:0:
GAGGAGTGCAGCCCCTATGTGTCCGGCACCCCAACGCCTTGGAACTCAGTTAACTGTTCAAGTTGGGCAAGATCAAGTCGTCCCCTTAGCCCCCGCTCCG
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@lane1_locus0_2G_0_R1_1 1:N:0:
GAGGAGTGCAGCCCCTATGTGTCCGGCACCCCAACGCCTTGGAACTCAGTTAACTGTTCAAGTTGGGCAAGATCAAGTCGTCCCCTTAGCCCCCGCTCCG
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@lane1_locus0_2G_0_R1_2 1:N:0:
GAGGAGTGCAGCCCCTATGTGTCCGGCACCCCAACGCCTTGGAACTCAGTTAACTGTTCAAGTTGGGCAAGATCAAGTCGTCCCCTTAGCCCCCGCTCCG
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Can you show me the first bit of your raw data? Did you demultiplex with ipyrad? I've never seen this before. What version of ipyrad are you using? You can check with: ipyrad -v
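For anyone who wants to check a file against the four-line layout shown above, here is a rough sketch. It is not part of ipyrad or cutadapt: check_fastq_head is a made-up helper name, and it assumes a gzipped FASTQ with unwrapped (single-line) sequences. It only inspects the first lines of the file, so it cannot rule out damage further in.

```shell
# Hypothetical helper (not an ipyrad or cutadapt command).
# Usage: check_fastq_head FILE.fastq.gz [NLINES]
# Checks that the first NLINES lines follow the 4-line FASTQ layout:
#   line 1: starts with @, line 2: sequence, line 3: starts with +,
#   line 4: quality string with the same length as the sequence.
check_fastq_head() {
    file="$1"
    nlines="${2:-4000}"   # how many lines to inspect; use a multiple of 4
    gzip -dc "$file" 2>/dev/null | head -n "$nlines" | awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: expected @, found %s\n", NR, substr($0, 1, 1)
            bad = 1; exit
        }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: expected +, found %s\n", NR, substr($0, 1, 1)
            bad = 1; exit
        }
        NR % 4 == 0 && length($0) != seqlen {
            printf "line %d: quality and sequence lengths differ\n", NR
            bad = 1; exit
        }
        END {
            if (bad) exit 1
            if (NR == 0) { print "no records found"; exit 1 }
            print "ok"
        }'
}

# Example call (path taken from this thread):
#   check_fastq_head /home/avaldes/floragenex2020/plateA_fastqs/PlateA_A11_R1_.fastq.gz
```

The check goes by line position rather than searching for '@', because a quality string is allowed to start with '@' too.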

amesclir commented 2 years ago

This is the raw data

[avaldes@pool plateA_edits]$ zcat /home/avaldes/floragenex2020/raw/plateA.fastq.gz | head -n 8

@A01335:100:HVMCHDRXY:1:2101:1090:1000 1:N:0:1
NCGCCGCAATTGCAGAAGAATACAGATCCTTGATGGATACTCTGGCAACTGGTAGGCTTTCACTGGTCATTGATGTATTGAGGCGATTGAAGGTATGGCTTCCTTTGAGTTCTTAATGCAGATATGTTTGGTGCACGC
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFFF,FFF,FFF:FFFFFFFFFFFFFFFF::FFFFFFF:FFFFFF:FFFFFFFFFFF,:,F,F:FFFFFFFFF
@A01335:100:HVMCHDRXY:1:2101:1127:1000 1:N:0:1
NGAGCCAGCTTGCAGCATCTCCAGCCAAGCTTCTCGAGCGACTCATCAAGGAGAGCCGTGCTGAGGCTCGCTACAACATTCTCTCGGCGATGGCCAACCAAGTCCGCTTCGTCAACTCGCATACGCACTACTTCTCCA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF
[avaldes@pool plateA_edits]$ 

Which looks good, right?

I demultiplexed with ipyrad, version 0.9.65.

Thanks!

isaacovercast commented 2 years ago

yes that looks fine. something is definitely messed up, maybe an old cutadapt version? Can you try installing ipyrad in a new clean conda env and trying it again?

conda create -n clean_ipyrad
conda activate clean_ipyrad
conda install -c conda-forge -c bioconda ipyrad -y
amesclir commented 2 years ago

Yes! Of course!

amesclir commented 2 years ago

Dear Isaac Overcast,

Still the same problem. The people in charge of the cluster we are using have reinstalled ipyrad (now ver. 0.9.81), but the problem persists.

[avaldes@pool plateA_edits]$ zcat /home/avaldes/floragenex2020/plateA_fastqs/PlateA_A01_R1_.fastq.gz | head -n 8
TTCTTCAGACTGCAGTTTCCACCCCTGCTAATAGTTTAGCACATTCGTTGTTGTTACTGTGGGGTCCTGAAGCACAAGGGGATTTTACTCGTTGGTGTCAATTAGGCGGTCTGTGGACTTTTGTTGCTCTCCATGGCGA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:24424:30702 1:N:0:1
AAGGGGGAAGGGATCGCGCCTTCCCTATTCCGGGCCATCCCCGAATCCAAGATCTACATCCCTATTTTAACTAACCAGTATGCTTCCAGCAAATGGTGCCTTCAAGAATTGGCTCAAATGGTGA:2FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:24442:30702 1:N:0:1
CTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACTAAAGCCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTTA:2FFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F,FFFF:F:F1F:FCFFFFFF:F105F FFFFF241:24460:30702 1:N:0:1
AGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGAA:2FFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:F,:F,FFF,F::FFFFF:FFFFFFFFFFFFF:,FFFFFF:::,FFFF:FFF,FFFFFF,,FFFF,,F:FFFF,FFF:FF::,FFF::FFF,,FFFF:F:FF,FF:F1F:FCFFFFFF:F105F FFFFF241:24514:30702 1:N:0:1
GCACGTCTCCTGCAGTGCATATTGTCCTTGGAGCGGTAAAATTGCAGAATTAACCAGGATAATGTCACCTGTCATTAGCATCTTGAGCTTGCCTATTGATAATTAAAAGTTTCTGAAATTGATTTCCCTTCTTTTCTTA:2F::FFFFFFFF:,FFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:,F,FFFFFFFFFFFFFFF,FFFF:F1F:FCFFFFFF:F105F FFFFF241:26250:30702 1:N:0:1
CAGAAAGCTAACCTGCTGCATGTGTCAAAGGCTAGCTAAGGAACTAGGAAGGAGACCTAGCTAGCTTGCCCAACACTCCTCACAGAATTTAAACAAGAAACATATACATATCAGTACAAAAAAA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:26268:30702 1:N:0:1
TTCTGGAATATGCAGCACCTCCAGAGCCGTTCCATTTACCGGAACCCTAACTTGATTGTTGCATCCTCCCAACCAGTTCCTAACTGCGGGTTCATATTGGCCGAGTATGCTCTCCGTCATCGGCAGAAAGAATCCCCTA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFF,FFFFF:FFFFFFFFFFFFFF:F,FFF:FFFFFFFFFFF:F,F::FFFFFF::F::FFFFFFFFFFF::FFFF:FFFFFFFFFFFF,F:F1F:FCFFFFFF:F105F FFFFF241:26286:30702 1:N:0:1
AAATTGAAAACAGTGGAGCAAATGGATTTTTTGATCTGCCTGAAGATGATCCGCTTGCATTCCTTACAAGTTTTGGAGCATCCGAGTTCATTAGTCTGCCAATGTCCGAGTGTAATGAAGACTA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:26304:30702 1:N:0:1
[avaldes@pool plateA_edits]$

Any idea of the problem?

Thanks!!!!

isaacovercast commented 2 years ago

That is bizarre. Can you wetransfer me a big chunk of the raw data? Here's how to get the first 200000 lines:

zcat raw_fastq.gz | head -n 200000 > subset_raw.fastq
gzip subset_raw.fastq

I really can't imagine what's happening here.

amesclir commented 2 years ago

Sent by email! Did you get it? Thanks!

isaacovercast commented 2 years ago

Yes, I got it. I am a little busy today so I might not get to look at it until later. I will let you know. -isaac


amesclir commented 2 years ago

Thanks!

isaacovercast commented 2 years ago

Can you send your barcodes file and also your params file? thx.

isaacovercast commented 2 years ago

This is very weird. It works fine for me. Your data looks good after step 1 demultiplexing, and it still looks good after step 2 trimming. It also completes step 3 without crashing. It doesn't make any sense. The way the formatting is getting messed up is bizarre: it's retaining some of the data but mangling it. It almost feels like a character encoding thing, like a unicode terminal interpreting characters all funny. What can you tell me about the cluster you're on? Can you show me the output of these two commands: mount and env? Both will dump a lot of wild information, but I'd like to see it all. Maybe email it to me instead of posting it here.

Alternatively, have you tried just running the ipyrad assembly on another computer? It's not a super heavy operation, and I've seen people run assemblies on laptops with not too much trouble. It might be easier just to avoid this problem (which I have never seen before and is almost certainly a local issue with the cluster) than to try to figure it out and fix it. I'm happy to help either way.

amesclir commented 2 years ago

Thanks! I will send this information to the guys that handle the cluster.

philipphuehn commented 2 years ago

Always happy to help! ☺


amesclir commented 2 years ago

OK! fixed!!! The raw data was corrupted when downloaded! Now it is fixed!
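In case it helps someone else: the thread does not record how the corruption was found, but a fresh download can be checked before rerunning step 1. Below is a minimal sketch, assuming md5sum is available (as on most Linux clusters) and that the sequencing provider published an MD5 checksum for the file. verify_download is a hypothetical helper name, not an ipyrad command.

```shell
# Hypothetical helper (not an ipyrad command).
# Usage: verify_download FILE.fastq.gz [EXPECTED_MD5]
verify_download() {
    file="$1"
    expected="${2:-}"
    # gzip -t decompresses the whole archive in memory and fails on
    # truncation or a CRC mismatch.
    if ! gzip -t "$file" 2>/dev/null; then
        echo "gzip integrity test failed: $file"
        return 1
    fi
    # If a checksum was supplied (assumed to come from the provider),
    # compare it with the checksum of the local copy.
    if [ -n "$expected" ]; then
        actual=$(md5sum "$file" | awk '{ print $1 }')
        if [ "$actual" != "$expected" ]; then
            echo "md5 mismatch: $file"
            return 1
        fi
    fi
    echo "ok: $file"
}

# Example call (raw file path from this thread; the checksum is a placeholder):
#   verify_download /home/avaldes/floragenex2020/raw/plateA.fastq.gz <md5-from-provider>
```

Note that gzip -t only catches damage to the compressed file itself. If the data were already mangled before compression, only a checksum from the provider would reveal it.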

isaacovercast commented 2 years ago

Glad you got it figured out! Good luck with the rest of the analysis!