Closed amesclir closed 2 years ago
Can you show me the first handful of lines from one of your fastq files? e.g. zcat /home/avaldes/floragenex2020/plateA_edits/PlateA_A11.trimmed_R1_.fastq.gz | head -n 8
First of all, thank you very much, Isaac Overcast! There is nothing when I do this:
[avaldes@pool plateA_edits]$ zcat /home/avaldes/floragenex2020/plateA_edits/PlateA_A11.trimmedR1.fastq.gz | head -n 8
[avaldes@pool plateA_edits]$
Those are empty. But this is from a fastq file from step 1:
[avaldes@pool plateA_edits]$ zcat /home/avaldes/floragenex2020/plateA_fastqs/PlateA_A01R1.fastq.gz | head -n 8
TTCTTCAGACTGCAGTTTCCACCCCTGCTAATAGTTTAGCACATTCGTTGTTGTTACTGTGGGGTCCTGAAGCACAAGGGGATTTTACTCGTTGGTGTCAATTAGGCGGTCTGTGGACTTTTGTTGCTCTCCATGGCGA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:24424:30702 1:N:0:1
AAGGGGGAAGGGATCGCGCCTTCCCTATTCCGGGCCATCCCCGAATCCAAGATCTACATCCCTATTTTAACTAACCAGTATGCTTCCAGCAAATGGTGCCTTCAAGAATTGGCTCAAATGGTGA:2FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:24442:30702 1:N:0:1
CTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACTAAAGCCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTTA:2FFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F,FFFF:F:F1F:FCFFFFFF:F105F FFFFF241:24460:30702 1:N:0:1
AGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGAA:2FFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:F,:F,FFF,F::FFFFF:FFFFFFFFFFFFF:,FFFFFF:::,FFFF:FFF,FFFFFF,,FFFF,,F:FFFF,FFF:FF::,FFF::FFF,,FFFF:F:FF,FF:F1F:FCFFFFFF:F105F FFFFF241:24514:30702 1:N:0:1
GCACGTCTCCTGCAGTGCATATTGTCCTTGGAGCGGTAAAATTGCAGAATTAACCAGGATAATGTCACCTGTCATTAGCATCTTGAGCTTGCCTATTGATAATTAAAAGTTTCTGAAATTGATTTCCCTTCTTTTCTTA:2F::FFFFFFFF:,FFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:,F,FFFFFFFFFFFFFFF,FFFF:F1F:FCFFFFFF:F105F FFFFF241:26250:30702 1:N:0:1
CAGAAAGCTAACCTGCTGCATGTGTCAAAGGCTAGCTAAGGAACTAGGAAGGAGACCTAGCTAGCTTGCCCAACACTCCTCACAGAATTTAAACAAGAAACATATACATATCAGTACAAAAAAA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:26268:30702 1:N:0:1
TTCTGGAATATGCAGCACCTCCAGAGCCGTTCCATTTACCGGAACCCTAACTTGATTGTTGCATCCTCCCAACCAGTTCCTAACTGCGGGTTCATATTGGCCGAGTATGCTCTCCGTCATCGGCAGAAAGAATCCCCTA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFF,FFFFF:FFFFFFFFFFFFFF:F,FFF:FFFFFFFFFFF:F,F::FFFFFF::F::FFFFFFFFFFF::FFFF:FFFFFFFFFFFF,F:F1F:FCFFFFFF:F105F FFFFF241:26286:30702 1:N:0:1
AAATTGAAAACAGTGGAGCAAATGGATTTTTTGATCTGCCTGAAGATGATCCGCTTGCATTCCTTACAAGTTTTGGAGCATCCGAGTTCATTAGTCTGCCAATGTCCGAGTGTAATGAAGACTA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:26304:30702 1:N:0:1
[avaldes@pool plateA_edits]$
Thanks!
Marcial
Yes, well it seems the fastq file format has been messed up. It's supposed to look like this:
@lane1_locus0_2G_0_R1_0 1:N:0:
GAGGAGTGCAGCCCCTATGTGTCCGGCACCCCAACGCCTTGGAACTCAGTTAACTGTTCAAGTTGGGCAAGATCAAGTCGTCCCCTTAGCCCCCGCTCCG
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@lane1_locus0_2G_0_R1_1 1:N:0:
GAGGAGTGCAGCCCCTATGTGTCCGGCACCCCAACGCCTTGGAACTCAGTTAACTGTTCAAGTTGGGCAAGATCAAGTCGTCCCCTTAGCCCCCGCTCCG
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@lane1_locus0_2G_0_R1_2 1:N:0:
GAGGAGTGCAGCCCCTATGTGTCCGGCACCCCAACGCCTTGGAACTCAGTTAACTGTTCAAGTTGGGCAAGATCAAGTCGTCCCCTTAGCCCCCGCTCCG
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Can you show me the first bit of your raw data? Did you demultiplex with ipyrad? I've never seen this before. What version of ipyrad are you using? ipyrad -v
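For anyone landing here with the same error, a quick way to check whether a gzipped FASTQ file follows the four-line layout shown above is a small script like the one below. This is only a minimal sketch, not part of ipyrad: the function name `check_fastq` and the `max_records` parameter are made up for illustration, and it only checks the basic layout (header starts with `@`, separator starts with `+`, sequence and quality have equal length).

```python
import gzip
from itertools import islice


def check_fastq(path, max_records=1000):
    """Return a list of (line_number, problem) for the first max_records records.

    Hypothetical helper for illustration; it is not an ipyrad function.
    """
    # Decode leniently so that mangled bytes do not abort the check.
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        lines = [line.rstrip("\n") for line in islice(handle, 4 * max_records)]

    if not lines:
        return [(0, "file is empty")]

    problems = []
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        if len(record) < 4:
            problems.append((start + 1, "incomplete record (fewer than 4 lines)"))
            break
        header, seq, plus, qual = record
        if not header.startswith("@"):
            problems.append((start + 1, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((start + 3, "separator line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((start + 4, "sequence and quality lengths differ"))
    return problems
```

Calling `check_fastq("some_sample_R1.fastq.gz")` (a placeholder path) returns an empty list for a well-formed file, and a list of line numbers with a short description otherwise. Running it on the raw file, the step 1 output, and the step 2 output should show at which stage the format first breaks.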
This is the raw data
[avaldes@pool plateA_edits]$ zcat /home/avaldes/floragenex2020/raw/plateA.fastq.gz | head -n 8
@A01335:100:HVMCHDRXY:1:2101:1090:1000 1:N:0:1
NCGCCGCAATTGCAGAAGAATACAGATCCTTGATGGATACTCTGGCAACTGGTAGGCTTTCACTGGTCATTGATGTATTGAGGCGATTGAAGGTATGGCTTCCTTTGAGTTCTTAATGCAGATATGTTTGGTGCACGC
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFFFFF,FFF,FFF:FFFFFFFFFFFFFFFF::FFFFFFF:FFFFFF:FFFFFFFFFFF,:,F,F:FFFFFFFFF
@A01335:100:HVMCHDRXY:1:2101:1127:1000 1:N:0:1
NGAGCCAGCTTGCAGCATCTCCAGCCAAGCTTCTCGAGCGACTCATCAAGGAGAGCCGTGCTGAGGCTCGCTACAACATTCTCTCGGCGATGGCCAACCAAGTCCGCTTCGTCAACTCGCATACGCACTACTTCTCCA
+
#FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF
[avaldes@pool plateA_edits]$
Which looks good, right?
I demultiplexed with ipyrad, using version 0.9.65.
Thanks!
yes that looks fine. something is definitely messed up, maybe an old cutadapt version? Can you try installing ipyrad in a new clean conda env and trying it again?
conda create -n clean_ipyrad
conda activate clean_ipyrad
conda install -c conda-forge -c bioconda ipyrad -y
Yes! Of course!
Dear Isaac Overcast,
Still the same problem. The people in charge of the cluster we are using have reinstalled ipyrad (now ver. 0.981), but the problem persists.
[avaldes@pool plateA_edits]$ zcat /home/avaldes/floragenex2020/plateA_fastqs/PlateA_A01R1.fastq.gz | head -n 8
TTCTTCAGACTGCAGTTTCCACCCCTGCTAATAGTTTAGCACATTCGTTGTTGTTACTGTGGGGTCCTGAAGCACAAGGGGATTTTACTCGTTGGTGTCAATTAGGCGGTCTGTGGACTTTTGTTGCTCTCCATGGCGA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:24424:30702 1:N:0:1
AAGGGGGAAGGGATCGCGCCTTCCCTATTCCGGGCCATCCCCGAATCCAAGATCTACATCCCTATTTTAACTAACCAGTATGCTTCCAGCAAATGGTGCCTTCAAGAATTGGCTCAAATGGTGA:2FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:24442:30702 1:N:0:1
CTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACTAAAGCCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTTA:2FFFFFFFFFF:FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:FFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F,FFFF:F:F1F:FCFFFFFF:F105F FFFFF241:24460:30702 1:N:0:1
AGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGAA:2FFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:F,:F,FFF,F::FFFFF:FFFFFFFFFFFFF:,FFFFFF:::,FFFF:FFF,FFFFFF,,FFFF,,F:FFFF,FFF:FF::,FFF::FFF,,FFFF:F:FF,FF:F1F:FCFFFFFF:F105F FFFFF241:24514:30702 1:N:0:1
GCACGTCTCCTGCAGTGCATATTGTCCTTGGAGCGGTAAAATTGCAGAATTAACCAGGATAATGTCACCTGTCATTAGCATCTTGAGCTTGCCTATTGATAATTAAAAGTTTCTGAAATTGATTTCCCTTCTTTTCTTA:2F::FFFFFFFF:,FFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:,F,FFFFFFFFFFFFFFF,FFFF:F1F:FCFFFFFF:F105F FFFFF241:26250:30702 1:N:0:1
CAGAAAGCTAACCTGCTGCATGTGTCAAAGGCTAGCTAAGGAACTAGGAAGGAGACCTAGCTAGCTTGCCCAACACTCCTCACAGAATTTAAACAAGAAACATATACATATCAGTACAAAAAAA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:26268:30702 1:N:0:1
TTCTGGAATATGCAGCACCTCCAGAGCCGTTCCATTTACCGGAACCCTAACTTGATTGTTGCATCCTCCCAACCAGTTCCTAACTGCGGGTTCATATTGGCCGAGTATGCTCTCCGTCATCGGCAGAAAGAATCCCCTA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFF,FFFFF:FFFFFFFFFFFFFF:F,FFF:FFFFFFFFFFF:F,F::FFFFFF::F::FFFFFFFFFFF::FFFF:FFFFFFFFFFFF,F:F1F:FCFFFFFF:F105F FFFFF241:26286:30702 1:N:0:1
AAATTGAAAACAGTGGAGCAAATGGATTTTTTGATCTGCCTGAAGATGATCCGCTTGCATTCCTTACAAGTTTTGGAGCATCCGAGTTCATTAGTCTGCCAATGTCCGAGTGTAATGAAGACTA:2FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F1F:FCFFFFFF:F105F FFFFF241:26304:30702 1:N:0:1
[avaldes@pool plateA_edits]$
Any idea what the problem is?
Thanks!!!!
That is bizarre. Can you wetransfer me a big chunk of the raw data? Here's how to get the first 200000 lines:
zcat raw_fastq.gz | head -n 200000 > subset_raw.fastq
gzip subset_raw.fastq
I really can't imagine what's happening here.
Sent by email! Did you get it? Thanks!
Yes, I got it. I am a little busy today, so I might not get to look at it until later. I will let you know. -isaac
Thanks!
Can you send your barcodes file and also your params file? thx.
This is very weird. It works fine for me. Your data looks good after step 1 demultiplexing, and it still looks good after step 2 trimming. It also completes step 3 without crashing. It doesn't make any sense. The way the formatting is getting messed up is bizarre, it's like retaining some of the data, but mangling it. It almost feels like a character encoding thing? Like a unicode terminal interpreting characters all funny? What can you tell me about the cluster you're on? Can you show me the results of this:
mount
and this:
env
These will both dump a lot of wild information, but I'd like to see it all. Maybe email it to me instead of posting it here.
Alternatively, have you tried just running the ipyrad assembly on another computer? It's not a super heavy operation, and I've seen people run assemblies on laptops with not too much trouble. It might be easier just to avoid this problem (which I have never seen before and is almost certainly a local issue with the cluster) than to try to figure it out and fix it. I'm happy to help either way.
Thanks! I will send this information to the guys that handle the cluster.
Always happy to help! ☺
From: Marcial Escudero @.> Sent: Tuesday, January 25, 2022 9:31 AM To: dereneaton/ipyrad @.> Cc: Subscribed @.***> Subject: Re: [dereneaton/ipyrad] Error in FASTQ file at line 1: Line expected to start with \'@\', but found \'C\'\n') (Issue #470)
OK! fixed!!! The raw data was corrupted when downloaded! Now it is fixed!
Glad you got it figured out! Good luck with the rest of the analysis!
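Since the cause turned out to be a corrupted download, it may help to verify raw files right after transferring them. The thread does not say how the corruption was found, so the following is just one common approach, sketched under assumptions: compare an MD5 checksum against one supplied by the sequencing provider (if they gave you one), and decompress the whole file to see whether the gzip stream is intact. The function names `md5sum` and `gzip_is_intact` are made up for this example.

```python
import gzip
import hashlib
import zlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole file; return False if it is truncated or corrupted."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC mismatches, EOFError a truncated
        # stream, zlib.error damaged compressed data.
        return False
    return True
```

If `md5sum("plateA.fastq.gz")` does not match the checksum from the provider, or `gzip_is_intact` returns `False`, downloading the file again is probably the first thing to try. Note that a file can pass the gzip check and still contain bad records if it was damaged before compression, so the checksum comparison is the stronger test.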
I have the same error for all the samples: