OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
MIT License

ERROR: sequence and quality have different length: #340

Open lubocoix opened 3 years ago

lubocoix commented 3 years ago

When I run the following command on my data:

fastp -i /Users/lubo/YY17/CoixCoix_L766-01-T16_good_1.fq -I /Users/lubo/YY17/CoixCoix_L766-01-T16_good_2.fq -o /Users/lubo/YY17/Coix_T16_1.fq -O /Users/lubo/YY17/Coix_T16_2.fq

it reports the following error (the same message is printed twice):

ERROR: sequence and quality have different length:
@A00224:118:HC7HLDSXX:4:1101:17716:1204 2:N:0:GAATTCGT+TATAGCCT
AGA@A00224:118:HC7HLDSXX:4:1101:31286:1000 2:N:0:GAATTCGT+TATAGCCT
NCAAGATCTTGTGAAAGAGAGGCTGCCCAGGTTCACTCCTGAGCAAGCTAAAATGGTCAAGGGCTCGGCAGACTACATTGGTATCAATGAATATACATCCAGCTACATGAAGGGACAGAAACTGGTCCAGCAGACTCCCAGTAGCTACTC
+

I don't understand what is wrong with my RNA-seq data. The sequence files were provided by my professor. I also tried Trimmomatic, which shows the same error. Could something be wrong with my FASTQ data?

VoronDM commented 2 years ago

I have the same error, after which fastp halts without producing any output.

vk@GenomeBiology ~/work/Experiment/dval
$ fastp -i /Seq/5_38_1.fastq.gz -I /Seq/5_38_2.fastq.gz -o out.R1.fq.gz -O out.R2.fq.gz -w 16

ERROR: sequence and quality have different length:
@MG00HS13:501:C5DGRACXX:5:2302:19240:89393 2:N:0:
TCGGATGTGCTTGGCTTTTGTAACGTTCAATTTGGCTTACTTGCGTGATTTTGCATTAACAGCTCAAATACAGCTAACTTTGGAAATGAACACGATGGAAA
+
BCCFDFFEHHHHHJIJJJJJIIJJJIJJJIIJJJJIJJJIJJJIIHIIIJFHIIIJJJJJJJJJGTAACG?BCTTAAC45TGGCTTCTATTT7@EEGIIIIIIIGCTGHTHTHTHTHE7)?=?A########TTGTTGCC

WARNNIG: different read numbers of the 187896 pack
Read1 pack size: 1000
Read2 pack size: 302

qiangfan2022 commented 2 years ago

@VoronDM Hello, have you resolved the error?

qiangfan2022 commented 2 years ago

@lubocoix Hello, have you resolved the error?

KristinaGagalova commented 2 years ago

Hi, I have the same error. Is there any chance you could exclude those reads instead of just throwing the error? Thank you in advance.

VoronDM commented 2 years ago

In reply to "Hello, have you resolved the error?": unfortunately, no.

Hiba-maadani commented 2 years ago

Hello, did anyone find the solution to this problem?

Amhaslam commented 2 years ago

I am facing the same error too:

ERROR: sequence and quality have different length:
@ERR5684706.28553684 A00275:507:HVYLNDSXY:1:2116:12463:1564/1
CCTGGACTATTGACTCACTGCAGTGGGGAGGAGGAAAGTGTGGGGCACGGGAACACAAGGGCTGGCCGGACTCTGAGAAGCTGAGGGACAAAGAAGAGGAGTAGCCTGAGAATAGGGGAAATCAGTGAATGAAGCCTCCTATGATGGCAAATACAGCTCCTCTTGA
+
FFFFFFFF:FFF:FF:F::F:F:,FFFF:FFFFFFFFF,FFFFFFFFFF:FF:FFFFF,F,:FF,FFF,FFFFFFFF:FF,FFFFFFFFFFFFFFF,F,:F

WARNNIG: different read numbers of the 28545 pack
Read1 pack size: 683
Read2 pack size: 1000

Lightzjx commented 1 year ago

This is a problem with your data file. Please check its md5 value, or check with the other side.
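For anyone who wants to script the md5 check, here is a minimal sketch in Python. It assumes the data provider published an MD5 checksum for each file. The function names `md5_of_file` and `matches_expected` are made up for illustration and are not part of fastp.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare the local file against the checksum supplied with the data (assumed to be hex)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch suggests the file was damaged or truncated in transfer and should be downloaded again before running fastp. A match only shows that the copy is identical to the provider's file, not that the provider's file is well formed.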

HibaLaghrissi commented 1 year ago

Hi,

Thank you for your answer. I checked my file and there is nothing wrong with it. When I use other tools such as cutadapt or BBMap, I don't get this error:

ERROR: sequence and quality have different length: @K00102:399:HCFW3BBXY:5:1112:19796:28446:rbc:ACAAATT +

@K00102:399:HCFW3BBXY:5:1112:19816:28446:rbc:CTCTTAT

PS: here is the command I ran:

fastp -w 16 \
  -i $indir/dydy3i3g.fq \
  -o ${outdir}/dydy_clean_onRaw.fq \
  --average_qual 20 \
  -3 \
  --length_required 18 \
  --low_complexity_filter \
  --adapter_fasta $dir/adapters.fasta \
  -D 3 \
  --overrepresentation_analysis \
  --html ${outdir}/dydy_clean.fastq.html

zxc098 commented 1 year ago

If useful:

I had the same error and resolved it in the following way:

1) Unzipped the fastq and opened it in a text editor.
2) Used ctrl+f to find the read named in the error. It turned out to be on the same line as the sequence before it (i.e., every line is supposed to start on its own line, but for some reason this one read was running up against the one directly preceding it; where there should have been a line break, there was none).
3) Inserted a line break (pressed "return"), putting that read on its own line.
4) Saved, re-zipped and ran.

This takes about 10 minutes, because the files are huge and you have to wait at each step, but everything ran fine after that. Maybe this is useful for someone out there.
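If opening a huge file in a text editor is not practical, the search in step 2 can be scripted. The sketch below is only an illustration and assumes strict four-line FASTQ records with no blank lines. The helper names `open_fastq` and `first_bad_record` are hypothetical. It streams a plain or gzip-compressed file and reports the line number of the first record that breaks the layout, including the fused-line case described above.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ as text; only '\\n' ends a line."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="ascii", errors="replace", newline="\n")
    return open(path, "r", encoding="ascii", errors="replace", newline="\n")


def first_bad_record(path):
    """Return (line_number, reason) for the first malformed record, or None if all are fine."""
    with open_fastq(path) as handle:
        line_no = 0  # number of lines consumed so far
        while True:
            lines = [handle.readline() for _ in range(4)]
            if not lines[0]:
                return None  # clean end of file
            name, seq, plus, qual = (text.rstrip("\r\n") for text in lines)
            if not name.startswith("@"):
                return line_no + 1, "header line does not start with '@'"
            if "@" in seq:
                return line_no + 2, "sequence line contains '@' (two records may be fused)"
            if not plus.startswith("+"):
                return line_no + 3, "third line of the record does not start with '+'"
            if len(seq) != len(qual):
                return line_no + 1, f"sequence length {len(seq)} != quality length {len(qual)}"
            line_no += 4
```

Only the first problem is reported on purpose: once one line is missing or fused, the four-line framing of every later record is shifted, so further reports would mostly be follow-on noise.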

szf1993325 commented 1 year ago

I get this error, and the read printed with it is unreadable binary data rather than FASTQ text. The same message is printed three times, each one beginning:

ERROR: sequence and quality have different length: @1¤꜆Q .e¢MGª ...

androidpifu commented 1 year ago

I get this error too: ERROR: sequence and quality have different length

sfchen commented 1 year ago

This message is usually caused by malformed input files (e.g. broken zip files), and this issue should have been fixed in the new version. Could you all try the new version? If the problem still exists, please comment below.
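To check whether a gzip-compressed input is itself broken, one option is to decompress it once from start to finish and see whether that succeeds. This is a rough sketch using only the Python standard library. The function name `gzip_is_intact` is an assumption for illustration, and it needs Python 3.8 or newer for `gzip.BadGzipFile`.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the whole file chunk by chunk and return (ok, message)."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass  # discard the data; only the decompression result matters
    except EOFError as err:
        # The stream ended before the gzip end marker, as after an incomplete download.
        return False, f"truncated: {err}"
    except (gzip.BadGzipFile, zlib.error) as err:
        # Not a gzip file at all, a failed CRC check, or a damaged compressed stream.
        return False, f"corrupt: {err}"
    return True, "ok"
```

A file that fails this check needs to be downloaded or copied again. A file that passes can still contain malformed FASTQ records, so this only rules out damage at the compression level.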

Caffenicotiak commented 1 year ago

Hey, this specific error seems to be fixed with 0.23.4, but I got a new one instead: [image]

This happens on the files where I got the "differing-length-error" with the previous version.

I was able to track down the read on which it occurred, and it was always a completely fine read, without even an empty sequence or quality line.

When I extract the fastq files from the fastq.gz format, it works fine again, but I cannot do that for 800 GB of gzipped files...

akikuno commented 1 year ago

For what it's worth...

It seems that fastp only supports Linux line endings (LF).

If your FASTQ file uses Windows line endings (CRLF), you'll encounter the aforementioned error.

To the best of my knowledge, it's challenging to change the line endings without decompressing the GZIP file.
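One possible workaround, sketched here in Python, is to stream the file: decompress on the fly, convert the line endings, and recompress straight into a new file, so no uncompressed copy is ever written to disk. The data is still decompressed in memory, and the function name `convert_crlf_to_lf` is hypothetical.

```python
import gzip


def convert_crlf_to_lf(src_gz, dst_gz):
    """Stream src_gz into dst_gz, turning CRLF line endings into LF.

    Returns the number of lines that were converted.
    """
    converted = 0
    with gzip.open(src_gz, "rb") as src, gzip.open(dst_gz, "wb") as dst:
        for line in src:  # binary iteration splits on b"\n" only
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
                converted += 1
            dst.write(line)
    return converted
```

A return value of 0 means the file had no CRLF line endings, in which case the line endings are not the cause of the error and the copy can be discarded.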

HanYu-me commented 11 months ago

I have the same problem, and I know it was caused by an incomplete download from the web. Unfortunately, the data is no longer accessible, so I plan to check the files row by row and delete the records whose sequence and quality lengths differ.
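A rough sketch of that row-by-row clean-up in Python is below. It assumes four-line records and uses a hypothetical function name, `drop_malformed_records`. It keeps a sliding window of four lines, writes the window out when it looks like a valid record, and otherwise slides forward by one line until the framing lines up again.

```python
import gzip
from collections import deque


def _open(path, mode):
    """Open plain or gzip-compressed files in binary mode, decided by the .gz suffix."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def drop_malformed_records(src, dst):
    """Copy well-formed 4-line records from src to dst.

    Returns (records_kept, lines_skipped).
    """
    kept = skipped = 0
    window = deque()
    with _open(src, "rb") as fin, _open(dst, "wb") as fout:
        for line in fin:
            window.append(line)
            if len(window) < 4:
                continue
            name, seq, plus, qual = (text.rstrip(b"\r\n") for text in window)
            if name.startswith(b"@") and plus.startswith(b"+") and len(seq) == len(qual):
                fout.writelines(window)
                window.clear()
                kept += 1
            else:
                window.popleft()  # slide forward one line and try to resynchronise
                skipped += 1
        skipped += len(window)  # leftover lines of a truncated final record
    return kept, skipped
```

This discards data, so the counts are worth checking before trusting the output. For paired-end data, removing a read from one file without removing its mate from the other leaves the pairs out of sync, which is presumably what the "different read numbers" warning earlier in this thread reflects, so the mate file would need the same reads removed.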