ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

spades correction results in ill-formatted reads #473

Open rec3141 opened 4 years ago

rec3141 commented 4 years ago

Hello, I'm trying to assemble some metagenomes downloaded from EBI and am running into a problem: SPAdes outputs FASTQ reads where the quality line is not the same length as the sequence line. This leads to SPAdes failing with the following error:

  0:20:05.202   792M / 792M  ERROR   General                 (paired_readers.hpp        :  56)   The number of right read-pairs is larger than the number of left read-pairs
  0:20:05.202   792M / 792M  ERROR   General                 (paired_readers.hpp        :  60)   Unequal number of read-pairs detected in the following files: /import/c1/NANOBASE/recollins/metta/assembly/spades-scratch/spades_BMI-AADIOSF-3-C7C8WACXX-IND34-clean_2020-03-04-20-07-40/corrected/trimmed_BMI-AADIOSF-3-C7C8WACXX-IND34-clean_S000_L001_R1_001.fastq.00.0_0.cor.fastq.gz  /import/c1/NANOBASE/recollins/metta/assembly/spades-scratch/spades_BMI-AADIOSF-3-C7C8WACXX-IND34-clean_2020-03-04-20-07-40/corrected/trimmed_BMI-AADIOSF-3-C7C8WACXX-IND34-clean_S000_L001_R2_001.fastq.00.0_0.cor.fastq.gz

== Error ==  system call for: "['/home/recollins/apps/SPAdes-3.13.0-Linux/bin/spades-core', '/import/c1/NANOBASE/recollins/metta/assembly/spades-scratch/spades_BMI-AADIOSF-3-C7C8WACXX-IND34-clean_2020-03-04-20-07-40/K21/configs/config.info']" finished abnormally, err code: 255

reads: ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3589564/BMI_AADIOSF_3_1_C7C8WACXX.IND34_clean.fastq.gz ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3589564/BMI_AADIOSF_3_2_C7C8WACXX.IND34_clean.fastq.gz

raw FASTQ read:

@H4:C7C8WACXX:3:2207:3174:68099/1
AAAAAAAAATCTAAACGCTAATGCTGAAAAAGNATCACTATTATCTATTATTGGTTTTGTGGTAACAAACGCCGATGACCACAAGATAATAAAAATAAATG
+
@@@DF@FFFHAHHJIHIHAFGIIJICAGCHGG#-7BFGB@GGIIIEIBEEEHHH?;@;@.>A;@CDCD@A?/=9@>B@CCCA1<BCCCDCACC(:<CCDEC

bbduk filtered read:

@H4:C7C8WACXX:3:2207:3174:68099/1
AAAAAAAAATCTAAACGCTAATGCTGAAAAAGNATCACTATTATCTATTATTGGTTTTGTGGTAACAAACGCCGATGACCACAAGATAATAAAAATAAATG
+
@@@DF@FFFHAHHJIHIHAFGIIJICAGCHGG!-7BFGB@GGIIIEIBEEEHHH?;@;@.>A;@CDCD@A?/=9@>B@CCCA1<BCCCDCACC(:<CCDEC

SPAdes-3.13.0-Linux corrected read:

@H4:C7C8WACXX:3:2207:3174:68099/1 BH:changed:3
AAAAAAAAATCTAAACGCTAATGCTGAAAAAGGATCACTATTATCTATTATTGGTTTTGTTGTAACAAAAGCCGATGACCACAAGATAATAAAAATAAATG
+H4:C7C8WACXX:3:2207:3174:68099/1 BH:changed:3
@@@DDDDDDDDDDDDCDD

I should mention this is not an end-of-file issue; the total number of reads is equal according to wc -l.
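Matching line counts do not rule out malformed records: a record whose quality line is shorter than its sequence line still occupies four lines. Below is a minimal Python sketch for locating such records, assuming strict four-line FASTQ records (no wrapped sequence lines). The function names are hypothetical and not part of SPAdes.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def find_length_mismatches(path):
    """Return (record_index, header, seq_len, qual_len) for each bad record.

    Assumption: every record is exactly four lines (header, sequence,
    '+' separator, quality). An incomplete final record is reported with
    seq_len and qual_len set to None.
    """
    bad = []
    with open_fastq(path) as handle:
        index = 0
        while True:
            # Pull the next four lines; fewer than four means end of file.
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not record or not any(record):
                break  # clean end of file (or only trailing blank lines)
            if len(record) < 4:
                bad.append((index, record[0], None, None))
                break
            header, seq, _separator, qual = record
            if len(seq) != len(qual):
                bad.append((index, header, len(seq), len(qual)))
            index += 1
    return bad
```

Calling find_length_mismatches on each corrected file and printing the returned tuples shows which records are affected and how short their quality lines are.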

asl commented 4 years ago

Hello

Would it be possible to upload your spades.log file?

rec3141 commented 4 years ago

spades.log

The actual spades.log was overwritten, but this is the stdout log.

rec3141 commented 4 years ago

I'm running it now with 14.0 to see if the result changes.

asl commented 4 years ago

Looks like one of the files got truncated, probably during the gzip compression – the number of reads written by BayesHammer and the number of reads received by SPAdes differ.
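One way to test the truncation hypothesis is to decompress each corrected file to the end and see whether the gzip stream terminates cleanly. The sketch below uses only Python's standard gzip module; the function name gzip_is_complete is hypothetical, and this is not how SPAdes itself checks its files.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream at `path` decompresses cleanly.

    A truncated file typically raises EOFError when the reader hits the
    end of the file before the end-of-stream marker. Corrupted data can
    raise gzip.BadGzipFile or zlib.error. A missing file still raises
    FileNotFoundError, so it is not mistaken for a truncated one.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in chunks to keep memory use low.
            while handle.read(chunk_size):
                pass
    except (EOFError, gzip.BadGzipFile, zlib.error):
        return False
    return True
```

From the shell, gzip -t performs a comparable integrity test on a compressed file.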

rec3141 commented 4 years ago

I ran wc -l on the SPAdes-corrected files and got the same number.