alexstaj / cutadapt

Automatically exported from code.google.com/p/cutadapt

length of quality sequence and length of read do not match #95

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
Hi, I used trim_galore to trim several fastq.gz files, and only one of them
failed with the following error:
Traceback (most recent call last):
  File "/home/kmy/.local/bin/cutadapt", line 10, in <module>
    cutadapt.main()
  File "/home/kmy/.local/lib/python2.7/site-packages/cutadapt/scripts/cutadapt.py", line 877, in main
    stats = process_single_reads(reader, modifiers, writers)
  File "/home/kmy/.local/lib/python2.7/site-packages/cutadapt/scripts/cutadapt.py", line 404, in process_single_reads
    for read in reader:
  File "_seqio.pyx", line 145, in __iter__ (cutadapt/_seqio.c:3698)
  File "_seqio.pyx", line 50, in cutadapt._seqio.Sequence.__init__ (cutadapt/_seqio.c:1653)
ValueError: In read named 'ST-E00169:36:H0G4...': length of quality sequence and length of read do not match (140!=150)

I am using Linux x86_64, Red Hat 4.4.6-4. The tools are cutadapt 1.7.1 and
Python 2.7. The sequencing platform is Illumina HiSeq. The command is:
trim_galore --quality 20 --phred33 --length 50 -e 0.1 \
  --fastqc_args "--outdir $out_dir --noextract --quiet" \
  --adapter AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
  --gzip -o $input_dir $input_dir/sample.R2.fastq.gz

Interestingly, the FASTQ file for R1 worked fine. I suspect the problem might
be the "DOS vs. Unix line endings" issue that you mentioned in another post,
but I haven't found a solution. Could you suggest how to resolve this? Thank
you!

Original issue reported on code.google.com by mengyuan...@gmail.com on 27 Dec 2014 at 9:06

GoogleCodeExporter commented 9 years ago
Have you tried looking at the read mentioned in the error message? This
command should work (replace ST-... with the actual read name):

zgrep -A 3 "^@ST-E00169..." input.fastq.gz

If the FASTQ file looks OK, then I'd appreciate it if you could try to
reproduce the problem with cutadapt only, to make sure that it's not a problem
with trim_galore.
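As a complement to inspecting a single read by name, the length check behind the error message can be run over the whole file. The following is only a rough sketch, not cutadapt's own parser: the function name `find_length_mismatches` is made up for illustration, and it assumes strict four-line FASTQ records (no wrapped sequence lines).

```python
import gzip


def find_length_mismatches(path):
    """Yield (read_name, seq_len, qual_len) for every four-line FASTQ
    record whose sequence and quality lines differ in length.

    Hypothetical helper; assumes each record is exactly four lines.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            seq = handle.readline().rstrip("\r\n")
            handle.readline()  # '+' separator line, ignored here
            qual = handle.readline().rstrip("\r\n")
            if len(seq) != len(qual):
                name = header.rstrip("\r\n")
                if name.startswith("@"):
                    name = name[1:]
                yield name, len(seq), len(qual)
```

It could be used as `for name, n_seq, n_qual in find_length_mismatches("sample.R2.fastq.gz"): print(name, n_seq, n_qual)`. Stripping both `\r` and `\n` means DOS line endings alone would not be reported as a mismatch.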

If I had to guess, I would say that the disk was full and that the R2 file is 
truncated.
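One way to test the truncation hypothesis directly is to decompress the file to the end and see whether the gzip stream terminates cleanly (on the command line, `gzip -t` performs a similar integrity test). Below is a minimal sketch using Python's standard `gzip` module; the function name `gzip_is_intact` is hypothetical, and it is written for Python 3 rather than the Python 2.7 used in the report.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses to the end
    without error, False if it is truncated or corrupt.

    Hypothetical helper; a missing file still raises from open().
    """
    with open(path, "rb") as raw:
        try:
            with gzip.GzipFile(fileobj=raw) as handle:
                # Read and discard the data; only the errors matter.
                while handle.read(chunk_size):
                    pass
        except (EOFError, OSError, zlib.error):
            # EOFError: stream ended before the end-of-stream marker.
            # OSError / zlib.error: bad header, bad CRC, or corrupt data.
            return False
    return True
```

Running this on both the R1 and R2 files before trimming would separate a damaged input from a problem in cutadapt or trim_galore.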

Original comment by marcel.m...@tu-dortmund.de on 30 Dec 2014 at 1:43

GoogleCodeExporter commented 9 years ago
You are right. We looked at the original fastq.gz file and found it truncated 
(unexpected end of file). Thank you so much for your suggestion!

Original comment by mengyuan...@gmail.com on 5 Jan 2015 at 5:15

GoogleCodeExporter commented 9 years ago
Great to hear! I've also changed the code such that, in the future, a more 
readable error message will be printed (instead of the traceback) that looks 
like this:

gzip: input.fastq.gz: unexpected end of file
Error: gzip process returned non-zero exit code 1. Is the input file truncated or corrupt?
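The message suggests that the input is read through an external gzip process whose exit code is checked afterwards. The sketch below illustrates that general idea only; it is not cutadapt's actual code, the function name `read_gzip_lines` is made up, and it assumes a `gzip` executable is available on the PATH.

```python
import subprocess


def read_gzip_lines(path):
    """Yield text lines decompressed by an external `gzip -dc` process.

    Raises RuntimeError once the output is exhausted if gzip exited with
    a non-zero code. Hypothetical helper, not cutadapt's implementation.
    """
    process = subprocess.Popen(["gzip", "-dc", path], stdout=subprocess.PIPE)
    try:
        for line in process.stdout:
            yield line.decode("ascii")
    finally:
        process.stdout.close()
        returncode = process.wait()
    if returncode != 0:
        raise RuntimeError(
            "gzip process returned non-zero exit code %d. "
            "Is the input file truncated or corrupt?" % returncode
        )
```

Because gzip's stderr is inherited, its own "unexpected end of file" line appears before the error. Note that lines decompressed before the damaged point are still yielded, so the exit code is only known after the last line has been consumed.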

Thanks for the report.

Original comment by marcel.m...@tu-dortmund.de on 7 Jan 2015 at 3:35