Closed GoogleCodeExporter closed 9 years ago
Is the output file always corrupt when run cutadapt with the same arguments
again? How many lines does the output file have? (I'd like to know whether line
4947754 is close to the end.)
Original comment by marcel.m...@tu-dortmund.de
on 8 Sep 2014 at 6:32
Hi Marcel,
Thanks for getting back.
FYI
$ gzip -dc Halo_Fix/1516/1516_AGGCTAAC_L001_R1_001.fastq.gz | wc -l
5068912
The commands used on all occasions are:
cutadapt \
-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
-m 35 -q 15 \
-o /home/jellul/Halo_Fix/1516/1516_AGGCTAAC_L001_R1_001.temp.fastq.gz \
-p /home/jellul/Halo_Fix/1516/1516_AGGCTAAC_L001_R2_001.temp.fastq.gz \
/home/jellul/Halo_Fix/1516/1516_AGGCTAAC_L001_R1_001.orig.fastq.gz \
/home/jellul/Halo_Fix/1516/1516_AGGCTAAC_L001_R2_001.orig.fastq.gz
cutadapt \
-a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
-m 35 -q 15 \
-o /home/jellul/Halo_Fix/1516/1516_AGGCTAAC_L001_R2_001.fastq.gz \
-p /home/jellul/Halo_Fix/1516/1516_AGGCTAAC_L001_R1_001.fastq.gz \
/home/jellul/Halo_Fix/1516/1516_AGGCTAAC_L001_R2_001.temp.fastq.gz \
/home/jellul/Halo_Fix/1516/1516_AGGCTAAC_L001_R1_001.temp.fastq.gz
I have not tried with different params and 18 of 92 fastq pairs
experienced this error since I moved from 1.1 to 1.5
Original comment by jas.pete...@gmail.com
on 8 Sep 2014 at 6:52
Thanks for the info. So far, I haven't been able to reproduce the problem.
Since no one else has reported this, I'd first like to exclude the possibility
that it's a problem with your system. A simple explanation would be a disk
space or quota problem. Another possibility is that your batch system (if you
are using one) kills the job early. Did you check the logs that cutadapt
produces? Were they written correctly, even when the output is corrupt?
In order to find out whether it's a bug in cutadapt, I need to know whether the
corrupt output is reproducible. That is, if you get a corrupt output file from
a given input file, and you then run cutadapt again, will you again get a
corrupt output file?
By the way, you can use 'zgrep' to grep in gzipped files. For checking whether
a file is corrupt, you can use 'zcat file.gz > /dev/null' . If the file is
truncated, it'll print the 'unexpected end of file' message.
Original comment by marcel.m...@tu-dortmund.de
on 8 Sep 2014 at 8:23
Hi again Marcel,
I have done a fair bit of investigating and the problem only manifests when
I use a perl wrapper I had written to incorporate cutadapt into our
pipeline. It is reproducible with the particular fastq files and when using
the perl wrapper. The logs show no indication the command failed or
terminated early, for all intensive purposes it looks like the command
finished normally.
Had not experienced this with the older version 1.1 has anything changed to
the gzip side of things??
I have a feeling the environment perl is creating is causing gzip to
prematurely close both files while it is still in fact writing the 2nd file.
For now I am going to use a modified perl script to generate the cutadapt
commands (insert adapters etc according to fastqc output) and run the
commands in bash.
Thanks for looking into this,
Jason
Original comment by jas.pete...@gmail.com
on 9 Sep 2014 at 2:58
There actually was a change in cutadapt 1.2. Before that version, it used a
Python library function to compress output or decompress input. But in 1.2, I
switched to opening external gzip processes for both tasks. That is faster (at
least for older Python versions) and also moves the (de-)compression into
another process, so we get some degree of parallelism for free.
I just tried to reproduce the behavior you observed by killing one of the gzip
processes spawned by cutadapt, but then the main cutadapt process notices it
immediately and stops with an error message.
Maybe it's cutadapt's fault after all: I'm relying on the behavior that all
open files are closed when a process ends. I'm not closing the output files
explicitly. Let's see if I can fix this.
Original comment by marcel.m...@tu-dortmund.de
on 9 Sep 2014 at 12:36
If you have some time, I'd appreciate it if you could try out the newest
cutadapt version available on Github: https://github.com/marcelm/cutadapt/ .
I've tried to fix the bug (if it is one), but as I'm unable to trigger the
problem myself, I cannot test it. You need to compile the checked-out cutadapt
with 'python setup.py build_ext -i --cython'.
If you are not comfortable using Git or don't have Cython installed, I can
provide you with a pre-compiled package.
Original comment by marcel.m...@tu-dortmund.de
on 9 Sep 2014 at 12:57
Hi Marcel,
Yes that seems to have fixed the problem. I will test it more tomorrow on
the other samples.
Thanks for all your help.
Kindest regards,
Jason
Original comment by jas.pete...@gmail.com
on 10 Sep 2014 at 8:08
Cool :-) Hopefully the other samples confirm the fix. If that was really it,
then sorry for the bug, it is due to a (tiny) bit of laziness on my part.
Original comment by marcel.m...@tu-dortmund.de
on 10 Sep 2014 at 8:18
Hi Marcus,
Yes that has fixed the issue on all the samples which experienced this
problem thanks so much.
Oh and just to let you know previously I would place dummy adapters into
the command to just do bwa trimming but with this new version it errors out:
Traceback (most recent call last):
File "/usr/local/cluster/all_arch/cutadapt/users/cutadapt", line 10, in
<module>
cutadapt.main()
File
"/usr/local/cluster/all_arch/cutadapt/users_python/cutadapt/scripts/cutadapt.py"
,
line 931, in main
options.error_rate, readfilter.too_short, readfilter.too_long,
cmdlineargs, file=stat_file)
File
"/usr/local/cluster/all_arch/cutadapt/users_python/cutadapt/scripts/cutadapt.py"
,
line 224, in print_statistics
warning = warning or print_adjacent_bases(adapter.adjacent_bases,
adapter.sequence)
File
"/usr/local/cluster/all_arch/cutadapt/users_python/cutadapt/scripts/cutadapt.py"
,
line 141, in print_adjacent_bases
fraction = 1.0 * bases[base] / total
ZeroDivisionError: float division by zero
I am now testing without the dummy adapter but I thought I should let you
know. I am guessing you might want to do a check that total is not zero.
Original comment by jas.pete...@gmail.com
on 11 Sep 2014 at 11:14
Great to hear it's fixed! Regarding the bug you saw: That's due to a different
change in the development version I introduced a few days ago, thanks for
noticing. I just added a check for total == 0.
Regarding the quality trimming: You actually don't need to provide a dummy
adapter! If you use just the quality trimming option -q, then cutadapt will
happily do quality trimming only. It won't complain that you should have used
-a or so. I'll update the documentation soon and will include a note about that.
Original comment by marcel.m...@tu-dortmund.de
on 12 Sep 2014 at 5:44
Original issue reported on code.google.com by
jas.pete...@gmail.com
on 8 Sep 2014 at 5:13