FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

Segmentation fault when disk space is low #140

Closed pinin4fjords closed 1 year ago

pinin4fjords commented 2 years ago

Hi!

(The following is a reposting of https://github.com/marcelm/cutadapt/issues/640; I had thought the issue originated there, but @marcelm suggested that TrimGalore may be the proper place for a fix.)

When running Cutadapt 3.4 on Python 3.9.6 via trim_galore 0.6.7, in a Docker container on AWS Batch via the nf-core RNA-seq Nextflow workflow, as follows:

trim_galore \
    --fastqc \
    --cores 4 \
    --paired \
    --gzip \
    SRX8042381_1.fastq.gz \
    SRX8042381_2.fastq.gz

... which reported Cutadapt command line parameters like:

-j 1 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC SRX8042381_1.fastq.gz

... I encountered an error like:

 Writing final adapter and quality trimmed output to SRX8042381_1_trimmed.fq.gz
    >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file SRX8042381_1.fastq.gz <<<
  sh: line 1:   296 Segmentation fault      (core dumped) pigz -p 4 -c - > SRX8042381_1_trimmed.fq.gz

This turned out to be due to a lack of disk space available to the Batch job: running the above command with a single core instead produced a more explicit error:

...
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
gzip: write error: No space left on device

If this has been addressed in more up-to-date versions of Trim Galore than the one used in that workflow, then great, please disregard. If this could still happen, could there perhaps be some mechanism to catch the problem and produce a more intelligible error message than the segfault? A rough sketch of the kind of check I have in mind is below.
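Purely illustrative, and nothing Trim Galore currently does: a minimal pre-flight check in Python (Cutadapt and xopen are Python anyway). The function name and the safety factor are invented:

    import shutil
    import sys
    from pathlib import Path

    def check_disk_space(output_dir, input_files, safety_factor=2.0):
        """Abort with a clear message if the output volume looks too small.

        The heuristic is invented: trimmed, recompressed output is usually
        no larger than the compressed input, so require safety_factor times
        the combined input size to be free before starting.
        """
        needed = safety_factor * sum(Path(f).stat().st_size for f in input_files)
        free = shutil.disk_usage(output_dir).free
        if free < needed:
            sys.exit(
                f"ERROR: only {free / 1e9:.1f} GB free on '{output_dir}', "
                f"but up to {needed / 1e9:.1f} GB may be needed; "
                "aborting before any trimming output is written."
            )

    check_disk_space(".", ["SRX8042381_1.fastq.gz", "SRX8042381_2.fastq.gz"])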

Thank you!

FelixKrueger commented 1 year ago

Hi @pinin4fjords,

sorry, this issue seems to have completely escaped me... Having said that, I am not sure I will be able to offer much help either, to be perfectly honest. Trim Galore doesn't perform any checks of file sizes or available disk space at all, and the issue here seems to be that pigz caused the segmentation fault and core dump. Writing output to a pipe (here to pigz) is notoriously difficult to debug, especially since errors like that often cause the program to get killed... I'd be happy to be told otherwise though...
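Just to sketch what "catching" it could look like (hypothetical Python for brevity, not how Trim Galore's Perl pipe actually works): if the compressor ran as a managed subprocess whose exit status is checked once the stream is closed, a failure could at least be reported in words rather than as a bare "Segmentation fault" from the shell:

    import subprocess

    records = [b"@r1\nACGT\n+\nIIII\n"] * 1000  # stand-in for trimmed reads

    with open("trimmed.fq.gz", "wb") as out:
        # Run pigz ourselves instead of via "sh -c '... | pigz'", so we can
        # inspect its exit status afterwards.
        pigz = subprocess.Popen(["pigz", "-p", "4", "-c"],
                                stdin=subprocess.PIPE, stdout=out)
        try:
            for rec in records:
                pigz.stdin.write(rec)
        except BrokenPipeError:
            pass  # pigz died mid-stream; wait() below reveals how
        finally:
            pigz.stdin.close()
        status = pigz.wait()

    if status != 0:
        # A negative status is the signal that killed pigz (-11 == SIGSEGV).
        raise RuntimeError(f"pigz failed with exit status {status}; "
                           "is the output volume out of disk space?")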

If this was run via an nf-core pipeline, maybe they have some functionality to scan the log file for 'error' and produce a more helpful error message?

pinin4fjords commented 1 year ago

@FelixKrueger yes, it's a pigz thing, in the sense that writing single-threaded triggers a more specific error. I was hoping (perhaps unrealistically) that a check for available space might be possible, but I understand if that's a bit out of scope.

Hopefully https://github.com/pycompression/xopen/pull/111 has improved the situation somewhat.
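For context, the relevant moving part as I understand it, as a minimal usage sketch against xopen's documented interface (whether that PR changes the error propagation is my assumption, and the file name is made up):

    from xopen import xopen

    # xopen delegates .gz output to an external pigz process when threads are
    # requested; a failure in that process can then surface as a Python
    # exception instead of a shell-level "Segmentation fault" line.
    with xopen("trimmed.fq.gz", mode="wb", threads=4) as out:
        out.write(b"@r1\nACGT\n+\nIIII\n")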

FelixKrueger commented 1 year ago

Great, thanks for the pointer.