FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

Trimming .gz files #157

Closed samhimes92 closed 1 year ago

samhimes92 commented 1 year ago

I'm using Trim_galore to trim some gzipped fastq files. It seems to be working fine. But I am getting a worrisome message while it runs.

gunzip: error writing to output: Broken pipe
gunzip: /path/to/file/20250X44_221115_A00421_0496_AHVN35DRX2_S44_L001_R2_001.fastq.gz: uncompress failed
Hard-trimming from 5' end selected. File(s) will be trimmed to 24 bp from the 5' end, and Trim Galore will then exit.

I'm running this from a Snakemake pipeline. This is the small script I use to call trim_galore:

```python
import os
import sys

path = sys.argv[1]
TRIMMED_DIR = "trimmed_files/"

for file in os.listdir(path):
    if file.endswith(".fq") or file.endswith(".fastq") or file.endswith(".fq.gz") or file.endswith(".fastq.gz"):
        os.system(f"trim_galore --hardtrim5 24 {path+file} -o {TRIMMED_DIR}")
```
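For comparison, here is a minimal sketch of the same loop using `subprocess.run` with an argument list instead of `os.system`. It is not part of the original script: the helper names `build_commands` and `run_all` are hypothetical, and the only `trim_galore` options used are the `--hardtrim5` and `-o` flags from the command above. Building the path with `os.path.join` means the input directory does not need a trailing slash.

```python
import os
import subprocess
import sys

# Same four extensions the original script checks for.
FASTQ_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")


def build_commands(input_dir, out_dir="trimmed_files/", length=24):
    """Return one trim_galore argument list per FastQ file in input_dir."""
    commands = []
    for name in sorted(os.listdir(input_dir)):
        if name.endswith(FASTQ_SUFFIXES):
            commands.append([
                "trim_galore", "--hardtrim5", str(length),
                os.path.join(input_dir, name),
                "-o", out_dir,
            ])
    return commands


def run_all(input_dir):
    """Run each command without a shell and stop on the first failure."""
    for cmd in build_commands(input_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    run_all(sys.argv[1])
```

Passing a list avoids shell quoting problems with unusual file names, and `check=True` raises an exception if `trim_galore` exits with a non-zero status rather than silently continuing.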

Like I said, it seems to be working correctly. But, should I be concerned about the broken pipe error message? Why would that be showing up?

FelixKrueger commented 1 year ago

Hmm, this is a good question. Just out of interest, which version of Trim Galore are you using? I am asking because the latest version uses igzip or pigz for decompression if installed, and falls back to gunzip as the (slowest) base version. My gut feeling would be that if it manages to read from the input file and produces valid trimmed output files, it can't be so bad... Maybe it would still be worth getting the latest version, installing igzip or pigz, and trying again? (https://github.com/FelixKrueger/TrimGalore/releases/tag/0.6.10)
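To check which of these tools a given environment would offer, a small sketch along the following lines may help. It only mirrors the preference order described in this comment (igzip, then pigz, then gunzip). It is not Trim Galore's actual detection code, and the function name `pick_decompressor` is hypothetical.

```python
import shutil

# Preference order as described in the comment: igzip, then pigz, then gunzip.
PREFERENCE = ("igzip", "pigz", "gunzip")


def pick_decompressor(which=shutil.which):
    """Return the first tool from PREFERENCE found on PATH, or None."""
    for tool in PREFERENCE:
        if which(tool) is not None:
            return tool
    return None


if __name__ == "__main__":
    print("Decompressor that would be preferred here:", pick_decompressor())
```

The `which` parameter exists only so the lookup can be swapped out in a test; by default it uses `shutil.which`, which searches the current `PATH`.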

samhimes92 commented 1 year ago

Thank you for the quick response!

I updated trim_galore, igzip and pigz.

These are the versions I'm working with:

sam$ igzip --version
igzip command line interface 2.30.0

sam$ pigz --version
pigz 2.7

trim_galore --version

                        Quality-/Adapter-/RRBS-/Speciality-Trimming
                                [powered by Cutadapt]
                                  version 0.6.10

                               Last update: 02 02 2023

I'm getting a different error message now. Here is the full output.

Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 4.2
single-core operation.
igzip command line interface 2.30.0
igzip detected. Using igzip for decompressing

Output will be written into the directory: /path/to/output/
igzip: Error encountered while writing to file (null)
Hard-trimming from the 3'-end selected. File(s) will be trimmed to leave the leftmost 24 bp on the 5'-end, and Trim Galore will then exit.

Input file name:  /path/to/input/20250X49_221115_A00421_0496_AHVN35DRX2_S49_L002_R1_001.fastq.gz
Writing trimmed version (using the first 24 bp only) of the input file 'path/to/input/20250X49_221115_A00421_0496_AHVN35DRX2_S49_L002_R1_001.fastq.gz' to '20250X49_221115_A00421_0496_AHVN35DRX2_S49_L002_R1_001.24bp_5prime.fq.gz'

Finished writing out converted version of the FastQ file /path/to/input/20250X49_221115_A00421_0496_AHVN35DRX2_S49_L002_R1_001.fastq.gz (5154425 sequences in total)

The `igzip: Error encountered while writing to file (null)` line is slightly troubling, but it still seems to be producing valid trims. So I agree with your gut feeling: it can't be that bad if it's doing what it's supposed to! Any other ideas would be appreciated, but I'm not too worried about it.

FelixKrueger commented 1 year ago

Hmm, I've just run a test on an EC2 instance (Ubuntu):

trim_galore --hardtrim5 24 SRR23199068_1M.fastq.gz -o test_output
Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 4.2
single-core operation.
igzip command line interface 2.30.0
igzip detected. Using igzip for decompressing

Output directory test_output/ doesn't exist, creating it for you...

Output will be written into the directory: /share/fkrueger/test/test_output/
Hard-trimming from the 3'-end selected. File(s) will be trimmed to leave the leftmost 24 bp on the 5'-end, and Trim Galore will then exit.

Input file name:  SRR23199068_1M.fastq.gz
Writing trimmed version (using the first 24 bp only) of the input file 'SRR23199068_1M.fastq.gz' to 'SRR23199068_1M.24bp_5prime.fq.gz'

Finished writing out converted version of the FastQ file SRR23199068_1M.fastq.gz (250000 sequences in total)

So the parameters are pretty much equivalent; it really seems to have to do with your OS, or with the fact that you launch it from within Snakemake. Maybe you can raise it with someone on the Snakemake side to see if this is a known issue?

samhimes92 commented 1 year ago

Yes, I think you're right. I just tried running this step outside of my Snakemake pipeline and I don't get the `igzip: Error encountered while writing to file (null)` line. Thank you for your help!
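One possible explanation, which this thread does not confirm, is signal inheritance. CPython ignores SIGPIPE at startup, and child processes started through `os.system` inherit that setting. If the process reading from a decompressor closes the pipe early, the decompressor then sees a failed write and reports "Broken pipe" instead of being terminated silently by the signal. The sketch below illustrates the mechanism on a POSIX system, with `yes | head -n 1` standing in for the decompressor and its reader. The stand-in pipeline and the helper name `run_pipeline` are assumptions for illustration, not Trim Galore or Snakemake code.

```python
import os
import subprocess

# Stand-in pipeline (an assumption for illustration): `yes` keeps writing
# after `head` has read one line and closed its end of the pipe.
PIPELINE = "yes | head -n 1"


def run_pipeline(restore_signals):
    """Run PIPELINE in a shell and return the CompletedProcess."""
    return subprocess.run(
        ["sh", "-c", PIPELINE],
        capture_output=True,
        text=True,
        timeout=30,
        env={**os.environ, "LC_ALL": "C"},  # keep error text in English
        restore_signals=restore_signals,
    )


# The child inherits Python's ignored SIGPIPE (as it would via os.system):
# the writer's write() fails and it typically reports the error on stderr.
inherited = run_pipeline(restore_signals=False)

# The child gets the default SIGPIPE action back (the subprocess default):
# the writer is terminated by the signal without printing anything.
restored = run_pipeline(restore_signals=True)

print("SIGPIPE ignored in child, stderr: ", inherited.stderr.strip())
print("SIGPIPE default in child, stderr:", restored.stderr.strip())
```

If this is what happens here, launching `trim_galore` with `subprocess.run` (which restores the default SIGPIPE action in the child unless told otherwise) rather than `os.system` might suppress the message. Either way, the trimmed output would be unaffected.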

FelixKrueger commented 1 year ago

Alright, let's close this issue for the time being. Feel free to get back to me if there is something that needs fixing on my side. Cheers.