FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

Can you explain the difference between _val_1/_val_2 and _trimmed? #36

Closed andrewejaffe closed 6 years ago

andrewejaffe commented 6 years ago

We're running TrimGalore on a bunch of paired-end Illumina WGBS samples and are getting more FASTQ files as output than we expected: we were expecting one trimmed left and one trimmed right read file per sample. For a given pair of input files, SampleID_R1.fq and SampleID_R2.fq, we get back 6 files:

SampleID_R1.fastq.gz_trimming_report.txt
SampleID_R1_trimmed.fq
SampleID_R1_val_1.fq
SampleID_R2.fastq.gz_trimming_report.txt
SampleID_R2_trimmed.fq
SampleID_R2_val_2.fq

If SampleID_R1_trimmed.fq and SampleID_R2_trimmed.fq are the trimmed reads, what are SampleID_R1_val_1.fq and SampleID_R2_val_2.fq? I didn't see anything about val in the manual or help files. Here is the code we're running:

~/Software/TrimGalore-0.4.5/trim_galore \
    --paired --illumina --dont_gzip \
    --output_dir trimmed_fq_trim_galore/ \
    ${FILE1} ${FILE2}

[ajaffe@compute-098 ~]$ ~/Software/TrimGalore-0.4.5/trim_galore --version

                      Quality-/Adapter-/RRBS-Trimming
                           (powered by Cutadapt)
                              version 0.4.4_dev

                         Last update: 24 03 2017

FelixKrueger commented 6 years ago

Hi Andrew,

The answer for that should be fairly simple: the Trim Galore run is probably not finished yet.

The way it works (which dates back to the days when trimmers did not cope with paired-end data) is that R1 and R2 are first trimmed individually, which results in files ending in _trimmed.fq or _trimmed.fq.gz. Once the trimming has completed, the trimmed R1 and R2 files are read in at the same time, and the reads are validated. 5' or 3' trimming, length evaluation etc. are carried out during this step. Once this validation is complete, the intermediate files SampleID_R1_trimmed.fq and SampleID_R2_trimmed.fq are deleted, and the final result files end in _val_1.fq.gz and _val_2.fq.gz (or _val_1.fq and _val_2.fq when --dont_gzip is used, as in your command).

I hope this helps.

Felix
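The file lifecycle described above can be turned into a rough status check. This is only a sketch: `tg_run_state` is a hypothetical helper (not part of Trim Galore), and it assumes outputs are named `<base>_trimmed.fq[.gz]`, `<base>_val_1.fq[.gz]` and `<base>_val_2.fq[.gz]`, as in the file listing in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the state of one paired-end Trim Galore run,
# judged only from which files exist in the output directory.
# Assumed naming (taken from this thread): <base>_trimmed.fq[.gz] for the
# intermediates, <base>_val_1.fq[.gz] / <base>_val_2.fq[.gz] for final results.
tg_run_state() {
    local outdir=$1 r1_base=$2 r2_base=$3
    local ext have_val=0 have_trimmed=0

    for ext in fq fq.gz; do
        # Final validated pair: both mates must be present.
        if [[ -e "$outdir/${r1_base}_val_1.$ext" && -e "$outdir/${r2_base}_val_2.$ext" ]]; then
            have_val=1
        fi
        # Intermediate files; these are deleted once validation completes.
        if [[ -e "$outdir/${r1_base}_trimmed.$ext" || -e "$outdir/${r2_base}_trimmed.$ext" ]]; then
            have_trimmed=1
        fi
    done

    if (( have_trimmed )); then
        echo "incomplete"    # still running, or stopped before validation finished
    elif (( have_val )); then
        echo "finished"
    else
        echo "not_started"
    fi
}
```

For the listing in the question, `tg_run_state trimmed_fq_trim_galore SampleID_R1 SampleID_R2` would print `incomplete`, because the `_trimmed.fq` files are still present next to the `_val` files. Note that the check cannot tell a run that is still going from one that crashed partway.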

andrewejaffe commented 6 years ago

Awesome, thanks, that makes sense. I am running a ton of samples with ~100 going in parallel, and the ones that finished indeed ended up with just _val_1.fq and _val_2.fq.
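With many samples going in parallel, one rough way to spot runs that have not finished is to list any leftover intermediate files in the output directory. A minimal sketch, assuming the `_trimmed.fq` / `_trimmed.fq.gz` naming from this thread; `tg_list_intermediates` is a hypothetical name, and the check only makes sense for `--paired` runs, since it relies on the intermediates being deleted after validation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print leftover intermediate files in a Trim Galore
# output directory (top level only), one path per line, sorted.
# An empty result suggests every paired-end run in that directory completed
# its validation step.
tg_list_intermediates() {
    local outdir=$1
    find "$outdir" -maxdepth 1 -type f \
        \( -name '*_trimmed.fq' -o -name '*_trimmed.fq.gz' \) | sort
}
```

For example, `tg_list_intermediates trimmed_fq_trim_galore/` would list `SampleID_R1_trimmed.fq` and `SampleID_R2_trimmed.fq` for the sample in the question while its run is still in progress.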
