FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

--paired end specifications #163

mjbug closed 1 year ago

mjbug commented 1 year ago

Hello Felix,

I was reviewing my trim_galore command and noticed that I didn't supply my paired reads in a pairwise fashion. It still ran, and the trimmed files all had "val" in their names. The report said "single-end reads", and I confirmed with FastQC that the adapters had been removed. Is the fact that I didn't supply the files pairwise a problem for downstream analysis, or is this fine?

Attached reports: C1_S91_L001_R1_001.fastq.gz_trimming_report.txt, C1_S91_L001_R2_001.fastq.gz_trimming_report.txt

`(base) mjb@MSI:~/parksoils2019$ trim_galore \
  --phred33 \
  --nextera \
  -e 0.1 \
  --length 20 \
  --output_dir trim_adapter \
  --paired /mnt/c/Users/bugsj/Desktop/Projects/ParkSoilProject/Bugay_6528_20093002/*.fastq.gz
Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 4.1
single-core operation.
Output will be written into the directory: /home/mjb/parksoils2019/trim_adapter/
Writing report to '/home/mjb/parksoils2019/trim_adapter/C1_S91_L001_R1_001.fastq.gz_trimming_report.txt'`

FelixKrueger commented 1 year ago

Hi @mjbug

Trim Galore first trims each read file individually. For paired-end runs, this is then followed by a validation step (which is where the 'val' in the file names comes from), where operations such as length filtering and additional hard-trimming take place. So this is all expected, and fine.
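To make the idea of pair validation concrete, here is a rough sketch of the length check only. It is not Trim Galore's own code (Trim Galore is a Perl wrapper and does this internally). The function name `count_short_pairs` and its arguments are hypothetical, and it assumes two gzipped FASTQ files with four lines per record and mates in the same order.

```shell
# Hypothetical helper, not part of Trim Galore: count read pairs in which
# at least one mate is shorter than min_len (default 20).
count_short_pairs() {
  r1=$1; r2=$2; min_len=${3:-20}
  tmpdir=$(mktemp -d) || return 1
  # In FASTQ, line 2 of every 4-line record is the sequence.
  gzip -cd "$r1" | awk 'NR % 4 == 2 { print length($0) }' > "$tmpdir/len1"
  gzip -cd "$r2" | awk 'NR % 4 == 2 { print length($0) }' > "$tmpdir/len2"
  # Put mate lengths side by side and apply the cutoff to the pair as a unit.
  paste "$tmpdir/len1" "$tmpdir/len2" |
    awk -v min="$min_len" '
      { total++; if ($1 < min || $2 < min) n_short++ }
      END { printf "%d pairs, %d below cutoff\n", total, n_short }'
  rm -rf "$tmpdir"
}
```

Run on a pair of already validated output files, a check like this should report zero pairs below the cutoff, since the validation step removes the whole pair as soon as one mate is too short.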

If you look in the trimming reports you will see:

SUMMARISING RUN PARAMETERS
==========================
Input filename: /mnt/c/Users/bugsj/Desktop/Projects/ParkSoilProject/Bugay_6528_20093002/C1_S91_L001_R1_001.fastq.gz
Trimming mode: paired-end

and at the end of the Read 2 trimming report you can see:

RUN STATISTICS FOR INPUT FILE: /mnt/c/Users/bugsj/Desktop/Projects/ParkSoilProject/Bugay_6528_20093002/C1_S91_L001_R2_001.fastq.gz
=============================================
209066 sequences processed in total

Total number of sequences analysed for the sequence pair length validation: 209066

Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 1897 (0.91%)
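The same check can be done for every report in one go. This is a small sketch, assuming the reports sit in one output directory and contain the two lines quoted above. The helper name `check_reports` is made up for illustration.

```shell
# Hypothetical helper: print the lines of each trimming report that show
# whether paired-end mode and pair validation were applied.
check_reports() {
  grep -H -e '^Trimming mode:' -e '^Number of sequence pairs removed' \
    "$1"/*_trimming_report.txt
}

# Example call (directory name taken from the command in this issue):
#   check_reports trim_adapter
```

Every report should show `Trimming mode: paired-end`. Going by the excerpts above, the pair-removal line appears at the end of the Read 2 report.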

So all fine, just move on! :)
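For anyone who prefers to spell out the pairing rather than rely on how the shell expands `*.fastq.gz`, the sketch below lists each R1 file next to its R2 mate. It assumes the `_R1_001.fastq.gz` / `_R2_001.fastq.gz` naming seen in this issue. The function name `list_pairs` is hypothetical.

```shell
# Hypothetical helper: print "R1 R2" for every *_R1_001.fastq.gz in a
# directory, and fail if a matching R2 file is missing.
list_pairs() {
  dir=$1
  for r1 in "$dir"/*_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue   # the glob matched nothing
    r2=$(printf '%s\n' "$r1" | sed 's/_R1_001\.fastq\.gz$/_R2_001.fastq.gz/')
    if [ ! -e "$r2" ]; then
      echo "missing mate for $r1" >&2
      return 1
    fi
    printf '%s %s\n' "$r1" "$r2"
  done
}

# Example use, assuming paths without spaces (word splitting is intended):
#   trim_galore --paired --nextera --length 20 $(list_pairs /path/to/fastq)
```

With file names like the ones here, a plain glob already sorts each R1 directly before its R2, which is consistent with the run above having worked as a paired-end run. The explicit listing mainly adds a guard against a missing mate.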

mjbug commented 1 year ago

Hi @FelixKrueger

Thank you for such a quick response! I had a gut feeling all was well, but it was great to hear your confirmation and explanation!