--paired end specifications

mjbug commented 1 year ago

Hello Felix,

I was reviewing my trim_galore code and noticed that I didn't supply my paired reads in a pairwise fashion, yet the run still worked and all the trimmed files had "val" in their names. The report mentioned "single-end reads", but FastQC showed that the adapters had been removed. Is this a problem for downstream analysis, or is it fine?

Trimming reports: C1_S91_L001_R1_001.fastq.gz_trimming_report.txt, C1_S91_L001_R2_001.fastq.gz_trimming_report.txt

```
(base) mjb@MSI:~/parksoils2019$ trim_galore \
--phred33 \
--nextera \
-e 0.1 \
--length 20 \
--output_dir trim_adapter \
--paired /mnt/c/Users/bugsj/Desktop/Projects/ParkSoilProject/Bugay_6528_20093002/*.fastq.gz
Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 4.1
single-core operation.
Output will be written into the directory: /home/mjb/parksoils2019/trim_adapter/
Writing report to '/home/mjb/parksoils2019/trim_adapter/C1_S91_L001_R1_001.fastq.gz_trimming_report.txt'
```
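Trim Galore's `--paired` help says it expects paired-end files to be supplied in a pairwise fashion, e.g. `file1_1.fq file1_2.fq SRR2_1.fq.gz SRR2_2.fq.gz`. The command above passes a glob instead, and the shell expands `*.fastq.gz` in sorted order, so for Illumina-style names each `_R1_` file normally lands directly before its `_R2_` mate. That would explain why the run still paired correctly. The Python sketch below mimics that grouping and sanity-checks each pair; `pair_inputs` and the `C2_S92` sample are hypothetical, and Python's `sorted()` only approximates the shell's locale-dependent ordering.

```python
from pathlib import PurePosixPath


def pair_inputs(paths):
    """Hypothetical helper: group FASTQ paths into (R1, R2) tuples two at a
    time, as a pairwise file list would be read, and check each pair."""
    ordered = sorted(paths)  # approximates the shell's sorted glob expansion
    if len(ordered) % 2 != 0:
        raise ValueError("odd number of files: at least one mate is missing")
    pairs = []
    for r1, r2 in zip(ordered[0::2], ordered[1::2]):
        n1, n2 = PurePosixPath(r1).name, PurePosixPath(r2).name
        # Assumes Illumina-style names that differ only in the _R1_/_R2_ tag
        if n1.replace("_R1_", "_R2_", 1) != n2:
            raise ValueError(f"files do not look like mates: {n1} / {n2}")
        pairs.append((r1, r2))
    return pairs


files = [
    "C1_S91_L001_R2_001.fastq.gz",
    "C2_S92_L001_R1_001.fastq.gz",  # hypothetical second sample
    "C1_S91_L001_R1_001.fastq.gz",
    "C2_S92_L001_R2_001.fastq.gz",
]
for r1, r2 in pair_inputs(files):
    print(r1, r2)
# C1_S91_L001_R1_001.fastq.gz C1_S91_L001_R2_001.fastq.gz
# C2_S92_L001_R1_001.fastq.gz C2_S92_L001_R2_001.fastq.gz
```

If a mate were missing, the odd file count or the name check would flag it, which is exactly the case where a bare glob could silently mispair files.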

FelixKrueger commented 1 year ago

Hi @mjbug

The way Trim Galore works is that it first trims both reads individually (which is also why each report mentions single-end reads), but for paired-end runs this is followed by a validation step (which is where the 'val' in the file names comes from), where operations such as length filtering and additional hard-trimming take place. So this is all expected, and fine.
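As a rough illustration of the length-validation step (per the `--paired` help, both reads of a pair must reach the `--length` cutoff), here is a minimal Python sketch. It is not Trim Galore's actual code: it works on plain sequence strings rather than FASTQ records, skips hard-trimming, and ignores `--retain_unpaired` (which can rescue the read that passed when its mate failed). The function name `validate_pairs` is hypothetical.

```python
from typing import List, Tuple


def validate_pairs(
    reads_1: List[str], reads_2: List[str], min_length: int = 20
) -> Tuple[List[Tuple[str, str]], int]:
    """Keep a read pair only if BOTH already-trimmed mates are at least
    min_length bp long; return the kept pairs and the number removed."""
    if len(reads_1) != len(reads_2):
        raise ValueError("R1 and R2 contain different numbers of reads")
    kept: List[Tuple[str, str]] = []
    removed = 0
    # Walk both files in lockstep so record i of R1 is always judged
    # together with record i of R2 and mate order is preserved
    for seq_1, seq_2 in zip(reads_1, reads_2):
        if len(seq_1) < min_length or len(seq_2) < min_length:
            removed += 1  # the whole pair is dropped, not just the short read
        else:
            kept.append((seq_1, seq_2))
    return kept, removed


r1 = ["ACGT" * 10, "ACGTACGT", "A" * 25]  # 40 bp, 8 bp, 25 bp
r2 = ["TTGCA" * 6, "G" * 30, "C" * 19]    # 30 bp, 30 bp, 19 bp
kept, removed = validate_pairs(r1, r2)
print(len(kept), removed)  # 1 2
```

Dropping whole pairs, rather than single reads, is what keeps the two `val` files in the same read-by-read order that aligners expect.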

If you look at the trimming reports, you will see:

SUMMARISING RUN PARAMETERS
==========================
Input filename: /mnt/c/Users/bugsj/Desktop/Projects/ParkSoilProject/Bugay_6528_20093002/C1_S91_L001_R1_001.fastq.gz
Trimming mode: paired-end

and at the end of the Read 2 trimming report you can see:

RUN STATISTICS FOR INPUT FILE: /mnt/c/Users/bugsj/Desktop/Projects/ParkSoilProject/Bugay_6528_20093002/C1_S91_L001_R2_001.fastq.gz
=============================================
209066 sequences processed in total

Total number of sequences analysed for the sequence pair length validation: 209066

Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 1897 (0.91%)
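To check the numbers in that report: the percentage is removed pairs divided by pairs analysed, and, assuming `--retain_unpaired` was not used (it does not appear in the command above), the remaining pairs are what should end up in each `val` file. A quick Python check:

```python
total_pairs = 209066   # "Total number of sequences analysed ..."
removed_pairs = 1897   # pairs with at least one read shorter than 20 bp

removed_pct = 100 * removed_pairs / total_pairs
retained_pairs = total_pairs - removed_pairs
print(f"{removed_pct:.2f}% removed, {retained_pairs} pairs retained")
# 0.91% removed, 207169 pairs retained
```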

So all fine, just move on! :)

mjbug commented 1 year ago

Hi @FelixKrueger

Thank you for such a quick response! I had a gut feeling all was well, but it was great to hear your confirmation and explanation!

FelixKrueger / TrimGalore

--paired end specifications #163