FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

Can anyone explain why, in default mode with --paired --retain_unpaired, only read 2 reads are retained and not read 1? #142

Closed veeramv closed 1 year ago

veeramv commented 1 year ago

I wonder if anyone can explain to me why only the read 2 reads are retained but no read 1 reads. I ran Trim Galore in default mode on my paired-end reads, as shown below:

[image: screenshot of the command used]

The table below is what I cross-checked to see what was retained:

[image: screenshot of the table of retained reads, 2022-09-30]

Has anyone else had this issue, or am I doing something wrong here? Please kindly help me to correct it.

Thanks, VM

FelixKrueger commented 1 year ago

Hi VM,

This looked like some kind of oversight on my part, so I had a look. I changed one line in the reporting of the number of reads retained, from $keep (which should be reserved for RRBS data) to $retained, and it now appears to output the number of reads correctly. Here is an example RNA-seq sample:

Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 106624 (0.35%)
Number of unpaired read 1 reads printed: 101768
Number of unpaired read 2 reads printed: 3698

I then also tried this on a paired-end smallRNA sample, where I got the same result as you described:

Number of sequence pairs removed because at least one read was shorter than the length cutoff (18 bp): 17170 (17.17%)
Number of unpaired read 1 reads printed: 0
Number of unpaired read 2 reads printed: 17166

When I changed the length of Read 1 to be retained to --length_1 15, Read 1s were printed out again:

Number of sequence pairs removed because at least one read was shorter than the length cutoff (18 bp): 17170 (17.17%)
Number of unpaired read 1 reads printed: 5569
Number of unpaired read 2 reads printed: 17166

So to me this looks like a genuine consequence of adapter (and/or quality) trimming rather than an issue with Trim Galore as such. Just out of interest, was your data smallRNA? Could you please clone the dev version and give it a go, maybe adjusting the --length_1 parameter to see if this changes the overall picture?
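Before adjusting --length_1, it can help to look at how long the Read 1 sequences actually are. The following is a minimal sketch, not part of Trim Galore: the function names are hypothetical, and it assumes plain 4-line FASTQ records (no wrapped sequences), optionally gzip-compressed.

```python
import gzip
from collections import Counter
from typing import Iterable


def read_length_counts(fastq_lines: Iterable[str]) -> Counter:
    """Tally sequence lengths from FASTQ text, assuming 4 lines per record."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            counts[len(line.strip())] += 1
    return counts


def fraction_at_least(counts: Counter, cutoff: int) -> float:
    """Fraction of reads whose length is at least `cutoff` (0.0 if no reads)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for length, n in counts.items() if length >= cutoff) / total


def read_length_counts_from_file(path: str) -> Counter:
    """Same tally, read from a FASTQ file; '.gz' files are opened with gzip."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return read_length_counts(handle)
```

If `fraction_at_least(counts, 35)` is zero for a Read 1 file, no Read 1 can pass a 35 bp unpaired-read cutoff, whatever else the trimming does.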

FelixKrueger commented 1 year ago

Didn't mean to close this issue straight away, but feel free to close it if you find it isn't an issue of Trim Galore as such.

veeramv commented 1 year ago

Thanks Felix, my data is from 10x! Thanks again for reopening. I will post the results back here and close the issue myself.

FelixKrueger commented 1 year ago

Ah, that is interesting. 10X data is normally not adapter trimmed at all, as this is handled by the local alignment mode of STAR (within CellRanger). I am pretty sure that you don't get any reads for R1 because 10X data tends to be 28bp in R1 (the cell barcode and UMI), and the default minimum length for a retained Read 1 is 35bp. So it's kind of a good exercise to see that the length filter is working as intended :)
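The length filter described above can be illustrated with a small model. This is a simplified sketch and not Trim Galore's actual code: the function `classify_pair` is hypothetical, and it assumes that a pair is removed when either read falls below the pair cutoff (20 bp, as in the log lines earlier in this thread) and that each surviving mate is then checked against its own unpaired cutoff (35 bp by default, adjustable with --length_1 and --length_2).

```python
from typing import Tuple


def classify_pair(
    len_r1: int,
    len_r2: int,
    pair_cutoff: int = 20,   # pair-level length cutoff (assumed default)
    length_1: int = 35,      # unpaired cutoff for Read 1 (--length_1)
    length_2: int = 35,      # unpaired cutoff for Read 2 (--length_2)
    retain_unpaired: bool = True,
) -> Tuple[bool, bool, bool]:
    """Return (pair_kept, r1_written_unpaired, r2_written_unpaired)."""
    # Both mates long enough: the pair is kept and nothing is written unpaired.
    if len_r1 >= pair_cutoff and len_r2 >= pair_cutoff:
        return (True, False, False)
    # Pair removed; without --retain_unpaired both mates are discarded.
    if not retain_unpaired:
        return (False, False, False)
    # Each mate is rescued only if it meets its own unpaired cutoff.
    return (False, len_r1 >= length_1, len_r2 >= length_2)


# A 28 bp 10X Read 1 whose mate was trimmed to 15 bp: R1 fails the 35 bp cutoff.
print(classify_pair(28, 15))               # (False, False, False)
# Lowering the Read 1 cutoff to 15 bp rescues the same read.
print(classify_pair(28, 15, length_1=15))  # (False, True, False)
# Read 1 trimmed to 12 bp with a 91 bp mate: only Read 2 is written.
print(classify_pair(12, 91))               # (False, False, True)
```

Under these assumptions a 28 bp Read 1 can never reach the unpaired output with the default cutoff, which matches the "Number of unpaired read 1 reads printed: 0" pattern.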

veeramv commented 1 year ago

Hi Felix, thanks for the nice explanation. I get it now, and I will close the issue :) By the way, I am not using CellRanger; I am running STAR alone instead. So the check was a worthwhile exercise, I guess :)