FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

No trimming when no adapter is found #64

Closed: Maarten-vd-Sande closed this issue 5 years ago

Maarten-vd-Sande commented 5 years ago

When no adapter is found, Trim Galore defaults to the Illumina adapter sequence and tries to remove it.

I have a pipeline that trims automatically using Trim Galore, but sometimes people send me data that has already been trimmed. Since I do not want to make assumptions about whether the data is trimmed or not, I still run it through Trim Galore. I was wondering whether there is an option I have missed so that, if no adapter is found, quality trimming is still performed but adapter trimming is skipped.

I am trying to avoid manually checking whether adapters are present and adding -a X (#51).

FelixKrueger commented 5 years ago

Hi Maarten,

I'm afraid that, as it stands, Trim Galore is not set up to skip adapter trimming altogether.

One option would, of course, be to do the checking externally. For example, I believe FastQC writes out a table with the counts for the Illumina, Nextera and small RNA adapters anyway, so one could in theory read that and base the trimming/no-trimming decision on it.
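A minimal sketch of doing that check externally, assuming you count the adapter stubs in the FastQ file yourself rather than parsing FastQC output (whose exact format is not reproduced here). The three stub sequences are assumed to match Trim Galore's auto-detection defaults; `open_fastq` and `count_adapter_stubs` are hypothetical helpers, not part of Trim Galore.

```python
import gzip
import itertools

# Adapter stubs assumed to match Trim Galore's auto-detection defaults;
# check the documentation of the version you actually run.
ADAPTER_STUBS = {
    "Illumina": "AGATCGGAAGAGC",  # 13 bp
    "Nextera": "CTGTCTCTTATA",    # 12 bp
    "smallRNA": "TGGAATTCTCGG",   # 12 bp
}


def open_fastq(path):
    """Open a plain or gzip-compressed FastQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_adapter_stubs(lines, n_reads=1_000_000, stubs=ADAPTER_STUBS):
    """Count how many of the first n_reads records contain each adapter stub.

    `lines` is any iterable of FastQ lines (an open file works); the
    sequence is the second line of every four-line record.
    """
    counts = {name: 0 for name in stubs}
    # Pick lines 1, 5, 9, ... i.e. the sequence line of each of the first n_reads records
    for seq in itertools.islice(lines, 1, 4 * n_reads, 4):
        seq = seq.strip().upper()
        for name, stub in stubs.items():
            if stub in seq:
                counts[name] += 1
    return counts
```

Usage would look like `with open_fastq("sample.fastq.gz") as fh: counts = count_adapter_stubs(fh)` (file name hypothetical). This counts reads containing a stub; Trim Galore's own counting may differ in detail.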

We could, of course, add an option that skips adapter trimming (or sets -a X itself) if absolutely no adapter is found. The question then, though, is: where do you draw the line? At exactly 0 / 0 / 0 counts for all three adapters? What happens if there is 1 count for one of them? Just to remind you, the auto-detection works on the first 1 million sequences, and the adapter sequences are 12-13 bp long, so very occasionally you might find one of them in a read by chance, or because it occurs somewhere in the genome...
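To put a rough number on "by chance": assuming uniformly random, independent bases (a strong simplification, since real genomes are neither), each of the (read length - stub length + 1) start positions in a read matches a given stub with probability 4^-(stub length). The hypothetical helper below turns that into an expected count per sample.

```python
def expected_chance_hits(n_reads, read_length, stub_length):
    """Expected number of reads containing a given stub purely by chance.

    Assumes random, independent bases and treats the start positions as
    independent trials (an approximation, since overlapping windows are not).
    """
    positions = max(read_length - stub_length + 1, 0)
    p_match = 4.0 ** -stub_length
    # Probability that a read contains at least one chance match
    p_read = 1.0 - (1.0 - p_match) ** positions
    return n_reads * p_read
```

For 1 million 100 bp reads this gives roughly 1.3 expected chance hits for a 13 bp stub and roughly 5.3 for a 12 bp stub, so a strict 0 / 0 / 0 requirement could fail on perfectly clean, already-trimmed data even under this idealised model.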

I could, of course, leave that thresholding problem up to you by adding an integer threshold count that you need to set yourself (anything between 0 and ?), e.g. --consider_already_trimmed [INT]. Is that something that would help in your case?
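The proposed threshold rule can be sketched as follows, assuming "already trimmed" means that no adapter count exceeds INT (i.e. the maximum count is at most INT). The functions are hypothetical; the first builds the external workaround with -a X, the second simply passes the threshold to the option proposed above and lets Trim Galore decide.

```python
def considered_already_trimmed(counts, threshold):
    """Hypothetical rule: treat the file as adapter-trimmed if no adapter
    count is greater than `threshold`."""
    if threshold < 0:
        raise ValueError("threshold must be >= 0")
    return max(counts.values(), default=0) <= threshold


def external_check_args(fastq, counts, threshold):
    """Build a trim_galore command, deciding externally whether to pass
    -a X (the workaround from #51) so only quality trimming is applied."""
    args = ["trim_galore"]
    if considered_already_trimmed(counts, threshold):
        args += ["-a", "X"]
    args.append(fastq)
    return args


def builtin_option_args(fastq, threshold):
    """Build a trim_galore command that delegates the decision to
    --consider_already_trimmed."""
    return ["trim_galore", "--consider_already_trimmed", str(threshold), fastq]
```

Either list can be passed to `subprocess.run(args, check=True)`. Delegating to the option avoids counting adapters twice, since Trim Galore already performs the auto-detection.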

Maarten-vd-Sande commented 5 years ago

Thanks for the fast reply.

I would be very happy with the solution you propose, leaving the 'thresholding problem' to me (the user).

FelixKrueger commented 5 years ago

Alright, let me see what I can do (probably tomorrow though).

Maarten-vd-Sande commented 5 years ago

No hurry, thanks a lot!

FelixKrueger commented 5 years ago

Sorry for not being entirely truthful: I have now added the option --consider_already_trimmed INT, which does pretty much exactly what we discussed. Could you clone the latest development version and see if it works for you? Addressed in commit 06622790e6a0cbd132159fa6c5219315724ff955.

A description should show up in trim_galore --help.

Maarten-vd-Sande commented 5 years ago

Can't complain! It seems to work as intended (tested on two samples, one above the threshold and one below).

Thanks a lot

Maarten-vd-Sande commented 5 years ago

P.S. Any idea when this will be released?

FelixKrueger commented 5 years ago

Thanks for the feedback. Given that there are already some four changes, I suppose we could make a release soon. Let me just grab a coffee...

FelixKrueger commented 5 years ago

Here we go, v0.6.4 has just been posted. Enjoy!

https://github.com/FelixKrueger/TrimGalore/releases/tag/0.6.4