FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

Question about auto-detected adapter sequences and trimming only the 3' or 5' end #107

Closed dahun73 closed 3 years ago

dahun73 commented 3 years ago

Hello!

I have two questions about TrimGalore. I usually use TrimGalore for RNA-seq, ChIP-seq, GRO-seq and eCLIP-seq.

My first question is about automatic adapter detection. Adapter sequences differ by sequencing platform (Illumina, Ion Torrent, and so on). Compared with relying on the auto-detected adapter sequence, is it better to enter the full-length adapter? For Illumina, TrimGalore trims using only a 13 bp adapter sequence. In my case, however, I used 33 bp adapter sequences. Could this cause any problems after trimming? I wonder whether trimmed reads that still contain untrimmed adapter sequence can be aligned to the reference FASTA.

My second question is about one-sided trimming. I need to preserve the 5' end of the FASTQ reads, so I don't want to trim the 5' end of the raw reads. Is this possible with TrimGalore?

Thanks! Dahun

FelixKrueger commented 3 years ago

Hi Dahun

Trim Galore is primarily aimed at users of Illumina sequencing. Therefore, the auto-detection attempts to find the adapters typically used on the Illumina platform, namely the standard Illumina (TruSeq or Sanger iTag) adapters, Nextera transposase adapters, or small RNA adapters.

The good thing about using only the 13 bp portion that is shared between all adapters with different barcodes is that you do not have to specify a long list of different adapters; this one sequence is enough to deal with them all. If you specify very long adapters, or adapters that were not used, chances are that the contamination remains in the read. This adapter contamination may or may not allow a read to be aligned; this depends very much on the aligner you are using, as well as on the mapping strategy (e.g. end-to-end vs. local alignments). So in most cases, I don't see why specifying a long adapter sequence would be more useful.
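
To make the point about the shared 13 bp portion concrete, here is a minimal Python sketch. It is not how Trim Galore or Cutadapt work internally: the function `trim_3prime_adapter` is a hypothetical helper that uses exact matching only, whereas Cutadapt tolerates mismatches. The `insert` sequence is made up, and the two 33 bp sequences are assumed to be the commonly published TruSeq read 1 and read 2 adapters. Both start with the same 13 bp, so the short sequence finds read-through from either adapter, while a full-length adapter that was not the one ligated fails to match in this simplified model.

```python
# Shared 13 bp start of the Illumina adapters (the sequence used for Illumina trimming).
SHARED_PREFIX = "AGATCGGAAGAGC"

# Assumed full-length 33 bp TruSeq adapter sequences (illustrative, not from this thread).
FULL_ADAPTERS = {
    "read1": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
    "read2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
}


def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 1) -> str:
    """Hypothetical helper: remove `adapter` and everything after it from the
    3' end of `read`, using exact matching only (no mismatches allowed)."""
    # Case 1: the complete adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the first k bases of the adapter fit at the very end of the read.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[: len(read) - k]
    # No adapter found: the read is returned unchanged.
    return read


# Made-up 16 bp insert followed by 20 bp of adapter read-through.
insert = "ACGTTGCAACGTTGCA"
read_r1 = insert + FULL_ADAPTERS["read1"][:20]
read_r2 = insert + FULL_ADAPTERS["read2"][:20]

# The shared 13 bp sequence handles both adapters.
trimmed_r1 = trim_3prime_adapter(read_r1, SHARED_PREFIX)
trimmed_r2 = trim_3prime_adapter(read_r2, SHARED_PREFIX)

# A full-length adapter that was not the one present leaves the read untouched
# in this exact-match model.
wrong_adapter = trim_3prime_adapter(read_r1, FULL_ADAPTERS["read2"])

print(trimmed_r1, trimmed_r2, wrong_adapter, sep="\n")
```

In every case only bases at the 3' end are removed, so the 5' end of the insert is left as it was.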

Regarding your second question: Trim Galore only performs trimming on the 3' end of reads, for both poor basecall qualities and the so-called read-through adapter contamination. So your 5' ends should be fine.
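
As an illustration of why 3'-only quality trimming leaves the 5' end intact, the sketch below implements a running-sum quality trimming scheme of the kind described in the Cutadapt documentation. It is a simplified reimplementation for illustration, not Trim Galore's own code: `quality_trim_index` and `trim_read` are hypothetical names, and the cutoff of 20 is assumed here as the default Phred threshold.

```python
from typing import List, Tuple


def quality_trim_index(qualities: List[int], cutoff: int = 20) -> int:
    """Return the index at which to cut the read so that the low-quality
    3' tail is removed. Bases before the returned index are kept."""
    running = 0            # running sum of (cutoff - quality), walking from the 3' end
    best = 0               # largest running sum seen so far
    stop = len(qualities)  # default: keep the whole read
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - qualities[i]
        if running < 0:
            # Good-quality bases now outweigh the bad tail; stop scanning.
            break
        if running > best:
            best = running
            stop = i
    return stop


def trim_read(sequence: str, qualities: List[int], cutoff: int = 20) -> Tuple[str, List[int]]:
    """Trim sequence and qualities at the 3' end only; the 5' end is never touched."""
    stop = quality_trim_index(qualities, cutoff)
    return sequence[:stop], qualities[:stop]


# Made-up read: good qualities at the 5' end, poor qualities at the 3' end.
seq = "ACGTACGTAC"
quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
trimmed_seq, trimmed_quals = trim_read(seq, quals, cutoff=10)
print(trimmed_seq, trimmed_quals)
```

Because the scan starts at the last base and only ever shortens the read from that side, the first base of the output is always the first base of the input.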

FelixKrueger commented 3 years ago

Closing this as there were no further queries.