cihga39871 / Atria

An accurate and ultra-fast adapter and quality trimming program for Illumina Next-Generation Sequencing (NGS) data.

Question about the necessity of read trimming when using local/soft-clipping aligners #17

Closed: kalavattam closed this issue 4 months ago

kalavattam commented 5 months ago

Hi Eric,

I have a quick question about whether read trimming is necessary when using local (soft-clipping) aligners. Historically, I've trimmed reads before alignment (both local and end-to-end), regardless of the downstream analysis (e.g., transcriptome assembly, differential binding from ChIP-seq). However, I've seen the following opinion in search engine results and in some publications (e.g., taken from here):

For counting applications such as differential gene expression (DGE), RNA-seq analysis, ChIP-seq, or ATAC-seq, read trimming is generally not required anymore when using modern aligners. For such studies local aligners or pseudo-aligners should be used. Modern “local aligners” like STAR, BWA-MEM, HISAT2, will “soft-clip” non-matching sequences. Pseudo-aligners like Kallisto or Salmon will also not have any problem with reads containing adapter sequences.

However, if the data are used for variant analyses, genome annotation or genome or transcriptome assembly purposes, we recommend read trimming, including both adapter and quality trimming.
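For context, the soft-clipped bases mentioned in the quote appear as `S` operations at either end of a read's CIGAR string in SAM/BAM output (any `H` hard clips sit outside them). Counting them per read is a simple starting point for seeing how much adapter sequence an aligner is clipping on your own data. The helper below is a hypothetical sketch that works on a single CIGAR string; in practice you would read the CIGARs from your BAM with whatever tool you already use.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec letters).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (leading, trailing) soft-clipped base counts for a CIGAR string.

    Hypothetical helper for illustration; an unmapped read's CIGAR "*" yields (0, 0).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can only occur at the very ends, outside any soft clips,
    # so drop them before inspecting the first and last operations.
    core = [(n, op) for n, op in ops if op != "H"]
    leading = core[0][0] if core and core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return leading, trailing
```

For example, `soft_clipped_bases("3S95M2S")` returns `(3, 2)`, and a fully aligned `"100M"` returns `(0, 0)`.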

I wanted to ask for your perspective on this, given your expertise and your work developing Atria, if you have anything to share.

Thank you,
Kris

cihga39871 commented 4 months ago

Hi Kris, I also noticed this comment regarding soft-clipping aligners. In theory, a soft-clipping aligner can map a read that contains adapter sequence to its correct reference position. With real data, however, it is better to run some benchmarks using your own data and methods. Each aligner has its own algorithm for trading off soft-clipping against extending the alignment, and from my perspective, every trade-off has some side effect on accuracy. In general, I prefer not to leave uncertainty to the next step if the previous step can resolve it.
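To make the soft-clip versus extension trade-off concrete, here is a toy model loosely inspired by how the BWA manual describes its clipping penalty: clipping is skipped when the end-to-end score is larger than the best local score minus the penalty. Everything here is an assumption for illustration: the function `extend_or_clip` is hypothetical, the extension is ungapped and starts from a seed at position 0, and the scores (match +1, mismatch -4, clipping penalty 5) were chosen to resemble BWA-MEM defaults. Real aligners do considerably more than this.

```python
def extend_or_clip(read, ref, match=1, mismatch=-4, clip_penalty=5):
    """Toy ungapped extension of `read` along `ref` (same length).

    Returns (aligned_length, soft_clipped_length). Hypothetical model only.
    """
    if len(read) != len(ref):
        raise ValueError("read and ref must have the same length in this toy model")
    score = 0
    best_score, best_len = 0, 0  # best local (clipped) alignment seen so far
    for i, (a, b) in enumerate(zip(read, ref)):
        score += match if a == b else mismatch
        if score > best_score:
            best_score, best_len = score, i + 1
    full_score = score  # score if the whole read is aligned end-to-end
    # Keep the full-length alignment unless clipping wins by more than the penalty.
    if full_score > best_score - clip_penalty:
        return len(read), 0
    return best_len, len(read) - best_len
```

Under these assumed scores, `extend_or_clip("A" * 20 + "C", "A" * 21)` returns `(21, 0)`, so a 1-base adapter remnant is aligned and scored as a mismatch, while `extend_or_clip("A" * 20 + "CCC", "A" * 23)` returns `(20, 3)` and the 3-base remnant is clipped. Short remnants slipping into the alignment in this way are one example of a trade-off that affects accuracy downstream.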

The publication you mentioned only evaluates Trim Galore and Trimmomatic. If you read the Atria paper or those of other trimmers, you may find that these two are not among the top-performing tools (e.g., Table 1 of https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10038132/). Specifically, Trimmomatic usually does not trim truncated adapters (so removing them relies almost entirely on the aligner's soft-clipping), and Trim Galore has accuracy issues.
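A truncated adapter is one where the read ends partway through the adapter, so only a prefix of the adapter appears at the read's 3' end. The hypothetical function below sketches the simplest way to catch that case: exact suffix/prefix matching with a minimum overlap. It is not Atria's algorithm; real trimmers also tolerate sequencing errors and handle other cases. The default adapter `AGATCGGAAGAGC` (the common Illumina TruSeq adapter prefix) and `min_overlap=3` are assumptions.

```python
def trim_3prime_adapter(read, adapter="AGATCGGAAGAGC", min_overlap=3):
    """Remove a full or truncated adapter from the 3' end of `read` (exact matches only).

    Hypothetical sketch; min_overlap is clamped to at least 1.
    """
    # Case 1: the whole adapter occurs in the read; cut at its first occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: a truncated adapter, i.e. a read suffix equal to an adapter prefix.
    # Try the longest possible overlap first.
    max_k = min(len(read), len(adapter) - 1)
    for k in range(max_k, max(min_overlap, 1) - 1, -1):
        if read[-k:] == adapter[:k]:
            return read[:-k]
    return read
```

For example, `trim_3prime_adapter("ACGTACGTAGATCG")` returns `"ACGTACGT"`. Remnants shorter than `min_overlap` are left in place because a 1- or 2-base match to the adapter is often just chance, which is the kind of short tail that otherwise ends up being handled by the aligner's soft-clipping.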

Also, trimming is not only about removing adapters. It can also remove low-quality sequence, which can make counting more accurate and reduce variation within groups.
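On the quality side, one widely used approach, described in the Cutadapt documentation as originating from BWA, subtracts a cutoff from each Phred score, accumulates partial sums from the 3' end, and cuts where the sum is smallest. The sketch below implements that idea with qualities given as plain integers. The function name and the cutoff of 20 are assumptions, and this is not necessarily how Atria performs quality trimming.

```python
def quality_trim_3prime(seq, quals, cutoff=20):
    """BWA/Cutadapt-style 3' quality trimming (illustrative sketch).

    `quals` is a list of integer Phred scores, one per base of `seq`.
    Returns the trimmed (seq, quals).
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")
    s = 0
    min_s = 0
    cut = len(quals)  # default: keep everything
    for i in range(len(quals) - 1, -1, -1):
        s += quals[i] - cutoff
        if s > 0:
            # Quality has recovered enough that further trimming is not worthwhile.
            break
        if s < min_s:
            min_s = s
            cut = i
    return seq[:cut], quals[:cut]
```

For example, `quality_trim_3prime("ACGTAC", [30, 30, 30, 30, 10, 10])` returns `("ACGT", [30, 30, 30, 30])`, while a read whose last base is high quality is left untouched.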

Thanks, Eric