
alexdobin / STAR

RNA-seq aligner

MIT License


Question: sequencing quality score #990

Open andreismol opened 3 years ago

andreismol commented 3 years ago

Hi @alexdobin,

Apologies, this question has been asked a few times already, but I think you last answered it in 2018, and I'm wondering whether anything has changed since then.

Does STAR still ignore Phred quality scores when mapping?
Should I be trimming reads by applying a Phred quality score cutoff before mapping with STAR? I'm mostly using STAR to produce alignment files for downstream differential expression analyses, and to examine splicing variation in human RNA-sequencing data. I'm not doing anything that requires investigating single-base changes, for instance.

Thanks! Andrei

alexdobin commented 3 years ago

Hi Andrei,

All questions are good, no worries! I just need to find time to create an FAQ.

STAR does not use quality scores for mapping. It used them in very early versions, but they did not have a significant impact.

If your data have a particularly bad tail, with median QS below ~10, quality trimming may help. Otherwise, I would not bother with it, as STAR will "auto-trim" (soft-clip) bases that do not match the genome sequence.
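To check whether a run has the kind of bad tail described above, one could look at the median Phred score at each read position. The following is a minimal sketch, not a STAR feature: it assumes plain (uncompressed) four-line FASTQ records with Phred+33 encoding, and the helper names `per_position_median_quality` and `bad_tail_start` are hypothetical.

```python
import statistics
from typing import Iterable, List


def per_position_median_quality(fastq_lines: Iterable[str], offset: int = 33) -> List[float]:
    """Median Phred score at each read position (hypothetical helper).

    Assumes four-line FASTQ records with Phred+33 encoding (offset=33).
    Reads shorter than the longest read simply do not contribute to the
    positions past their end.
    """
    columns: List[List[int]] = []
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:  # the quality string is the 4th line of each record
            continue
        for pos, ch in enumerate(line.rstrip("\n")):
            if pos == len(columns):
                columns.append([])
            columns[pos].append(ord(ch) - offset)
    return [statistics.median(col) for col in columns]


def bad_tail_start(medians: List[float], cutoff: float = 10) -> int:
    """First position from which every median is below `cutoff`
    (len(medians) if the tail is fine). The cutoff of 10 mirrors the
    rough "median QS below ~10" rule of thumb above."""
    start = len(medians)
    for pos in range(len(medians) - 1, -1, -1):
        if medians[pos] < cutoff:
            start = pos
        else:
            break
    return start


# Two toy reads: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
records = ["@r1", "ACGTACGT", "+", "IIIII###",
           "@r2", "ACGTACGT", "+", "IIII####"]
medians = per_position_median_quality(records)
print(medians)                  # [40.0, 40.0, 40.0, 40.0, 21.0, 2.0, 2.0, 2.0]
print(bad_tail_start(medians))  # 5
```

Passing an open file object instead of the list works the same way, since it iterates line by line; for real data one would usually just read the per-position quality plot from a QC tool.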

Cheers Alex

potulabe commented 4 months ago

Hi, Alex! First of all, thank you so much for this beautiful tool :)

Regarding quality trimming, I'm wondering whether it interacts with alignment filtering parameters such as outFilterScoreMinOverLread, outFilterMatchNminOverLread, outFilterMismatchNoverReadLmax and so on. If read quality is bad toward the end of the read, there will be more mismatches, so the read has a higher chance of being filtered out, right?

At least that is what I see in a side-by-side comparison of untrimmed vs. trimmed20 (fastp cut_right option with a quality threshold of 20 over a 5 bp window) vs. trimmed25: with all other parameters equal, the more aggressively trimmed data give more uniquely mapped reads (in absolute counts).
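For readers unfamiliar with the trimming used in this comparison, the fastp README describes cut_right roughly as: slide a window from the front of the read toward the tail, and at the first window whose mean quality is below the threshold, drop the bases in that window and everything to its right. Below is a small re-implementation of that description, assuming Phred+33 qualities; it is an illustrative sketch (the function `cut_right` here is hypothetical), not fastp's actual code, and edge cases may differ.

```python
from typing import Tuple


def cut_right(seq: str, qual: str, window: int = 5, threshold: float = 20,
              offset: int = 33) -> Tuple[str, str]:
    """Sliding-window 3' trimming in the spirit of fastp's cut_right.

    Slides a `window`-base window from the 5' end toward the 3' end; at the
    first window whose mean Phred score is below `threshold`, that window and
    everything to its right are dropped. Reads shorter than the window are
    returned unchanged in this sketch.
    """
    scores = [ord(c) - offset for c in qual]
    for start in range(0, len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            return seq[:start], qual[:start]
    return seq, qual


# 7 bases at Q40 ('I') followed by 3 bases at Q2 ('#').
print(cut_right("ACGTACGTAC", "IIIIIII###", threshold=20))  # ('ACGTA', 'IIIII')
print(cut_right("ACGTACGTAC", "IIIIIII###", threshold=25))  # ('ACGT', 'IIII')
```

Note how the stricter threshold (the "trimmed25" setting) cuts earlier, and how the whole failing window is dropped, so some good bases go with the bad tail.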

Thank you! Kseniia

alexdobin commented 4 months ago

Hi Kseniia,

You are right: these parameters are normalized to read length, so trimming off unmappable sequence lowers the minimum alignment length needed to pass them.
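A worked example of why this happens, under simplifying assumptions: each "...OverLread" filter is treated as "value must be at least fraction × read length", using 0.66, which is STAR's documented default for both outFilterScoreMinOverLread and outFilterMatchNminOverLread. The helper `passes_min_fraction` is hypothetical; STAR's exact internal comparison (e.g. how it counts read length for paired-end reads, plus its score and mismatch filters) is not reproduced here.

```python
def passes_min_fraction(value: float, read_length: int,
                        min_over_lread: float = 0.66) -> bool:
    """Simplified '...OverLread' filter (hypothetical helper): `value`, e.g.
    the number of matched bases, must be at least `min_over_lread` times the
    read length."""
    return value >= min_over_lread * read_length


# Hypothetical 100-bp read: the first 60 bases match the genome, the last 40
# are a low-quality tail that STAR would soft-clip.
matched = 60
print(passes_min_fraction(matched, read_length=100))  # False: needs >= 66 matched bases
print(passes_min_fraction(matched, read_length=60))   # True: after trimming, needs >= 39.6
```

The alignment itself is the same 60 bases in both cases; only the denominator changes, which is consistent with trimming producing more uniquely mapped reads when all other parameters are held fixed.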