BhaktiDwivedi opened this issue 4 years ago (status: Open)
Hi @BhaktiDwivedi
you found the right explanation: it looks like the rRNAs are the culprit, accounting for most of the multimappers. It's probably best to discard them before doing gene quantification and DE, to avoid normalization artifacts. The remaining ~12M unique mappers might be enough for DE of medium to highly expressed genes.
Cheers Alex
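As a rough illustration of the "discard them before quantification" step, here is a minimal Python sketch that drops rRNA genes from a per-gene count table and reports how much of the library they took up. The gene names, counts, and the `rrna_genes` set are all hypothetical; in practice the set would come from your annotation (for example, genes flagged with an rRNA biotype), which is an assumption not stated in this thread.

```python
# Hypothetical per-gene read counts for one sample (made-up names and numbers).
counts = {
    "RRNA_1": 900_000,
    "RRNA_2": 50_000,
    "GENE_A": 30_000,
    "GENE_B": 20_000,
}

# Hypothetical set of rRNA gene IDs, assumed to be taken from the annotation.
rrna_genes = {"RRNA_1", "RRNA_2"}


def drop_rrna(gene_counts, rrna_ids):
    """Return a copy of the count table without the rRNA genes."""
    return {gene: n for gene, n in gene_counts.items() if gene not in rrna_ids}


filtered = drop_rrna(counts, rrna_genes)

# Fraction of counted reads that were assigned to rRNA genes.
rrna_fraction = 1 - sum(filtered.values()) / sum(counts.values())

print(filtered)
print(f"rRNA fraction: {rrna_fraction:.2%}")
```

The filtered table, not the original one, would then be passed to the DE tool, so that library-size estimates are based on non-rRNA genes only.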
Thank you! @alexdobin
I recently ran into a similar issue. Could you elaborate on how a low mapping rate would lead to "normalization artifacts"? My understanding is that only uniquely mapped reads are counted...
Thanks a lot.
C.
Hi C, some rRNA reads can map as unique mappers, so if a sample contains a large percentage of rRNA and your annotations include rRNA genes, the normalization may get skewed, and it's best to exclude them. Cheers Alex
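To make the skew concrete, here is a small Python sketch using plain counts-per-million (CPM) as a stand-in for library-size normalization. This is a simplification: DE tools use more robust scaling methods, and all gene names and counts below are made up. The two hypothetical samples have identical mRNA composition (GENE_A to GENE_B is 1:2 in both) but different rRNA loads.

```python
def cpm(gene_counts):
    """Scale counts to counts-per-million using the table's own total."""
    total = sum(gene_counts.values())
    return {gene: n * 1e6 / total for gene, n in gene_counts.items()}


def without(gene_counts, excluded):
    """Return the count table with the excluded genes removed."""
    return {gene: n for gene, n in gene_counts.items() if gene not in excluded}


# Hypothetical uniquely mapped counts; same depth, different rRNA content.
sample_low_rrna = {"RRNA": 100_000, "GENE_A": 300_000, "GENE_B": 600_000}
sample_high_rrna = {"RRNA": 700_000, "GENE_A": 100_000, "GENE_B": 200_000}

# With rRNA in the total, GENE_A looks different between the two samples.
fold_with_rrna = cpm(sample_low_rrna)["GENE_A"] / cpm(sample_high_rrna)["GENE_A"]

# With rRNA excluded before normalization, the apparent difference disappears.
fold_without_rrna = (
    cpm(without(sample_low_rrna, {"RRNA"}))["GENE_A"]
    / cpm(without(sample_high_rrna, {"RRNA"}))["GENE_A"]
)

print(f"apparent fold change with rRNA:    {fold_with_rrna:.2f}")
print(f"apparent fold change without rRNA: {fold_without_rrna:.2f}")
```

In this toy case the rRNA reads inflate the library size of one sample, so every other gene appears down-regulated in it even though the underlying mRNA proportions are the same.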
Hi, I have paired-end (2x100) RNA-seq data with variable post-trimming read lengths (2x36-100 nt). For a good fraction of samples, I am getting a very low percentage of uniquely mapped reads and a very high percentage of reads mapped to multiple loci. For example, here is the final log output for one of the samples:
The main parameters I have changed in the STAR aligner v2.7.0e command are:
Log.Final.out
Is there any other possible solution that I can use in STAR aligner to improve the mapping rate?
I understand this could be indicative of insufficient depletion of ribosomal RNA. So I used featureCounts with the rRNA repeat annotation from the RepeatMasker track to roughly estimate the rRNA levels in these libraries. For the same sample as above, it looks like almost 90% of the reads map to rRNA
when counting multi-mapping reads:
when not counting multi-mapping reads:
Is this a good way to explain why I have such a high percentage of multi-mapping reads? I would like to use these samples for expression quantification and differential expression analysis. But given such a low percentage of uniquely mapped reads, I am not sure I will get anything significant. Another option I am considering is to remove or mask rRNA reads from the alignment when quantifying the data. Is this reasonably acceptable, and what could be the potential pitfalls? I appreciate any help and feedback.
Thank you!