alexdobin / STAR

RNA-seq aligner
MIT License

best practice for multi-mappers in scRNAseq data #2164

Open Bessie92 opened 3 months ago

Bessie92 commented 3 months ago

Hi Alex,

I have a question about best practices for dealing with multi-mapping reads in scRNA-seq data analysis, particularly when looking at transposable elements (TEs). Due to the nature of TEs, multi-mapping is a significant issue, and given the read depth of most scRNA-seq data, I prefer not to restrict analysis to uniquely mapping reads.

Currently, I'm using the following STARsolo parameters to process multi-mappers:

--outFilterMultimapNmax 100 
--winAnchorMultimapNmax 100 
--outMultimapperOrder Random 
--runRNGseed 777 
--outSAMmultNmax 1
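
As a rough sketch, the flags above would sit inside a full STARsolo call something like the following. The paths are placeholders, and `--soloType CB_UMI_Simple` with a barcode whitelist assumes a 10x-style droplet library, which may not match every dataset. The script only prints the assembled command (a dry run) and does not launch STAR.

```shell
#!/bin/sh
# Placeholder paths (hypothetical; substitute your own files).
GENOME_DIR="/path/to/star_index"
CDNA_FASTQ="sample_R2.fastq.gz"      # cDNA read (assumed 10x-style layout)
BARCODE_FASTQ="sample_R1.fastq.gz"   # cell barcode + UMI read
WHITELIST="/path/to/barcode_whitelist.txt"

# Multi-mapper settings exactly as listed in this post.
MULTIMAP_OPTS="--outFilterMultimapNmax 100 --winAnchorMultimapNmax 100 --outMultimapperOrder Random --runRNGseed 777 --outSAMmultNmax 1"

# Assemble the full command. The STARsolo options here are an assumed
# minimal set for a droplet library, not a recommendation.
STAR_CMD="STAR --genomeDir $GENOME_DIR --readFilesIn $CDNA_FASTQ $BARCODE_FASTQ --soloType CB_UMI_Simple --soloCBwhitelist $WHITELIST $MULTIMAP_OPTS"

# Dry run: show the command instead of executing it.
echo "$STAR_CMD"
```

Keeping the multi-mapper flags in one variable makes it easy to swap them out when comparing settings across runs with the same seed.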

I want to confirm whether my understanding of these parameters is correct:

From the STAR manual, I understand that the --outMultimapperOrder Random option outputs multiple alignments for each read in a random order and also randomizes the choice of the primary alignment from the highest scoring alignments. My understanding is that STAR ranks all alignment possibilities based on various parameters (e.g., MMPs, splice junctions, etc.). By using --outMultimapperOrder Random --runRNGseed 777, this ranking is randomized for all reads, meaning that even a better-scoring alignment might not be the final one in the SAM/BAM file. Is this correct, or does the randomization only apply to alignments with equal scores?

Moreover, do you think these parameters are optimal for handling multi-mapping reads, given that the goal is to retain as much data as possible for downstream analysis?

Thank you in advance for your time and insights!

Bessie

singhbhavya commented 2 months ago

I actually have the exact same question. If I can tag along on this issue, I'm also curious about --outSAMmultNmax. When --outSAMmultNmax 1 is paired with --outMultimapperOrder Random, does this mean that one alignment will be chosen at random for each multi-mapped read?

On the other hand, if we were to use --soloMultiMappers EM along with the other listed parameters (--outMultimapperOrder Random, --outSAMmultNmax 1), what would be the behavior?