alexdobin / STAR

RNA-seq aligner
MIT License

Solo - counting reads in overlapping annotations. #1681

Open Hanliconius opened 1 year ago

Hanliconius commented 1 year ago

Hi. My GTF has a complex locus with two long overlapping transcripts that run in opposite directions. I really want good counts of both genes per cell. I have single-nucleus RNA-seq data, so there is a lot of pre-mRNA, meaning I can't just count the exons (which are themselves discrete); I need to count GeneFull.

I see the option --soloMultiMappers has various settings for assigning reads in situations like these - but because the annotations are in opposite directions, I really just want to use strand information to assign them, and it isn't clear to me if the available options are doing that or not. So a few related questions:

  1. Do any --soloMultiMappers options explicitly use strandedness when counting in overlaps, and which option would be best for a case like this?
  2. If not, is there a sensible way for me to force that behaviour?
  3. Alternatively, is there a way to work backwards and determine which reads in each cell are contributing to the count for each gene?

Thanks for any light you can shed here,

Joe

alexdobin commented 1 year ago

Hi Joe,

STARsolo uses the strand explicitly for both unique and multi-mappers. This is controlled by --soloStrand (Forward by default).

Hanliconius commented 1 year ago

Thanks for the fast reply, I just want to clarify one thing.

I'm looking at the STAR gene counts for these two genes (I realise that this is not the same as the solo output). gene1 is the + strand gene and gene2 is the - strand gene:

gene1   2786    4516    103
gene2   1514    395     2952

The summed total counts are not equal, which I understand is due to the overlapping annotation and the implementation of counting for bulk data. Are you saying that if this were a single cell, the default --soloStrand would call gene1 at 4516 and gene2 at 2952? In this scenario that's what I want to happen.
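
The mismatch described above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the two rows are laid out as gene, unstranded count, forward-strand count, reverse-strand count, and the names `parse_counts` and `gaps` are made up for this example.

```python
# Rows copied from the comment above.
# Assumed column order: gene, unstranded, forward-strand, reverse-strand.
ROWS = """\
gene1 2786 4516 103
gene2 1514 395 2952
"""

def parse_counts(text):
    """Parse whitespace-separated count rows into a dict keyed by gene."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        gene, unstranded, forward, reverse = line.split()
        counts[gene] = {
            "unstranded": int(unstranded),
            "forward": int(forward),
            "reverse": int(reverse),
        }
    return counts

counts = parse_counts(ROWS)

# For each gene, compare the two stranded columns with the unstranded one.
gaps = {}
for gene, c in counts.items():
    stranded_total = c["forward"] + c["reverse"]
    gaps[gene] = stranded_total - c["unstranded"]
    print(gene, stranded_total, gaps[gene])
# gene1 4619 1833
# gene2 3347 1833
```

The gap comes out the same (1833) for both genes, which is consistent with one shared set of reads in the overlap being left uncounted in the unstranded column and assigned by strand in the other two.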

alexdobin commented 1 year ago

STARsolo counts will be similar to either column 3 (Forward) or column 4 (Reverse).
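
As a rough illustration of this reply, the sketch below picks the bulk-count column that corresponds to a given --soloStrand value. It is not STARsolo's own code: the mapping table and the function `approx_solo_counts` are hypothetical, and the numbers are the two rows quoted earlier in the thread. Per the reply, the real STARsolo counts would only be similar to these, not identical.

```python
# Rows quoted earlier in the thread: gene, unstranded, forward, reverse counts.
ROWS = [
    ("gene1", 2786, 4516, 103),
    ("gene2", 1514, 395, 2952),
]

# Hypothetical mapping from a --soloStrand value to a 0-based tuple index.
# Index 2 is "column 3" and index 3 is "column 4" in the 1-based wording above.
STRAND_TO_INDEX = {"Unstranded": 1, "Forward": 2, "Reverse": 3}

def approx_solo_counts(rows, solo_strand="Forward"):
    """Return {gene: count} using the column matching the strand setting."""
    index = STRAND_TO_INDEX[solo_strand]
    return {row[0]: row[index] for row in rows}

forward_counts = approx_solo_counts(ROWS, "Forward")
reverse_counts = approx_solo_counts(ROWS, "Reverse")
print(forward_counts)  # {'gene1': 4516, 'gene2': 395}
print(reverse_counts)  # {'gene1': 103, 'gene2': 2952}
```

Read this way, a single strand setting selects the same column for every gene, so Forward would pair gene1 at about 4516 with gene2 at about 395 rather than 2952.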