alexdobin / STAR

RNA-seq aligner
MIT License

Calculation of sequencing saturation #2073

Open nbartonicek opened 4 months ago

nbartonicek commented 4 months ago

Good day, I am trying to figure out how sequencing saturation is calculated from the basic stats in Summary.csv, since the numbers do not add up.

In the following case:

Number of Reads,72704122
Reads With Valid Barcodes,0.983831
Sequencing Saturation,0.443522
Q30 Bases in CB+UMI,0.958173
Q30 Bases in RNA read,0.938477
Reads Mapped to Genome: Unique+Multiple,0.951636
Reads Mapped to Genome: Unique,0.754136
Reads Mapped to Gene: Unique+Multiple Gene,NoMulti
Reads Mapped to Gene: Unique Gene,0.715291
Estimated Number of Cells,76
Unique Reads in Cells Mapped to Gene,51754771
Fraction of Unique Reads in Cells,0.995196
Mean Reads per Cell,680983
Median Reads per Cell,573753
UMIs in Cells,28775002
Mean UMI per Cell,378618
Median UMI per Cell,319694
Mean Gene per Cell,10198
Median Gene per Cell,10303
Total Gene Detected,19735
...

Sequencing saturation should be: 1 - (unique reads in cells mapped to gene)/(reads with valid barcodes). But that gives only 1 - (51754771/(72704122*0.983831)), which is 0.276, not 0.443522.

Any help greatly appreciated, Nenad

alexdobin commented 4 months ago

Hi Nenad,

The formula for calculating saturation is explained here: https://github.com/alexdobin/STAR/issues/2048
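For readers who want to check the arithmetic in the question, here is a minimal sketch. It reproduces the 0.276 figure from the question and compares it with a UMI-based definition of saturation, 1 - (unique UMIs)/(reads), which is the definition commonly used for droplet single-cell data. Whether this matches STAR's exact formula is an assumption; the linked issue is the authoritative explanation. The function and variable names are hypothetical, and the inputs are the in-cell values from the posted Summary.csv, so a small difference from the reported 0.443522 is expected if STAR computes the metric over a different set of barcodes.

```python
# Values copied from the Summary.csv posted in the question.
number_of_reads = 72_704_122
frac_valid_barcodes = 0.983831
unique_reads_in_cells_mapped_to_gene = 51_754_771
umis_in_cells = 28_775_002
reported_saturation = 0.443522


def saturation_from_valid_barcodes(reads_in_cells, n_reads, frac_valid):
    """The calculation attempted in the question (hypothetical helper)."""
    return 1.0 - reads_in_cells / (n_reads * frac_valid)


def saturation_from_umis(n_umis, n_reads):
    """Assumed definition: fraction of reads that repeat an already-seen UMI."""
    return 1.0 - n_umis / n_reads


question_estimate = saturation_from_valid_barcodes(
    unique_reads_in_cells_mapped_to_gene, number_of_reads, frac_valid_barcodes
)
umi_estimate = saturation_from_umis(
    umis_in_cells, unique_reads_in_cells_mapped_to_gene
)

print(f"estimate from the question: {question_estimate:.3f}")
print(f"UMI-based estimate:         {umi_estimate:.3f}")
print(f"reported by STAR:           {reported_saturation:.3f}")
```

With these inputs the UMI-based estimate comes out at roughly 0.444, much closer to the reported value than 0.276, which suggests the denominator in the question is not the one STAR uses.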