alexdobin / STAR

RNA-seq aligner
MIT License

How is Sequencing Saturation calculated? #1682

Open pandh0607 opened 1 year ago

pandh0607 commented 1 year ago

Hi alexdobin! How is Sequencing Saturation calculated? Working from the STAR BAM file, I count each unique CB+UB+GX combination as n_deduped_reads and all reads with CB+UB+GX as N_reads, then calculate 1 - n_deduped_reads/N_reads. The result is inconsistent with the Sequencing Saturation reported by STAR.

alexdobin commented 1 year ago

Hi @pandh0607

n_deduped_reads is the number of distinct UMIs.

pandh0607 commented 1 year ago

Hi alex, thank you very much for your reply! Does "distinct UMIs" mean that the UMI+gene combination is unique, or that the barcode+UMI combination is unique?

alexdobin commented 1 year ago

Distinct UMI means the CB+gene+UMI combination is unique.

pandh0607 commented 1 year ago

Hi alex, I use the following formula to calculate Sequencing Saturation: Sequencing Saturation = 1 - n_deduped_reads/N_reads

n_deduped_reads means the CB+gene+UMI combination is unique. What does N_reads mean? Is it the number of all CB+gene+UMI combinations, or the sum of all reads (CB+gene+UMI)?

alexdobin commented 1 year ago

N_reads is the total number of reads that have valid CB/UMI/gene, i.e. before collapsing UMIs.
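
Putting the definitions from this thread together, the calculation could be sketched in Python as below. This is only an illustration, not STAR's actual code: the function name, the input format (one `(CB, UB, GX)` tuple per read, e.g. pulled from the BAM tags), and the treatment of `None`, empty, or `-` values as "not valid" are all assumptions. Any additional filtering or UMI correction STAR applies internally is not described in this thread, so the result may still differ from the reported value.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# One record per read: (CB, UB, GX). Hypothetical input format for this sketch.
Record = Tuple[Optional[str], Optional[str], Optional[str]]

# Assumption: these placeholder values mark a missing/invalid CB, UMI or gene.
MISSING = {None, "", "-"}


def sequencing_saturation(records: Iterable[Record]) -> float:
    """Return 1 - n_deduped_reads / N_reads as defined in the thread."""
    counts: Counter = Counter()
    for cb, ub, gx in records:
        # Keep only reads that have a valid CB, UMI and gene.
        if cb in MISSING or ub in MISSING or gx in MISSING:
            continue
        counts[(cb, gx, ub)] += 1

    n_reads = sum(counts.values())  # N_reads: reads before collapsing UMIs
    n_deduped = len(counts)         # distinct CB+gene+UMI combinations
    if n_reads == 0:
        return float("nan")         # saturation is undefined without valid reads
    return 1.0 - n_deduped / n_reads


example = [
    ("AAAC", "UMI1", "GENE1"),
    ("AAAC", "UMI1", "GENE1"),  # duplicate of the first read
    ("AAAC", "UMI2", "GENE1"),
    ("AAAG", "UMI1", "GENE1"),
    ("AAAG", "UMI1", "-"),      # no gene: excluded from both counts
]
print(sequencing_saturation(example))  # 4 valid reads, 3 distinct -> 0.25
```

Note that N_reads counts every valid read, including duplicates, while n_deduped_reads counts each CB+gene+UMI combination once regardless of how many reads support it.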