JunyueCaoLab / EasySci

Computational pipeline to process EasySci-RNA data.
MIT License

Normalization by gene length for random primer derived expression count #2

Closed: mesnger closed this issue 2 months ago

mesnger commented 4 months ago

Hello, I am Jaeyong, and thank you for providing the great library and analysis method.

I was wondering whether there is a normalization method for expression counts derived from random primers. In theory, multiple reads with different UMIs may originate from a single RNA molecule if several random primers bind to it. If that is the case, longer transcripts would tend to yield more UMIs, and a conventional bulk RNA-seq normalization method that uses gene length (such as FPKM or TPM) could be applied.

I have read the gene-level and post-processing analysis scripts but could not find any normalization step that uses gene length. Could it simply be that the overall fraction of RNA reads derived from random primers is low, so normalization by length is unnecessary?

Any reply would be helpful. Cheers,

Jaeyong
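
For readers unfamiliar with the length normalization mentioned in the question, here is a minimal sketch of TPM for a single sample. It is not part of the EasySci pipeline; the function name `tpm` and the toy counts and gene lengths are assumptions made up for illustration.

```python
def tpm(counts, gene_lengths_bp):
    """Transcripts per million for one sample (hypothetical helper).

    counts          -- per-gene read or UMI counts
    gene_lengths_bp -- per-gene lengths in base pairs, same order as counts
    """
    # Step 1: divide each count by the gene length in kilobases.
    rates = [c / (length / 1000.0) for c, length in zip(counts, gene_lengths_bp)]
    total = sum(rates)
    # An all-zero sample has no defined TPM; return zeros instead of dividing by zero.
    if total == 0:
        return [0.0] * len(rates)
    # Step 2: rescale so the values sum to one million.
    return [r / total * 1e6 for r in rates]


# Toy data: two genes with equal counts, the second gene twice as long.
example = tpm([10, 10], [1000, 2000])
```

With equal counts, the shorter gene ends up with twice the TPM of the longer one, which is the correction that would matter if longer RNAs systematically produced more UMIs.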

Andras-Sziraki commented 2 months ago

Hi Jaeyong,

Thank you for your interest in our method. Yes, you are right, we don’t use gene length to normalize expression values in our single-cell pipeline due to the data sparsity. The likelihood of multiple random hexamer primers producing reads from the same RNA molecule is low, as there's often minimal signal per gene in each cell, usually just one read. Please let me know if you have any further questions.

Best wishes, Andras
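
The sparsity argument in the reply can be checked on one's own data. The sketch below computes the fraction of non-zero gene-by-cell entries that have a count of exactly one. It is only a rough diagnostic under stated assumptions, not a step of the EasySci pipeline: the function name `fraction_single_count` and the toy matrix are hypothetical, and a count above one may reflect several RNA molecules rather than several primers on one molecule, so the complement of this fraction is an upper bound on entries that multi-priming could affect.

```python
def fraction_single_count(matrix):
    """Fraction of non-zero entries equal to 1 (hypothetical helper).

    matrix -- list of rows (cells), each a list of per-gene UMI counts
    """
    nonzero = [value for row in matrix for value in row if value > 0]
    # With no detected genes at all, report 0.0 rather than dividing by zero.
    if not nonzero:
        return 0.0
    singles = sum(1 for value in nonzero if value == 1)
    return singles / len(nonzero)


# Made-up toy matrix: 3 cells x 4 genes, mostly zeros, as in sparse single-cell data.
toy_counts = [
    [0, 1, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 1, 0],
]
single_fraction = fraction_single_count(toy_counts)
```

A value close to 1 would be consistent with the reply: if most detected genes in a cell have a single count, dividing by gene length would change little.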