constantAmateur / SoupX

R package to quantify and remove cell free mRNAs from droplet based scRNA-seq data

plotMarkerMap doubt #114

Closed MartaSanchezCarbonell closed 2 years ago

MartaSanchezCarbonell commented 2 years ago

Dear SoupX team,

Thank you very much for this great package, it works wonderfully. I am writing because I have a question about the function plotMarkerMap. The description of this function says "this function calculates how many counts would be expected if that droplet were nothing but soup and compares that to the observed count". Could you please explain how the function calculates what the counts would be if the droplet were nothing but soup? Is a random distribution generated?

Thank you very much in advance.

Best regards,

Marta

constantAmateur commented 2 years ago

SoupX defines the relative frequency of genes in the ambient RNA background (i.e., the soup) from empty droplets in the experiment. By default this is done assuming any droplet with fewer than 100 UMIs is "empty" (i.e., contains only soup).

So using this procedure we get f_g, the fraction of reads we expect for gene 'g' in the soup, which is normalised to sum to 1 (i.e., sum_g f_g = 1).
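As a rough illustration of this step (a toy Python sketch, not SoupX's actual R code), the soup profile can be computed by pooling counts from droplets below the UMI threshold and normalising. The function name `estimate_soup_profile`, the dict-based droplet format, and the gene counts are all made up for the example; only the "fewer than 100 UMIs" default comes from the explanation above.

```python
from collections import Counter

def estimate_soup_profile(droplets, max_umis=100):
    """Pool counts from 'empty' droplets and normalise them to sum to 1.

    droplets: list of dicts mapping gene name -> UMI count (toy format,
    assumed for this sketch only).
    """
    totals = Counter()
    for counts in droplets:
        # A droplet with fewer than max_umis UMIs is treated as soup only.
        if sum(counts.values()) < max_umis:
            totals.update(counts)
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no counts found in empty droplets")
    # f_g: fraction of soup counts belonging to gene g.
    return {gene: n / grand_total for gene, n in totals.items()}

# Made-up toy data: two empty droplets and one droplet containing a cell.
droplets = [
    {"HBB": 6, "ALB": 2},             # 8 UMIs, counted as empty
    {"HBB": 2, "ALB": 1, "CD3E": 1},  # 4 UMIs, counted as empty
    {"CD3E": 150, "HBB": 10},         # 160 UMIs, ignored
]
soup = estimate_soup_profile(droplets)
```

Here the pooled soup counts are HBB = 8, ALB = 3, CD3E = 1, so ALB gets f_g = 3/12 = 0.25 and the fractions sum to 1.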

A droplet with N UMIs that contains a cell will then have an expected count for gene g of o_g = N(rho f_g + (1-rho) c_g), where 'c_g' is the relative frequency of gene g in the cell, rho is the fraction of reads that are soup derived, and N is the number of UMIs in the droplet. Usually, rho is something small, like 0.02 to 0.10. But the largest it can ever be is 1, at which point o_g = N f_g.
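To make the formula concrete, here is a small Python sketch that evaluates o_g = N(rho f_g + (1-rho) c_g) for a typical rho and for the limiting case rho = 1. The function name `expected_counts` and all the frequencies are invented for illustration.

```python
def expected_counts(n_umis, rho, soup_freq, cell_freq):
    """Evaluate o_g = N * (rho * f_g + (1 - rho) * c_g) for every gene."""
    genes = set(soup_freq) | set(cell_freq)
    return {
        g: n_umis * (rho * soup_freq.get(g, 0.0)
                     + (1 - rho) * cell_freq.get(g, 0.0))
        for g in genes
    }

# Made-up profiles: the soup is a mix of genes, the cell expresses only CD3E.
soup_freq = {"HBB": 0.5, "ALB": 0.25, "CD3E": 0.25}
cell_freq = {"CD3E": 1.0}

mixed = expected_counts(1000, 0.05, soup_freq, cell_freq)    # typical rho
all_soup = expected_counts(1000, 1.0, soup_freq, cell_freq)  # rho = 1
```

With rho = 0.05, HBB is expected at about 25 counts out of 1000, all of it contamination. With rho = 1 the cell term vanishes and every gene is simply N f_g, e.g. 500 for HBB.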

So simply put, the function assumes that the droplet contains nothing but soup (rho = 1), so the expected count for gene g is just the soup profile scaled by the number of UMIs in the droplet, N f_g.
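A minimal Python sketch of that comparison for a single gene in a single droplet follows. Reporting the result as an observed/expected ratio is an assumption made for this example and is not necessarily the statistic plotMarkerMap uses; the function name and the numbers are also invented.

```python
def soup_only_comparison(observed_count, n_umis, soup_fraction):
    """Compare an observed count with the soup-only expectation N * f_g.

    Returns (expected, ratio). The ratio is an illustrative choice of
    summary, assumed here rather than taken from SoupX.
    """
    expected = n_umis * soup_fraction
    if expected <= 0:
        raise ValueError("gene must have a non-zero soup fraction")
    return expected, observed_count / expected

# Made-up droplet with N = 2000 UMIs and a gene with f_g = 0.01.
expected, ratio_high = soup_only_comparison(400, 2000, 0.01)
_, ratio_low = soup_only_comparison(18, 2000, 0.01)
```

The soup-only expectation is about 20 counts. Observing 400 is far above that, which suggests the cell really expresses the gene, while observing 18 is consistent with the counts coming from soup alone.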