How can I get the top genes for a celltype?

jemorlanes commented 1 year ago

Hello Alma!! :)

Hope everything is going great! I have a quick question regarding some of the outputs of stereoscope. I am interested in looking at the genes that stereoscope has decided are the most descriptive of a certain celltype, but I am struggling to get this.

In the output of stereoscope, I find 2 files that could be of interest: the R*.tsv file, which stores the rates for each gene in each celltype, and the logits*.tsv file, which from my understanding gives an indication of how good an explanatory variable each gene is.

In order to get the exact "weights" of a gene for a celltype, should I multiply the rates matrix by the logits matrix?

Thank you for your help! :))

almaan commented 1 year ago

Hi @jemorlanes ,

Thanks for using stereoscope and reaching out. What you could do is compute the expected value of a given gene within every cell type. If you look at the definition of the mean here, you see that it's given as mean = r(1-p)/p, while logits = log((1-p)/p). Exponentiating the logits therefore gives exactly the factor (1-p)/p, so you have mean_gz = r_gz * t.exp(logits_gz). Pseudocode for this would be something like (in python):

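import numpy as np
import pandas as pd

# replace the R*.tsv / logits*.tsv patterns with the paths to your actual output files
R = pd.read_csv("R*.tsv", sep="\t", header=0, index_col=0)
logits = pd.read_csv("logits*.tsv", sep="\t", header=0, index_col=0)
mean = R * np.exp(logits)  # expected value: rate times exp(logits)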

You could then extract those genes that seem to be most highly expressed within that cell type. However, the most highly expressed genes aren't necessarily the most descriptive ones. For that kind of information you need a contrastive analysis, essentially a DGE but with only one sample per cell type. That's also easy to execute once you have the expected values.
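As a rough sketch of the extraction step, something like the following could work. It uses toy data in place of the real output files, and the helper names `expected_expression` and `top_expressed` are made up for illustration, they are not part of stereoscope. It also assumes that genes are rows and cell types are columns, and it handles the case where the logits table holds a single value per gene (an assumption about the file layout that is worth checking against your own files).

```python
import numpy as np
import pandas as pd


def expected_expression(R, logits):
    """Expected value per gene and cell type: R * exp(logits)."""
    scale = np.exp(logits)
    if scale.shape[1] == 1:
        # Assumption: one logit per gene, shared across all cell types,
        # so it is broadcast along the rows (genes) of R.
        return R.mul(scale.iloc[:, 0], axis=0)
    # Otherwise assume logits has the same genes x cell types layout as R.
    return R * scale


def top_expressed(mean, cell_type, n=10):
    """Return the n genes with the highest expected value in one cell type."""
    return mean[cell_type].nlargest(n)


# Toy data standing in for the contents of R*.tsv and logits*.tsv
genes = ["g1", "g2", "g3", "g4"]
R = pd.DataFrame(
    [[1.0, 5.0], [4.0, 1.0], [2.0, 2.0], [0.5, 8.0]],
    index=genes,
    columns=["A", "B"],
)
logits = pd.DataFrame(
    {"logits": [0.0, np.log(2.0), 0.0, np.log(0.5)]}, index=genes
)

mean = expected_expression(R, logits)
top_a = top_expressed(mean, "A", n=2)
print(top_a)
```

With real data you would replace the toy frames with the tables read from your files and pass the name of the cell type you are interested in.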

Best, Alma

jemorlanes commented 1 year ago

Hi Alma!

Super insightful, thank you! When you say "a DGE with only one sample per celltype", do you mean the following?

Get the expected mean for each gene in each celltype.
Run DGE between the celltypes using that expected mean.
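One possible way to sketch these two steps, assuming the expected values are already available as a genes x cell types table: since there is only a single value per gene and cell type, no statistical test is possible, so the example below simply ranks genes by the log2 ratio between a cell type's expected value and the average over the remaining cell types. This scoring is a swapped-in simplification, not the method used by stereoscope, and the function name `contrast_scores` and the pseudocount are hypothetical choices.

```python
import numpy as np
import pandas as pd


def contrast_scores(mean, pseudocount=1e-6):
    """log2 ratio of each cell type's expected value to the average of the others."""
    n_types = mean.shape[1]
    total = mean.sum(axis=1)
    # total - mean, column by column, averaged over the remaining cell types
    others = mean.rsub(total, axis=0) / (n_types - 1)
    # The pseudocount (an arbitrary choice) avoids division by zero and log(0).
    return np.log2((mean + pseudocount) / (others + pseudocount))


# Toy expected values (genes x cell types) standing in for R * exp(logits)
mean = pd.DataFrame(
    [[8.0, 1.0, 1.0], [4.0, 4.0, 4.0], [1.0, 2.0, 9.0]],
    index=["g1", "g2", "g3"],
    columns=["A", "B", "C"],
)

scores = contrast_scores(mean)
# Genes ranked from most to least specific for cell type "A"
ranking_a = scores["A"].sort_values(ascending=False)
print(ranking_a)
```

A gene that is equally expressed everywhere (like g2 here) scores close to zero, while a gene concentrated in one cell type gets a high score for that cell type, which is closer to "descriptive" than raw expression alone.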

almaan / stereoscope

How can I get the top genes for a celltype? #37