iscastanho commented 1 year ago

I am not sure whether these are already features of EWCE (I could not find them in the manuals or tutorials), so I am posting them as questions.

1. Is there a way of extracting which genes are driving the enrichment for each cell type tested? At the moment I am using the “marker” genes for each population/subpopulation I have (which I identified with Seurat) but I was wondering if there was a way of extracting this directly from EWCE as I can see that it performs differential expression as part of the pipeline (using LIMMA if I noticed it correctly).

2. Can I get the mean bootstrap expression for each gene from EWCE? If so, how? In Skene and Grant, 2016 (https://doi.org/10.3389/fnins.2016.00016), in Figure 2 C and D, genes from one of the cell types (microglia) are shown, highlighting their expression against the mean bootstrap expression. I would like to have access to this type of information from my data too. How could I extract this?

Thank you.

Al-Murphy commented 1 year ago

"Is there a way of extracting which genes are driving the enrichment for each cell type tested?"

This is essentially the specificity, i.e. how specific the expression of a gene is to a cell type. This is available in the ctd made from your reference scRNA-seq dataset. For example (from the vignette dataset):
ctd <- ewceData::ctd()
# cell type level 1 specificity for genes
ctd[[1]]$specificity

"EWCE as I can see that it performs differential expression as part of the pipeline (using LIMMA if I noticed it correctly)"

EWCE uses limma as a filtering step to remove uninformative genes, in the function drop_uninformative_genes. This removes genes that don't vary across cell types, which helps reduce noise in subsequent steps (it makes for a fairer comparison between your gene list and the randomly sampled background genes). So in this sense, limma isn't used to identify cell-type-specific gene lists but as a preprocessing step to filter out genes.

"Can I get the mean bootstrap expression for each gene from EWCE? If so, how?"

I believe Figure 2 C and D are essentially showing the specificity of genes (perhaps @NathanSkene can confirm?). So you should be able to use the specificity values to get a similar plot.
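As a rough illustration of the specificity suggestion, here is a minimal sketch in base R. It uses a small made-up matrix as a stand-in for ctd[[1]]$specificity (genes in rows, cell types in columns); the gene names, cell type names, values, and the `hits` list are all hypothetical and not taken from the vignette data.

```r
# Toy stand-in for ctd[[1]]$specificity (hypothetical values).
# Rows are genes, columns are cell types; each row sums to 1 here.
specificity <- matrix(
  c(0.80, 0.10, 0.10,
    0.05, 0.90, 0.05,
    0.60, 0.30, 0.10,
    0.34, 0.33, 0.33),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                  c("microglia", "astrocytes", "neurons"))
)

# Hypothetical target gene list; GeneX is absent from the reference on purpose.
hits <- c("GeneA", "GeneC", "GeneD", "GeneX")
cell_type <- "microglia"

# Keep only the genes that exist in the reference, then rank them by
# their specificity for the cell type of interest.
hits_in_ref <- intersect(hits, rownames(specificity))
ranked_hits <- sort(specificity[hits_in_ref, cell_type], decreasing = TRUE)
print(ranked_hits)
```

With real data, replacing the toy matrix with ctd[[1]]$specificity should give the same kind of ranked vector for the genes in your list.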

Thanks

NathanSkene commented 1 year ago

EWCE can generate bootstrap plots that show the bootstrap probabilities of each gene. I think the function is something like generate_bootstrap_plots. I don't really find them that useful, but I regularly get asked which genes are driving the enrichments, and this is the clearest way of getting at it. Bear in mind, though, that the enrichment is driven by the set of genes, not any particular gene.
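To give a feel for what a "mean bootstrap expression" comparison involves, here is a minimal base R sketch of the general idea: sort the target genes by their value in one cell type, then compare each rank against the average value at that rank across many random gene lists of the same size. This is an illustration under stated assumptions, not EWCE's own implementation; the data are simulated and every name in it (`spec`, `hits`, `reps`) is hypothetical.

```r
set.seed(1)

# Simulated specificity values for a single cell type (hypothetical data).
n_genes <- 200
spec <- setNames(runif(n_genes), paste0("Gene", seq_len(n_genes)))

hits <- paste0("Gene", 1:10)  # hypothetical target gene list
reps <- 1000                  # number of bootstrap replicates

# Observed: target-gene values sorted from highest to lowest.
observed <- sort(spec[hits], decreasing = TRUE)

# Bootstrap: draw random lists of the same size from the background,
# sort each one, and average the values rank by rank.
boot <- replicate(
  reps,
  unname(sort(sample(spec, length(hits)), decreasing = TRUE))
)
mean_boot <- rowMeans(boot)

# One row per rank: the target gene at that rank vs. the bootstrap mean.
comparison <- data.frame(
  gene = names(observed),
  observed = unname(observed),
  mean_bootstrap = mean_boot
)
print(comparison)
```

Genes whose observed value sits well above the bootstrap mean at their rank are the ones that would stand out in such a plot, although the enrichment itself still reflects the list as a whole.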

bschilder commented 1 year ago

Here's the function @iscastanho https://nathanskene.github.io/EWCE/reference/generate_bootstrap_plots.html

See here for some upgrades I'm making to it soon.

NathanSkene / EWCE

Extracting genes driving the enrichment and their mean bootstrap expression #76
