DavisLaboratory / singscore

An R/Bioconductor package that implements a single-sample molecular phenotyping approach
https://davislaboratory.github.io/singscore/

Equivalent of Leading edge genes in GSEA #29

Closed: hd00ljy closed this issue 3 years ago

hd00ljy commented 3 years ago

Hello!

I am wondering whether it is possible to get the subset of a gene set that is responsible for its high score.

Something like the leading-edge genes from GSEA.

With regards, Jin-Young

bhuvad commented 3 years ago

Hi @hd00ljy,

We do not provide functions to extract the equivalent of the leading-edge genes from GSEA; however, there are ways in which you could extract them. For any given sample, the barcode plot can be used to visualise the relative rank distribution of any gene set/signature. You can invoke the interactive version of the plot by setting plotRankDensity(..., isInteractive = TRUE), which lets you hover over each bar and see which gene it represents. Using this as a starting point, you could then select a normalised rank threshold and extract the genes whose normalised ranks are above it, using the code below.

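library(singscore)

#rank expression data (tgfb_expr_10_se and tgfb_gs_up are example datasets shipped with singscore)
eranks = rankGenes(tgfb_expr_10_se)

#plot ranks for the first sample using the barcode plot
#(drop = FALSE keeps the single column as a matrix)
p1 = plotRankDensity(eranks[, 1, drop = FALSE], tgfb_gs_up)
p1

#select a normalised rank threshold (normalised ranks lie between 0 and 1)
thresh = 0.75

#extract the top-ranked genes using the data behind the plot above
#(p1 is a ggplot object, so its data frame is available as p1$data;
# Ranks holds the normalised ranks and column 3 holds the gene IDs)
granks = p1$data
top_genes = granks[granks$Ranks > thresh, 3]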
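The same selection can also be made directly from the rank matrix, without going through the plot, and for every sample at once. This is a minimal sketch rather than a singscore function: it assumes the normalised rank shown on the barcode plot is the rank divided by the number of genes, and `nranks` and `top_genes_per_sample` are illustrative names only.

```r
library(singscore)
library(GSEABase)  # provides geneIds() for GeneSet objects

# rank expression data, as above
eranks = rankGenes(tgfb_expr_10_se)
thresh = 0.75

# normalised ranks in (0, 1]; assumed to match the barcode plot (rank / number of genes)
nranks = eranks / nrow(eranks)

# gene-set members that are present in the ranked data
gs_genes = intersect(geneIds(tgfb_gs_up), rownames(nranks))

# for each sample, the gene-set members whose normalised rank exceeds the threshold
top_genes_per_sample = sapply(colnames(nranks), function(s) {
  gs_genes[nranks[gs_genes, s] > thresh]
}, simplify = FALSE)

# e.g. the selected genes for the first sample
top_genes_per_sample[[1]]
```

Applying the threshold to the matrix keeps the result as a named list (one entry per sample), which is easier to compare across samples than reading genes off one plot at a time.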

Though these genes would be similar to the leading-edge genes from GSEA, their interpretation differs. GSEA assesses the enrichment of statistics (such as logFC), so its leading-edge genes are likely to be those with a high logFC in the experiment. Singscore (in the default use case, at least) uses raw expression data, so the genes extracted this way are generally those with high expression in the sample. You could instead rank genes using single-sample statistics (for example, expression relative to a control sample), which would prioritise genes by those statistics rather than by expression. I hope this answers your question.
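A rough sketch of that alternative is below. It assumes the assay of tgfb_expr_10_se is already on a log scale (so a difference is a log fold change; log-transform first if it is not), and it treats the first sample as a hypothetical control purely for illustration. `expr`, `ctrl`, `rel_expr`, `rel_ranks` and `rel_scores` are illustrative names.

```r
library(singscore)
library(SummarizedExperiment)  # provides assay()

# genes x samples expression matrix; assumed to already be on a log scale
expr = assay(tgfb_expr_10_se)

# hypothetical control: the first sample, chosen only for illustration
ctrl = expr[, 1]

# single-sample statistic: expression of each remaining sample minus the control
# (the control vector is recycled down each column, so row i subtracts ctrl[i])
rel_expr = expr[, -1, drop = FALSE] - ctrl

# rank genes by the relative statistic instead of by expression
rel_ranks = rankGenes(rel_expr)

# score and plot as before, now on the relative ranks
rel_scores = simpleScore(rel_ranks, upSet = tgfb_gs_up)
p2 = plotRankDensity(rel_ranks[, 1, drop = FALSE], tgfb_gs_up)
p2
```

With this ranking, genes above a normalised rank threshold are those with the largest increase relative to the control rather than those with the highest expression.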

Cheers, Dharmesh

hd00ljy commented 3 years ago

Thank you for your prompt and detailed answers!

That will be very helpful for my analysis.

Thank you, Jin-Young