khughitt closed this issue 4 years ago.
We're currently focused on updating the standard version (see http://rain.ifmo.ru/~alserg/fgseaMultilevel.html). Once that is done, we'll come back to the other variants of the GSEA test; I hope most of them can be implemented in the same efficient manner.
+1, I think this would be super helpful. We have been heavy users of fgsea in our lab (fantastic package!), and one-sided tests are a feature that would be extremely useful for our analyses.
@khughitt @ArtemSokolov We're now coming back to this feature request, so could you clarify which particular tests you are interested in and what the use scenarios are? We are aware of ssGSEA, which is quite popular, and absFilterGSEA, which is not. Is there anything else we're missing?
Hi @assaron, thanks for following up! The main scenario I have in mind is testing for enrichment that is expected at only one end of the distribution. For example, one could assign a positive score to each gene based on its association with a phenotype of interest. It would then be interesting to see whether the score distribution is enriched for some functional annotations.
Just realized that I never responded to this.
An example use case might be inferring the activity of a transcription factor where we don't care about the directionality of its effect (activating or inhibiting) on downstream targets. The standard trick in this case is to apply GSEA to the `abs(expression)` vector. However, the p-value calculation needs to be adjusted to be one-tailed, because `abs(x)` can never be negative.
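To illustrate why the absolute-value trick turns a direction-agnostic question into a one-ended one, here is a small Python sketch on hypothetical toy data. The gene names, values, and the helpers `ranks_descending` and `mean_rank` are illustrative assumptions, not part of fgsea. Targets that are activated and repressed sit at opposite ends of the raw ranking, but all move to the top once each statistic is replaced by its absolute value.

```python
def ranks_descending(stats):
    """Map each gene to its rank (1 = largest statistic)."""
    ordered = sorted(stats, key=lambda g: stats[g], reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def mean_rank(stats, gene_set):
    """Average descending rank of the gene-set members."""
    ranks = ranks_descending(stats)
    return sum(ranks[g] for g in gene_set) / len(gene_set)


# Hypothetical toy data: targets T1..T4 respond strongly,
# two are activated (positive) and two are repressed (negative).
stats = {
    "T1": 3.0, "T2": 2.5, "T3": -2.8, "T4": -3.2,
    "G1": 0.4, "G2": -0.3, "G3": 0.1, "G4": -0.2,
}
targets = ["T1", "T2", "T3", "T4"]
abs_stats = {g: abs(v) for g, v in stats.items()}

print(mean_rank(stats, targets))      # targets split across both ends
print(mean_rank(abs_stats, targets))  # targets concentrated at the top
```

With eight genes, a mean rank of 4.5 is the middle of the list, which is what the raw ranking gives here; after `abs` the targets occupy ranks 1 to 4. Because every absolute value is non-negative, only enrichment at the top of the `abs` ranking matches the hypothesis, which is why a one-tailed test fits.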
I think the easiest solution might be to provide an additional argument to `fgsea()` to allow the user to specify whether the test is two-tailed. Take a look at the built-in `?t.test` for an example (specifically, how the "tailed-ness" of the test can be provided via the `alternative` argument).
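A minimal Python sketch of how a `t.test`-style `alternative` argument could map onto an empirical p-value from a permutation null. The function name `permutation_p_value` is hypothetical and is not fgsea's API. Two conventions are assumed: a +1 correction so the p-value is never exactly 0, and "two.sided" computed as twice the smaller tail, capped at 1 (one common choice among several).

```python
def permutation_p_value(observed, null_scores, alternative="two.sided"):
    """Empirical p-value of `observed` against permutation null scores.

    `alternative` mirrors the values accepted by R's t.test():
      "greater"   -> upper tail only
      "less"      -> lower tail only
      "two.sided" -> twice the smaller tail, capped at 1
    """
    n = len(null_scores)
    # +1 in numerator and denominator counts the observed score itself.
    upper = (1 + sum(s >= observed for s in null_scores)) / (n + 1)
    lower = (1 + sum(s <= observed for s in null_scores)) / (n + 1)
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two.sided":
        return min(1.0, 2 * min(upper, lower))
    raise ValueError(f"unknown alternative: {alternative!r}")


null = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(permutation_p_value(0.85, null, "greater"))
print(permutation_p_value(0.85, null, "two.sided"))
```

For statistics that are non-negative by construction, such as `abs(x)`, "greater" is the natural choice, since the lower tail does not correspond to the hypothesis being tested.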
We added experimental support for one-tailed tests in the latest version (available from GitHub). Please try it out. As expected, the results for the example dataset don't differ much from those of the default test.
To run a one-tailed test, set the `scoreType` argument to either "pos" or "neg". "pos" uses only the positive mode of the enrichment score (called ES+ in the paper), and the P-value corresponds to the probability of a greater ES+ value. "neg" is analogous but uses ES-, and the P-value corresponds to lower (more negative) ES- values.
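The Python sketch below illustrates the ES+/ES- split and the "pos"/"neg" P-values. It is a simplified stand-in, not fgsea's implementation. It assumes the classic GSEA running sum, with hits weighted by |stat| and misses by a constant, and a plain Monte Carlo null built from random gene sets of the same size; fgsea uses its own, more efficient algorithms. The names `running_sum_extremes` and `one_tailed_gsea_p` are hypothetical.

```python
import random


def running_sum_extremes(stats, gene_set):
    """Return (es_pos, es_neg): max and min of the GSEA running sum.

    stats: dict gene -> statistic; genes are ranked by decreasing value.
    Hits add |stat| / (total |stat| of hits); misses subtract 1 / n_misses.
    """
    members = set(gene_set)
    ranked = sorted(stats, key=lambda g: stats[g], reverse=True)
    hit_total = sum(abs(stats[g]) for g in ranked if g in members)
    n_miss = sum(1 for g in ranked if g not in members)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need non-zero hit weights and at least one miss")
    running, es_pos, es_neg = 0.0, 0.0, 0.0
    for g in ranked:
        if g in members:
            running += abs(stats[g]) / hit_total
        else:
            running -= 1.0 / n_miss
        es_pos = max(es_pos, running)
        es_neg = min(es_neg, running)
    return es_pos, es_neg


def one_tailed_gsea_p(stats, gene_set, score_type="pos", n_perm=1000, seed=0):
    """One-tailed P-value using only ES+ ("pos") or only ES- ("neg")."""
    if score_type not in ("pos", "neg"):
        raise ValueError("score_type must be 'pos' or 'neg'")
    rng = random.Random(seed)
    genes = list(stats)
    k = len(set(gene_set) & set(genes))
    es_pos, es_neg = running_sum_extremes(stats, gene_set)
    observed = es_pos if score_type == "pos" else es_neg
    extreme = 0
    for _ in range(n_perm):
        # Null: a random gene set of the same size on the same ranking.
        null_pos, null_neg = running_sum_extremes(stats, rng.sample(genes, k))
        if score_type == "pos" and null_pos >= observed:
            extreme += 1  # greater ES+ counts as at least as extreme
        elif score_type == "neg" and null_neg <= observed:
            extreme += 1  # more negative ES- counts as at least as extreme
    return observed, (1 + extreme) / (n_perm + 1)


# Hypothetical example: 20 genes with non-negative scores (e.g. abs values);
# the set consists of the three top-ranked genes.
stats = {f"G{i}": float(20 - i) for i in range(20)}
top_set = {"G0", "G1", "G2"}
print(one_tailed_gsea_p(stats, top_set, score_type="pos", n_perm=200, seed=1))
```

For `abs`-transformed statistics, only the "pos" direction is meaningful: a small "pos" P-value indicates that the set is concentrated among the genes with the largest absolute values.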
Are there any assumptions about the distribution of `stats` when using `scoreType='pos'`?
I'm asking because I'm using fgsea to analyze two datasets with -log10(p-value) as input. For one of them, the resulting fgsea p-value histogram looks fine, but for the other, the mode of the p-value distribution seems to be in the 0.05-0.10 range instead of near 0 or near 1, as I would expect.
@kvittingseerup No, there are no assumptions. If you can provide an example, we can try to find out what's happening.
Thanks. I'll open a new issue with some example data.
This may be outside of the desired scope of your software, but it would be quite useful if fgsea could be extended to support one-tailed GSEA tests.
This way, fgsea could be used to analyze other non-expression, single-tailed gene statistics, and could also be used to look at `abs(fold change)` rankings. Here is an example of another package that implements such an approach: