ctlab / fgsea

Fast Gene Set Enrichment Analysis

Add support for one-tailed GSEA tests #27

Closed khughitt closed 4 years ago

khughitt commented 6 years ago

This may be outside of the desired scope of your software, but it would be quite useful if fgsea could be extended to support one-tailed GSEA tests.

This way, fgsea could be used to analyze other non-expression single-tailed gene statistics, and could also be used to look at abs(fold change) rankings.

Here is an example of another package which implements such an approach:

assaron commented 5 years ago

We're now focusing on updating the normal version (see http://rain.ifmo.ru/~alserg/fgseaMultilevel.html). When we are done with it, we'll come back to different versions of GSEA tests. I hope most of them can be implemented in the same efficient manner.

ArtemSokolov commented 5 years ago

+1 I think this would be super helpful. We have been heavy users of fgsea in our lab (fantastic package!), and one-sided tests are one feature that would be extremely useful to our analyses.

assaron commented 4 years ago

@khughitt @ArtemSokolov We're now coming back to this feature request, so can you clarify which particular tests you are interested in and what the use scenarios are? We are aware of ssGSEA, which is quite popular, and absFilterGSEA, which is not. Is there anything else we're missing?

khughitt commented 4 years ago

Hi @assaron Thanks for following up! The main scenario is testing for enrichment expected at only one end of the distribution. For example, one could imagine assigning a positive score to each gene based on its association with a phenotype of interest. It would be interesting to then see if the score distribution is enriched for some functional annotations.

ArtemSokolov commented 4 years ago

Just realized that I never responded to this.

An example use case might be inferring the activity of a transcription factor where we don't care about the directionality of its effect (activating or inhibiting) on downstream targets. The standard trick in this case is to apply GSEA to the abs(expression) vector. However, the p-value calculation needs to be adjusted to be one-tailed, because abs(x) can never be negative.

I think the easiest solution might be to provide an additional argument to fgsea() to allow the user to specify whether the test is two-tailed. Take a look at the built-in ?t.test for an example (specifically, how the "tailed-ness" of the test can be provided via the alternative argument).

assaron commented 4 years ago

We added experimental support for one-tailed tests in the recent version (available from GitHub). Please check it. As expected, there are no big differences for the example dataset.

To use a one-tailed test, set the scoreType argument to either "pos" or "neg". "pos" uses only the positive mode of the enrichment score (in the paper we call it ES+), and the P-value corresponds to the probability of greater ES+ values. "neg" is similar but for ES-, and the P-value corresponds to lower (more negative) ES- values.
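For readers who want to see what "only the positive mode" means in practice, here is a minimal Python sketch of ES+ and ES- with a naive permutation P-value. It is not fgsea's implementation (fgsea is an R package and uses a much more efficient algorithm). The function names `running_extremes` and `one_tailed_pvalue`, the `power` weighting exponent, and the (count + 1) / (n_perm + 1) estimator are assumptions made for illustration only.

```python
import random


def running_extremes(ranked_stats, hit_idx, power=1.0):
    """Return (ES+, ES-) of a gene set over stats sorted in descending order.

    Assumes at least one hit has a non-zero statistic and that the set
    does not contain every gene (otherwise a division by zero occurs).
    """
    hits = set(hit_idx)
    n = len(ranked_stats)
    hit_weight = sum(abs(ranked_stats[i]) ** power for i in hits)
    miss_step = 1.0 / (n - len(hits))
    running = es_pos = es_neg = 0.0
    for i, s in enumerate(ranked_stats):
        if i in hits:
            running += abs(s) ** power / hit_weight  # step up on a set member
        else:
            running -= miss_step                     # step down otherwise
        es_pos = max(es_pos, running)  # largest positive deviation so far
        es_neg = min(es_neg, running)  # largest negative deviation so far
    return es_pos, es_neg


def one_tailed_pvalue(ranked_stats, hit_idx, score_type="pos",
                      n_perm=1000, seed=0):
    """Naive permutation P-value using random gene sets of the same size."""
    rng = random.Random(seed)
    which = 0 if score_type == "pos" else 1
    observed = running_extremes(ranked_stats, hit_idx)[which]
    n, k = len(ranked_stats), len(hit_idx)
    extreme = 0
    for _ in range(n_perm):
        null = running_extremes(ranked_stats, rng.sample(range(n), k))[which]
        # "pos": count null ES+ >= observed; "neg": count null ES- <= observed
        if (null >= observed) if score_type == "pos" else (null <= observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy ranking: 20 genes with non-negative scores, e.g. abs(statistic).
    stats = [float(v) for v in range(20, 0, -1)]
    print(one_tailed_pvalue(stats, [0, 1, 2], score_type="pos"))
```

With non-negative inputs such as abs(x) or -log10(p-value), only the "pos" direction is meaningful, which is why the sketch compares the null ES+ values against the observed ES+ alone.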

kvittingseerup commented 2 years ago

Are there any assumptions about the distribution of stats when using scoreType='pos' ?

I'm asking because I'm using fgsea to analyze two datasets using -log10(p-value) as input. For one of them, the resulting fgsea p-value histogram looks fine, but for the other, the mode of the p-value distribution seems to be in the [0.05-0.10] range instead of ~0 or ~1 as I would expect.

assaron commented 2 years ago

@kvittingseerup No, there are no assumptions. If you can provide an example, we could try to find out what's happening.

kvittingseerup commented 2 years ago

Thanks. I'll make a new issue with some example data