Problematic p-value histogram from fgsea (scoreType='pos')

kvittingseerup commented 2 years ago

Hi Folks

First of all thanks for an amazing tool! I've been using it for most of my recent research.

Recently I've run into a problem. I've been using fgseaMultilevel(..., scoreType = 'pos', pathways = gs) to analyze two paired datasets, one from DESeq2 and one from DEXSeq. The p-value histograms of the DESeq2 and DEXSeq analyses look normal, but when I run fgsea, the fgsea p-value histogram looks problematic for the DEXSeq analysis.

I would expect the mode of the distribution to be either ~0 or ~1 (as it is for the DESeq2 results), not ~0.05 as it is for the DEXSeq results.
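For context on what "mode" means here: p-values are uniform on [0, 1] under the null, so the histogram is flat. True signal adds a peak in the first bin, and overly conservative tests pile mass into the last bin. The sketch below is plain Python for illustration only; it is not part of fgsea (in R you would simply call hist()). The function name `modal_bin` and the 0.05 bin width are hypothetical choices. It bins a list of p-values and reports the lower edge of the most populated bin.

```python
from collections import Counter


def modal_bin(pvals, width=0.05):
    """Return the lower edge of the most populated histogram bin.

    Bins are [0, w), [w, 2w), ...; p = 1.0 is placed in the last bin.
    Hypothetical helper for illustration, not an fgsea function.
    """
    n_bins = round(1 / width)
    # Map each p-value to its bin index, clamping p = 1.0 into the last bin.
    counts = Counter(min(int(p / width), n_bins - 1) for p in pvals)
    best_bin, _ = counts.most_common(1)[0]
    # Round away floating-point noise such as 0.9500000000000001.
    return round(best_bin * width, 10)


print(modal_bin([0.01, 0.02, 0.03, 0.5, 0.97]))  # signal-like: mode at 0.0
print(modal_bin([1.0, 1.0, 0.99, 0.2]))          # conservative-like: mode at 0.95
```

With this binning, a healthy result has its mode at 0.0 (signal) or in the top bin (conservative). The pattern reported above corresponds to a mode at 0.05.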

The example data visualized here can be downloaded from here and loaded into R with load(), which adds these 3 objects to your environment:

dgeStats # -log10(p-values) from DESeq2
dguStats # -log10(p-values) from DEXSeq
gs # the gene sets I'm analyzing

I see this problem with both fgsea v1.16.0 and v1.19.4, the two versions I've tested.

Looking forward to hearing from you.

Cheers Kristoffer

kvittingseerup commented 2 years ago

Is there any news on this problem? :-)

assaron commented 2 years ago

@kvittingseerup Sorry for the late reply. I've played a bit with your data, and it seems that this is an artifact. Overall, the results from DEXSeq are much less significant, and because our p-values are empirical, a perceived peak sometimes appears on the histogram. With a different seed, this peak disappears. Nevertheless, the p-values fit a beta-uniform mixture well (e.g. with BioNet::fitBumModel), which I consider a good sign.
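The beta-uniform mixture (BUM) model referred to above describes the p-value density as f(p) = λ + (1 − λ)·a·p^(a − 1), with 0 < a < 1. It combines a uniform null component with weight λ and a beta(a, 1) signal component concentrated near 0. The sketch below fits λ and a by a coarse grid-search maximum likelihood in pure Python. It is an illustration only and is not BioNet's implementation. The names `bum_density`, `bum_loglik`, and `fit_bum` and the grid resolution are hypothetical.

```python
import math


def bum_density(p, lam, a):
    """BUM density: uniform null weight lam plus a beta(a, 1) signal component."""
    return lam + (1 - lam) * a * p ** (a - 1)


def bum_loglik(pvals, lam, a):
    """Log-likelihood of a list of p-values under the BUM density."""
    return sum(math.log(bum_density(p, lam, a)) for p in pvals)


def fit_bum(pvals, steps=99):
    """Coarse grid-search MLE for (lam, a), each on the grid 0.01 ... 0.99.

    Hypothetical helper for illustration, not BioNet::fitBumModel.
    """
    # Clip exact zeros, where p ** (a - 1) would be infinite.
    ps = [max(p, 1e-300) for p in pvals]
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(((lam, a) for lam in grid for a in grid),
               key=lambda params: bum_loglik(ps, *params))


# Half null (uniform) and half signal (beta(0.1, 1) quantiles u ** 10).
null = [(i + 0.5) / 50 for i in range(50)]
signal = [((i + 0.5) / 50) ** 10 for i in range(50)]
print(fit_bum(null + signal))  # expect a small a and lam well below 1
```

A grid search keeps the sketch dependency-free. A real fit would use a numerical optimizer.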

Additionally, I would note that your DEXSeq p-values seem to be overly conservative (with a pronounced peak at ~1); this is probably why the enrichment results show far fewer significant hits.

kvittingseerup commented 2 years ago

Thanks for looking into this and figuring it out :-)

ctlab / fgsea #111