shangguandong1996 closed this issue 4 years ago
That means that the statistic values are not symmetric and the probability of having a positive (or negative) enrichment score is far from 0.5. In such cases it can be very hard to estimate P-values. It can be a sign that the gene expression data were not properly normalized before the differential expression analysis.
But why do some pathways get P-value estimates while others do not, if this is caused by improper normalization?
That probability depends on pathway size. For some sizes it is closer to 0.5, so the estimation does not suffer too much.
Please forgive me if I misunderstand something. So fgsea assumes that the numbers of up- and down-regulated genes are approximately the same? But that may fail for some samples.
The GSEA P-value is calculated as P(ES >= x) / P(ES > 0), where x is the enrichment score of the tested pathway (assuming it is positive) and ES is the enrichment score of a random gene set of the same size. On some datasets and pathways the denominator P(ES > 0) can be very low and hard to estimate properly. If the ranking is balanced, it is around 0.5 and can be estimated easily. So no, there is no explicit requirement of balance, but if the ranking is unbalanced, there will be warnings.
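To make the ratio concrete, here is a rough Python sketch. It assumes a simplified weighted running-sum enrichment score and plain random sampling of gene sets, which is not how fgsea computes things internally. The names `enrichment_score` and `empirical_pvalue` are made up for illustration.

```python
import numpy as np

def enrichment_score(stats, idx):
    """Simplified GSEA-style ES (an assumption, not fgsea's code).

    stats: 1-D array of gene-level statistics sorted in decreasing order.
    idx:   positions of the gene set members within stats.
    """
    n = len(stats)
    hit = np.zeros(n, dtype=bool)
    hit[idx] = True
    weights = np.abs(stats) * hit
    # Step up on set members (weighted by |stat|), step down on all other genes.
    running = np.cumsum(weights / weights.sum() - (~hit) / (n - hit.sum()))
    # ES is the deviation from zero with the largest absolute value, sign kept.
    return running[np.argmax(np.abs(running))]

def empirical_pvalue(stats, idx, n_perm=500, seed=0):
    """Estimate P(ES >= x) / P(ES > 0) from random gene sets of the same size."""
    rng = np.random.default_rng(seed)
    es = enrichment_score(stats, idx)
    k = len(idx)
    null = np.array([
        enrichment_score(stats, rng.choice(n_genes, size=k, replace=False))
        for n_genes in [len(stats)] * n_perm
    ])
    if es >= 0:
        same_sign, extreme = np.sum(null > 0), np.sum(null >= es)
    else:
        same_sign, extreme = np.sum(null < 0), np.sum(null <= es)
    # The ratio is undefined when no random set has the same sign as the observed ES.
    pval = extreme / same_sign if same_sign > 0 else float("nan")
    return es, pval, same_sign / n_perm

# A perfectly symmetric (balanced) toy ranking of 1000 genes.
balanced = np.linspace(3.0, -3.0, 1000)
top_set = np.arange(20)  # hypothetical pathway: the 20 top-ranked genes
es, pval, frac_same_sign = empirical_pvalue(balanced, top_set)
print(es, pval, frac_same_sign)
```

The third returned value is the estimated denominator. When it is tiny, very few random sets contribute to the ratio and the estimate becomes unstable, which is the situation the warning points at.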
Still, GSEA makes much more sense if the ranking is more or less balanced. If it is far from that, a one-tailed test can be considered, which does not normalize by P(ES > 0). This can be controlled with the scoreType parameter.
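In fgsea itself, I believe the one-tailed variants are selected with `scoreType = "pos"` or `scoreType = "neg"` (the default being `"std"`). The idea can be sketched in Python as follows, assuming a simplified running-sum score and naive random sampling rather than fgsea's actual algorithm. The names `positive_es` and `one_tailed_pvalue` are hypothetical.

```python
import numpy as np

def positive_es(stats, idx):
    """Largest positive deviation of a simplified weighted running sum."""
    n = len(stats)
    hit = np.zeros(n, dtype=bool)
    hit[idx] = True
    weights = np.abs(stats) * hit
    running = np.cumsum(weights / weights.sum() - (~hit) / (n - hit.sum()))
    # Only positive deviations count; clamp at zero against rounding noise.
    return max(float(running.max()), 0.0)

def one_tailed_pvalue(stats, idx, n_perm=500, seed=0):
    """Estimate P(ES+ >= x) directly, with no division by P(ES > 0)."""
    rng = np.random.default_rng(seed)
    es = positive_es(stats, idx)
    k = len(idx)
    null = np.array([
        positive_es(stats, rng.choice(len(stats), size=k, replace=False))
        for _ in range(n_perm)
    ])
    return es, float(np.mean(null >= es))

# A strongly unbalanced toy ranking: every statistic is positive.
unbalanced = np.linspace(6.0, 0.01, 1000)
top_set = np.arange(20)  # hypothetical pathway: the 20 top-ranked genes
es, pval = one_tailed_pvalue(unbalanced, top_set)
print(es, pval)
```

Because every random gene set contributes to the estimate, there is no small denominator to worry about, at the cost of testing enrichment in one direction only.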
Thanks, Alexey, I get it.
Hi, dear developer. When using fgsea, I got the warning below.
This warning makes the corresponding P-value for the pathway NA, so I am curious about its cause. Does it mean that the number of my positive statistic values differs from the number of negative ones?
Best wishes
Guandong Shang