ctlab / fgsea

Fast Gene Set Enrichment Analysis

p values #106

Closed davidlyon3 closed 3 years ago

davidlyon3 commented 3 years ago

First of all, thanks for a great R package.

Problem: I run fgsea twice:

Run 1) using a pre-ranked gene set, and Run 2) using the same gene set with the ranking from Run 1 reversed.

Question: should the p-values be the same in Run 1 and Run 2, or slightly different, and why?

I am getting slightly different p-values and was wondering whether that's normal.

Thanks in advance

vdsukhov commented 3 years ago

Hi @davidlyon3, could you please provide a code example?

davidlyon3 commented 3 years ago

data(examplePathways)

data(exampleRanks)

RUN1:

gseaRes <- fgsea(pathways = examplePathways, stats = exampleRanks, minSize = 15, maxSize = 500, nperm = 100000)

RUN2:

gseaRes2 <- fgsea(pathways = examplePathways, stats = exampleRanks * -1, minSize = 15, maxSize = 500, nperm = 100000)

gseaRes

|    | pathway | pval | padj | ES | NES | nMoreExtreme | size |
|----|---------|------|------|----|-----|--------------|------|
| 1: | `1221633_Meiotic_Synapsis` | 0.54327717 | 0.72190572 | 0.2885754 | 0.9396527 | 31891 | 27 |
| 2: | `1445146_Translocation_of_Glut4_to_the_Plasma_Membrane` | 0.69086362 | 0.83623743 | 0.2387284 | 0.8435079 | 41989 | 39 |
| 3: | `442533_Transcriptional_Regulation_of_Adipocyte_Differentiation_in_3T3-L1_Pre-adipocytes` | 0.10840075 | 0.26249107 | -0.3640706 | -1.3462434 | 4394 | 31 |
| 4: | `508751_Circadian_Clock` | 0.80009262 | 0.88725363 | 0.2516324 | 0.7310837 | 44917 | 17 |
| 5: | `5334727_Mus_musculus_biological_processes` | 0.36904062 | 0.56649064 | 0.2469065 | 1.0528939 | 25110 | 106 |

gseaRes2

|    | pathway | pval | padj | ES | NES | nMoreExtreme | size |
|----|---------|------|------|----|-----|--------------|------|
| 1: | `1221633_Meiotic_Synapsis` | 0.54155629 | 0.7182867 | -0.2885754 | -0.9400810 | 31686 | 27 |
| 2: | `1445146_Translocation_of_Glut4_to_the_Plasma_Membrane` | 0.68929551 | 0.8361444 | -0.2387284 | -0.8449794 | 41700 | 39 |
| 3: | `442533_Transcriptional_Regulation_of_Adipocyte_Differentiation_in_3T3-L1_Pre-adipocytes` | 0.11079281 | 0.2677516 | 0.3640706 | 1.3416978 | 4524 | 31 |
| 4: | `508751_Circadian_Clock` | 0.80010695 | 0.8884538 | -0.2516324 | -0.7302689 | 44885 | 17 |
| 5: | `5334727_Mus_musculus_biological_processes` | 0.36909586 | 0.5673975 | -0.2469065 | -1.0538866 | 24958 | 106 |

As expected, ES has the same magnitude in RUN1 and RUN2 with opposite signs, which is good. But how can I get the pval and padj values to be identical as well? If that is not possible, can you let me know why? In an ideal world they should be identical.
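The comparison described above can be made explicit with a few lines of base R. This is only a sketch: `compare_runs` is a hypothetical helper, not part of fgsea, and the two small data frames simply copy two rows of the `pval` and `ES` columns from the output posted in this comment rather than calling fgsea.

```r
# Hypothetical helper (not an fgsea function): join two result tables by
# pathway and report how ES and pval differ between the runs.
compare_runs <- function(res1, res2) {
  m <- merge(res1, res2, by = "pathway", suffixes = c(".run1", ".run2"))
  data.frame(
    pathway   = m$pathway,
    ES_sum    = m$ES.run1 + m$ES.run2,           # 0 when ES only flips sign
    pval_diff = abs(m$pval.run1 - m$pval.run2)   # Monte Carlo discrepancy
  )
}

# Two rows copied from the output posted above.
run1 <- data.frame(
  pathway = c("1221633_Meiotic_Synapsis", "508751_Circadian_Clock"),
  pval    = c(0.54327717, 0.80009262),
  ES      = c(0.2885754, 0.2516324)
)
run2 <- data.frame(
  pathway = c("1221633_Meiotic_Synapsis", "508751_Circadian_Clock"),
  pval    = c(0.54155629, 0.80010695),
  ES      = c(-0.2885754, -0.2516324)
)

cmp <- compare_runs(run1, run2)
print(cmp)
```

With real results, the same helper could be given the `pathway`, `pval` and `ES` columns of the two fgsea outputs.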

Thanks again

vdsukhov commented 3 years ago

@davidlyon3 If I understood correctly, reversing the ranking is the same as multiplying all gene-level statistics by -1. In that case, yes, the P-values will be slightly different, and this is due to the random number generator. When the statistics are multiplied by -1, they are stored in reverse order inside fgsea. This leads to minor differences in the estimated P-values.
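A toy model may help picture this explanation. The sketch below does not reproduce fgsea's algorithm: it assumes a simple mean-based statistic, and `gene_stats`, `gene_set` and `perm_pval` are hypothetical names. It only shows that a permutation p-value computed from the same seed can change slightly once the statistics are negated and stored in reverse order, because the same random draws then pick different values.

```r
# Toy illustration only: a mean-based statistic, NOT the fgsea algorithm.
set.seed(1)
gene_stats <- rnorm(1000)      # hypothetical gene-level statistics
gene_set   <- sample(1000, 30) # hypothetical gene set, as indices

# Two-sided permutation p-value: compare the observed |mean| of the set
# against |mean| of random sets of the same size.
perm_pval <- function(gene_stats, gene_set, nperm) {
  observed <- abs(mean(gene_stats[gene_set]))
  null <- replicate(nperm, abs(mean(sample(gene_stats, length(gene_set)))))
  (sum(null >= observed) + 1) / (nperm + 1)
}

set.seed(42)
p_forward <- perm_pval(gene_stats, gene_set, 10000)

# Same seed, but the statistics are negated and stored in reverse order.
# The gene set indices are remapped so they point at the same genes.
set.seed(42)
p_reversed <- perm_pval(rev(-gene_stats),
                        length(gene_stats) + 1 - gene_set,
                        10000)

print(c(forward = p_forward, reversed = p_reversed))
```

The observed statistic is identical in both calls; only the random null sets differ, so the two estimates are expected to agree up to Monte Carlo error.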