olgabaranov closed this 5 months ago
Did you notice the 2 warnings when running the function with the argument nPerm = 1000 explicitly provided?
Warning messages:
1: In .GSEA(geneList = geneList, exponent = exponent, minGSSize = minGSSize, :
We do not recommend using nPerm parameter incurrent and future releases
2: In fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize = minGSSize, :
You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument in the fgsea function call.
>
This implicitly means that you are comparing 2 different "modes" of GSEA: your first run (without providing nPerm = 1000) uses the recommended mode of GSEA, i.e. the function fgseaMultilevel, which estimates p-values accurately, including very small ones.
By contrast, your second run uses the function fgseaSimple, which estimates p-values from a fixed number of permutations and therefore with somewhat limited accuracy.
Note that under the hood both functions are provided by the fgsea package. See the fgsea preprint for more information on the differences between the 2 methods: https://doi.org/10.1101/060012.
Thus, it is anticipated that the results of both runs will be slightly different. See below for some code that indeed shows this.
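As background to why a fixed number of permutations limits accuracy, here is a minimal base-R sketch of a generic permutation test. It is not fgsea's implementation: the toy statistic (a difference in group means), the data and the (1 + count) / (1 + nPerm) estimator are assumptions chosen for illustration. The point is that such an estimator can never report a p-value below 1 / (nPerm + 1), however extreme the observed statistic is.

```r
# Generic permutation p-value (illustration only, NOT fgsea's code).
# Assumed estimator: p = (1 + #permuted stats >= observed) / (1 + nPerm)
perm_pvalue <- function(x, y, nPerm) {
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  idx <- seq_along(x)
  perm_stats <- replicate(nPerm, {
    shuffled <- sample(pooled)  # random relabelling of all values
    mean(shuffled[idx]) - mean(shuffled[-idx])
  })
  (1 + sum(perm_stats >= observed)) / (1 + nPerm)
}

# Two completely separated toy groups (made-up data): practically no random
# relabelling reaches the observed difference, so each estimate sits at its
# lower bound 1 / (nPerm + 1).
x <- 10 + seq(0, 1.9, by = 0.1)
y <- seq(0, 1.9, by = 0.1)

set.seed(42)
p_1000  <- perm_pvalue(x, y, nPerm = 1000)
p_10000 <- perm_pvalue(x, y, nPerm = 10000)
print(c(nPerm_1000 = p_1000, nPerm_10000 = p_10000))
```

One would expect 1/1001 and 1/10001 here: more permutations only lower the floor, whereas the multilevel approach described in the fgsea preprint is designed to estimate very small p-values without relying on a fixed permutation count.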
Yet, this does not mean that fully reproducible results cannot be obtained with either function!
To achieve this you do indeed need to set a seed value, but you also have to tell gseGO to use that seed by setting the argument seed to TRUE (seed = TRUE); the default is seed = FALSE.
See also this thread: https://github.com/YuLab-SMU/clusterProfiler/issues/466
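The general principle behind setting a seed can be illustrated with another toy permutation test in base R. This is not clusterProfiler or fgsea code; the function toy_perm_pvalue and the data are hypothetical. It only shows that a permutation-based result is reproducible when the random number generator starts from the same state.

```r
# Hypothetical toy permutation test (not clusterProfiler/fgsea code).
toy_perm_pvalue <- function(x, y, nPerm = 100) {
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  idx <- seq_along(x)
  perm_stats <- replicate(nPerm, {
    shuffled <- sample(pooled)  # random relabelling of all values
    mean(shuffled[idx]) - mean(shuffled[-idx])
  })
  (1 + sum(perm_stats >= observed)) / (1 + nPerm)
}

# Made-up data with overlapping groups.
x <- c(5.1, 4.8, 6.0, 5.5, 4.9)
y <- c(4.7, 5.2, 4.5, 5.0, 4.6)

# Same seed before each run: the same permutations are drawn,
# so the p-values are identical.
set.seed(1234)
p_run1 <- toy_perm_pvalue(x, y)
set.seed(1234)
p_run2 <- toy_perm_pvalue(x, y)
identical(p_run1, p_run2)

# Without resetting the seed, a third run draws different permutations
# and will usually (though not necessarily) give a different p-value.
p_run3 <- toy_perm_pvalue(x, y)
```

Here the seed is reset explicitly before each run; in the gseGO runs further below, a single set.seed() call combined with seed = TRUE is enough to obtain identical results.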
> ## load libraries
> library(clusterProfiler)
> library(org.Hs.eg.db)
>
> ##load sample data
> data(geneList, package="DOSE")
>
> ## run GSEA using the recommended mode, but do not apply a significance cutoff!
> res.recom <- gseGO(geneList = geneList,
+ OrgDb = org.Hs.eg.db,
+ ont = "CC",
+ minGSSize = 100,
+ maxGSSize = 500,
+ pvalueCutoff = 1,
+ eps = 0)
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
>
> ## run GSEA by using permutations (thus with slightly reduced accuracy)
> ## note the warnings!
> res.permu <- gseGO(geneList = geneList,
+ OrgDb = org.Hs.eg.db,
+ ont = "CC",
+ minGSSize = 100,
+ maxGSSize = 500,
+ pvalueCutoff = 1,
+ nPerm = 1000)
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
Warning messages:
1: In .GSEA(geneList = geneList, exponent = exponent, minGSSize = minGSSize, :
We do not recommend using nPerm parameter incurrent and future releases
2: In fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize = minGSSize, :
You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument in the fgsea function call.
>
>
> ## compare p-values; note that these are not identical!
> identical(res.recom, res.permu)
[1] FALSE
>
> merged.res1 <- merge( as.data.frame(res.recom), as.data.frame(res.permu), by.x="ID", by.y="ID")
> plot( merged.res1$pvalue.x, merged.res1$pvalue.y )
>
>
> ## the code below shows that both functions return fully reproducible results when a seed is used
> set.seed(1234)
>
> ## run GSEA by using permutations
> run1 <- gseGO(geneList = geneList,
+ OrgDb = org.Hs.eg.db,
+ ont = "CC",
+ minGSSize = 100,
+ maxGSSize = 500,
+ pvalueCutoff = 1,
+ nPerm = 1000,
+ verbose = FALSE,
+ seed = TRUE)
Warning messages:
1: In .GSEA(geneList = geneList, exponent = exponent, minGSSize = minGSSize, :
We do not recommend using nPerm parameter incurrent and future releases
2: In fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize = minGSSize, :
You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument in the fgsea function call.
>
>
> run2 <- gseGO(geneList = geneList,
+ OrgDb = org.Hs.eg.db,
+ ont = "CC",
+ minGSSize = 100,
+ maxGSSize = 500,
+ pvalueCutoff = 1,
+ nPerm = 1000,
+ verbose = FALSE,
+ seed = TRUE)
Warning messages:
1: In .GSEA(geneList = geneList, exponent = exponent, minGSSize = minGSSize, :
We do not recommend using nPerm parameter incurrent and future releases
2: In fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize = minGSSize, :
You are trying to run fgseaSimple. It is recommended to use fgseaMultilevel. To run fgseaMultilevel, you need to remove the nperm argument in the fgsea function call.
>
> ## results are identical, as illustrated by the perfect diagonal!
> identical(run1, run2)
[1] TRUE
>
>
> merged.res2 <- merge( as.data.frame(run1), as.data.frame(run2), by.x="ID", by.y="ID")
> plot( merged.res2$pvalue.x, merged.res2$pvalue.y )
>
>
> ## repeat for running GSEA through the recommended mode.
> ## note that seed has already been set above.
>
> run3 <- gseGO(geneList = geneList,
+ OrgDb = org.Hs.eg.db,
+ ont = "CC",
+ minGSSize = 100,
+ maxGSSize = 500,
+ pvalueCutoff = 1,
+ eps = 0,
+ verbose = FALSE,
+ seed = TRUE)
>
>
> run4 <- gseGO(geneList = geneList,
+ OrgDb = org.Hs.eg.db,
+ ont = "CC",
+ minGSSize = 100,
+ maxGSSize = 500,
+ pvalueCutoff = 1,
+ eps = 0,
+ verbose = FALSE,
+ seed = TRUE)
>
> ## results are again identical, as illustrated by another perfect diagonal!
>
>
> identical(run3, run4)
[1] TRUE
>
> merged.res3 <- merge( as.data.frame(run3), as.data.frame(run4), by.x="ID", by.y="ID")
> plot( merged.res3$pvalue.x, merged.res3$pvalue.y )
>
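As a numeric complement to the scatter plots above, a small helper can summarise how much the p-values of two runs differ. compare_pvalues is a hypothetical name; it only uses as.data.frame() and merge() as in the transcript, so it should accept gseaResult objects as well as plain data frames with ID and pvalue columns. The toy tables below are made up.

```r
# Hypothetical helper: summarise how much the p-values of two GSEA runs differ.
# Works on anything as.data.frame() turns into a table with 'ID' and 'pvalue'.
compare_pvalues <- function(res1, res2) {
  merged <- merge(as.data.frame(res1), as.data.frame(res2), by = "ID")
  diffs <- abs(merged$pvalue.x - merged$pvalue.y)
  c(
    n_shared     = nrow(merged),
    max_abs_diff = max(diffs),
    spearman_rho = cor(merged$pvalue.x, merged$pvalue.y, method = "spearman")
  )
}

# Made-up toy tables standing in for two result objects (rows in different order).
toy_res1 <- data.frame(ID = c("set_A", "set_B", "set_C"), pvalue = c(0.01, 0.20, 0.50))
toy_res2 <- data.frame(ID = c("set_C", "set_A", "set_B"), pvalue = c(0.52, 0.012, 0.19))
compare_pvalues(toy_res1, toy_res2)
```

Calling compare_pvalues(res.recom, res.permu) would then quantify the differences seen in the first plot, while identical runs such as run1 and run2 give a max_abs_diff of 0.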
Thanks a lot for the detailed explanation!
When trying to increase the number of permutations in gseGO, I am faced with unexpected behaviour of the function:
When repeating it with nPerm = 1000 actively set:
But as far as I know, nPerm = 1000 is the default, so I would expect the same results for both. The result is the same (0) regardless of the exact value of nPerm.
Apologies for not providing a reproducible example as I can't find a public dataset to use for it. I can try to make it work later...