Open guidohooiveld opened 5 months ago
@guidohooiveld thanks for the report. I can reproduce the problem. I'll check later what's going on.
To keep you updated: this turned out to be an issue with the algorithm that we were generally aware of, although not in this setting. We recently developed an approach to fix it properly. Hopefully we'll be able to integrate the fix into fgsea in the not-so-distant future, but it's not trivial, so I can't give an ETA. As a workaround for now, one can add random noise to the input scores, and everything should start working fine:
res <- fgseaMultilevel(
  pathways = term2gene.go,
  stats = input.genes + rnorm(length(input.genes), sd = 0.001),
  minSize = 10,
  maxSize = 500,
  eps = 0,
  scoreType = "std"
)
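To illustrate what the noise workaround does to the input, here is a minimal base-R sketch on a made-up score vector (the gene names and values are hypothetical, not taken from the attached data). It assumes the noise standard deviation is much smaller than the gaps between distinct scores, so only tied values are reordered. Fixing the seed keeps the jitter reproducible between runs.

```r
# Hypothetical ranked scores containing ties (illustration only)
stats <- c(geneA = 2.1, geneB = 2.1, geneC = 0.3,
           geneD = -0.5, geneE = -0.5, geneF = -1.7)

# Tied values are present before jittering
has_ties_before <- anyDuplicated(stats) > 0

# Fix the RNG seed so the added noise is the same on every run
set.seed(1)
jittered <- stats + rnorm(length(stats), sd = 0.001)

# After jittering, all values are distinct
has_ties_after <- anyDuplicated(jittered) > 0

print(has_ties_before)
print(has_ties_after)
```

The jittered vector can then be passed as `stats` in place of the original scores.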
Hello, I believe this post might help solve some problems (https://www.biostars.org/p/327699/). Running calculations in parallel on a Windows system can be challenging, but using the code register(SerialParam()) can force the machine to run in Serial mode. On a Linux system, I tested the code without any issues and without needing any adjustments. I'm not familiar with the underlying operating mechanism, though.
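A minimal sketch of the serial-mode setting mentioned above, assuming the BiocParallel package is installed. It registers `SerialParam()` as the session-wide default backend and then checks which backend is active; whether this actually avoids the hang will depend on the cause of the problem.

```r
library(BiocParallel)

# Make serial (single-process) evaluation the default backend for this session
register(SerialParam())

# bpparam() returns the currently registered default backend
current_backend <- bpparam()
print(class(current_backend))
```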
sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8 LC_CTYPE=Chinese (Simplified)_China.utf8
[3] LC_MONETARY=Chinese (Simplified)_China.utf8 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.utf8
time zone: Asia/Shanghai
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocParallel_1.34.2 org.Mm.eg.db_3.17.0 AnnotationDbi_1.62.2 IRanges_2.34.1 S4Vectors_0.38.2
[6] Biobase_2.60.0 BiocGenerics_0.46.0 enrichplot_1.20.3 clusterProfiler_4.8.3 ggpubr_0.6.0
[11] ggplot2_3.5.1 purrr_1.0.2 tibble_3.2.1 dplyr_1.1.4 tidyr_1.3.1
loaded via a namespace (and not attached):
[1] DBI_1.2.3 bitops_1.0-8 gson_0.1.0 shadowtext_0.1.4 gridExtra_2.3
[6] rlang_1.1.4 magrittr_2.0.3 DOSE_3.26.2 compiler_4.3.1 RSQLite_2.3.7
[11] png_0.1-8 vctrs_0.6.5 reshape2_1.4.4 stringr_1.5.1 pkgconfig_2.0.3
[16] crayon_1.5.3 fastmap_1.2.0 backports_1.5.0 XVector_0.40.0 ggraph_2.2.1
[21] utf8_1.2.4 HDO.db_0.99.1 bit_4.0.5 zlibbioc_1.46.0 cachem_1.1.0
[26] aplot_0.2.3 jsonlite_1.8.8 GenomeInfoDb_1.36.4 blob_1.2.4 tweenr_2.0.3
[31] broom_1.0.6 parallel_4.3.1 R6_2.5.1 stringi_1.8.4 RColorBrewer_1.1-3
[36] car_3.1-2 GOSemSim_2.26.1 Rcpp_1.0.13 snow_0.4-4 downloader_0.4
[41] pacman_0.5.1 Matrix_1.5-4.1 splines_4.3.1 igraph_2.0.3 tidyselect_1.2.1
[46] qvalue_2.32.0 abind_1.4-5 viridis_0.6.5 codetools_0.2-20 lattice_0.22-6
[51] plyr_1.8.9 treeio_1.24.3 withr_3.0.1 KEGGREST_1.40.1 gridGraphics_0.5-1
[56] scatterpie_0.2.4 polyclip_1.10-7 Biostrings_2.68.1 ggtree_3.8.2 pillar_1.9.0
[61] carData_3.0-5 ggfun_0.1.6 generics_0.1.3 RCurl_1.98-1.16 tidytree_0.4.6
[66] munsell_0.5.1 scales_1.3.0 glue_1.7.0 lazyeval_0.2.2 tools_4.3.1
[71] data.table_1.16.0 fgsea_1.26.0 ggsignif_0.6.4 fs_1.6.4 graphlayouts_1.1.1
[76] fastmatch_1.1-4 tidygraph_1.3.1 cowplot_1.1.3 grid_4.3.1 ape_5.8
[81] colorspace_2.1-1 nlme_3.1-166 patchwork_1.2.0 GenomeInfoDbData_1.2.10 ggforce_0.4.2
[86] cli_3.6.3 fansi_1.0.6 viridisLite_0.4.2 gtable_0.3.5 rstatix_0.7.2
[91] yulab.utils_0.1.7 digest_0.6.37 ggrepel_0.9.5 ggplotify_0.1.2 farver_2.1.2
[96] memoise_2.0.1 lifecycle_1.0.4 httr_1.4.7 GO.db_3.17.0 bit64_4.0.5
[101] MASS_7.3-60
The solution with SerialParam() worked nicely. Thanks.
@jinxmeng Thanks for your comment, but I suspect it's a different bug. It'd be great if you could provide more background, such as whether there was anything special about your input and how reproducible the problem is in your setup. We don't test much on Windows machines, but my understanding was that it should run OK even without the SerialParam() setting.
@yeroslaviz Similarly, I'm not sure what's going on in your case. I saw your post on Biostars; apparently you use a Mac, not Windows, and also had the nproc=1 parameter set. The latter, in my mind, should be effectively the same as setting SerialParam() mode, so it's weird that it helped. Could you also provide more background on your case?
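As an alternative to a global `register()` call, the backend can also be supplied per call through the `BPPARAM` argument of `fgseaMultilevel()`. The sketch below assumes fgsea and BiocParallel are installed and uses the example data shipped with fgsea (`examplePathways`, `exampleRanks`) rather than anyone's data from this thread; the `minSize`/`maxSize` values are arbitrary illustration choices.

```r
library(fgsea)
library(BiocParallel)

# Example data bundled with fgsea (not the data from this issue)
data(examplePathways)
data(exampleRanks)

set.seed(42)
res <- fgseaMultilevel(
  pathways = examplePathways,
  stats = exampleRanks,
  minSize = 15,
  maxSize = 500,
  BPPARAM = SerialParam()  # run this call serially, regardless of the registered default
)

print(head(res[order(res$pval), ]))
```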
@assaron Thank you for your response. I don't fully understand the operating mechanism here. I will provide the relevant files and code for you to test. Thank you! My computer system is Microsoft Windows 11, and my CPU is 'Intel64 Family 6 Model 183 Stepping 1 GenuineIntel ~2100 MHz' (Intel(R) Core(TM) i7-14700HX, 2100 MHz, 20 cores).
> library(dplyr)
> library(clusterProfiler)
> gene_list <- readRDS("gene_list.rds")
> gseKEGG <- gseKEGG(gene_list, organism = "mmu", minGSSize = 10, maxGSSize = 1000, pvalueCutoff = 1)
preparing geneSet collections...
GSEA analysis...
#The process lasted for 1 minute without any response.
> register(SerialParam())
> system.time(gseKEGG <- gseKEGG(gene_list, organism = "mmu", minGSSize = 10, maxGSSize = 1000, pvalueCutoff = 1))
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
user system elapsed
0.64 0.02 1.86
Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, :
There are ties in the preranked stats (0.02% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
2: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, :
There are duplicate gene names, fgsea may produce unexpected results.
3: In fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize, :
For some pathways, in reality P-values are less than 1e-10. You can set the `eps` argument to zero for better estimation.
#The process lasted for 1.86 seconds and produced the output.
> sessionInfo()
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8 LC_CTYPE=Chinese (Simplified)_China.utf8 LC_MONETARY=Chinese (Simplified)_China.utf8
[4] LC_NUMERIC=C LC_TIME=Chinese (Simplified)_China.utf8
time zone: Asia/Shanghai
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocParallel_1.36.0 org.Mm.eg.db_3.18.0 AnnotationDbi_1.64.1 IRanges_2.36.0 S4Vectors_0.40.2 Biobase_2.62.0
[7] BiocGenerics_0.48.1 enrichplot_1.22.0 clusterProfiler_4.12.2 cowplot_1.1.3 data.table_1.15.4 harmony_1.2.0
[13] Rcpp_1.0.13 Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4 stringr_1.5.1 metafor_4.6-0
[19] numDeriv_2016.8-1.1 metadat_1.2-0 Matrix_1.6-5 ggpubr_0.6.0 ggplot2_3.5.1 purrr_1.0.2
[25] tibble_3.2.1 dplyr_1.1.4 tidyr_1.3.1
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.3.3 later_1.3.2 ggplotify_0.1.2 bitops_1.0-7
[6] polyclip_1.10-7 fastDummies_1.7.4 httr2_1.0.2 lifecycle_1.0.4 rstatix_0.7.2
[11] globals_0.16.3 lattice_0.22-5 MASS_7.3-60.0.1 backports_1.5.0 magrittr_2.0.3
[16] limma_3.60.3 plotly_4.10.4 httpuv_1.6.15 sctransform_0.4.1 spam_2.10-0
[21] spatstat.sparse_3.1-0 reticulate_1.38.0 pbapply_1.7-2 DBI_1.2.3 RColorBrewer_1.1-3
[26] abind_1.4-5 zlibbioc_1.48.0 Rtsne_0.17 presto_1.0.0 ggraph_2.2.1
[31] RCurl_1.98-1.14 yulab.utils_0.1.6 tweenr_2.0.3 rappdirs_0.3.3 GenomeInfoDbData_1.2.11
[36] ggrepel_0.9.5 irlba_2.3.5.1 listenv_0.9.1 spatstat.utils_3.0-5 tidytree_0.4.6
[41] goftest_1.2-3 RSpectra_0.16-2 spatstat.random_3.3-1 fitdistrplus_1.2-1 parallelly_1.38.0
[46] leiden_0.4.3.1 codetools_0.2-19 ggforce_0.4.2 DOSE_3.28.2 tidyselect_1.2.1
[51] aplot_0.2.3 farver_2.1.2 viridis_0.6.5 matrixStats_1.3.0 spatstat.explore_3.3-1
[56] mathjaxr_1.6-0 jsonlite_1.8.8 tidygraph_1.3.1 progressr_0.14.0 ggridges_0.5.6
[61] survival_3.5-8 systemfonts_1.1.0 tools_4.3.3 treeio_1.26.0 ragg_1.3.2
[66] snow_0.4-4 ica_1.0-3 glue_1.7.0 gridExtra_2.3 qvalue_2.34.0
[71] GenomeInfoDb_1.38.8 withr_3.0.1 fastmap_1.2.0 fansi_1.0.6 digest_0.6.36
[76] gridGraphics_0.5-1 R6_2.5.1 mime_0.12 textshaping_0.4.0 colorspace_2.1-1
[81] GO.db_3.18.0 scattermore_1.2 tensor_1.5 spatstat.data_3.1-2 RSQLite_2.3.6
[86] RhpcBLASctl_0.23-42 utf8_1.2.4 generics_0.1.3 graphlayouts_1.1.1 httr_1.4.7
[91] htmlwidgets_1.6.4 scatterpie_0.2.3 uwot_0.2.2 pkgconfig_2.0.3 gtable_0.3.5
[96] blob_1.2.4 lmtest_0.9-40 XVector_0.42.0 shadowtext_0.1.4 htmltools_0.5.8.1
[101] carData_3.0-5 fgsea_1.30.0 dotCall64_1.1-1 scales_1.3.0 png_0.1-8
[106] spatstat.univar_3.0-0 ggfun_0.1.5 rstudioapi_0.16.0 reshape2_1.4.4 nlme_3.1-164
[111] zoo_1.8-12 cachem_1.1.0 KernSmooth_2.23-22 HDO.db_0.99.1 parallel_4.3.3
[116] miniUI_0.1.1.1 pillar_1.9.0 grid_4.3.3 vctrs_0.6.5 RANN_2.6.1
[121] promises_1.3.0 car_3.1-2 xtable_1.8-4 cluster_2.1.6 cli_3.6.2
[126] compiler_4.3.3 rlang_1.1.3 crayon_1.5.3 future.apply_1.11.2 ggsignif_0.6.4
[131] labeling_0.4.3 fs_1.6.4 plyr_1.8.9 stringi_1.8.4 viridisLite_0.4.2
[136] deldir_2.0-4 munsell_0.5.1 Biostrings_2.70.1 lazyeval_0.2.2 spatstat.geom_3.3-2
[141] GOSemSim_2.28.1 pacman_0.5.1 RcppHNSW_0.6.0 patchwork_1.2.0 bit64_4.0.5
[146] future_1.34.0 KEGGREST_1.42.0 statmod_1.5.0 shiny_1.9.1 ROCR_1.0-11
[151] igraph_2.0.3 broom_1.0.6 memoise_2.0.1 ggtree_3.10.1 fastmatch_1.1-4
[156] bit_4.0.5 gson_0.1.0 ape_5.8
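The warnings in the transcript above mention ties in the preranked stats and duplicate gene names. A rough base-R sketch for checking an input vector for both is shown here on a made-up `gene_list` (names and values are hypothetical). The de-duplication rule used, keeping the entry with the largest absolute score per gene name, is only one possible choice and an assumption of this sketch, not something fgsea or clusterProfiler prescribes.

```r
# Hypothetical named score vector with one tie and one duplicated name
gene_list <- c(g1 = 3.2, g2 = 1.5, g3 = 1.5, g2 = -0.7, g4 = -2.0)

# Fraction of entries whose score repeats an earlier score (a rough tie count)
tie_fraction <- mean(duplicated(gene_list))

# Gene names that occur more than once
dup_names <- unique(names(gene_list)[duplicated(names(gene_list))])

# Assumed policy: per name, keep the entry with the largest absolute score
ord <- order(abs(gene_list), decreasing = TRUE)
dedup <- gene_list[ord]
dedup <- dedup[!duplicated(names(dedup))]

# Restore the decreasing order expected for a preranked list
dedup <- sort(dedup, decreasing = TRUE)

print(tie_fraction)
print(dup_names)
print(dedup)
```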
@assaron - you're correct. SerialParam() didn't solve the problem when I tried it again, but your suggestion to add random noise did. So for now I'm happy with it.
That's also the reason why I deleted the comment on Biostars.
Thanks for the solution. It would be great if you could also fix the problem in the future.
Hi Alex,
A (reproducible) issue ("GSEA hangs") was posted on the clusterProfiler GitHub. See: https://github.com/YuLab-SMU/clusterProfiler/issues/659#issuecomment-2027820878, and the posts below that one.
Since clusterProfiler uses fgsea under the hood for gene set enrichment analysis, I checked whether the reported issue originates from the way input/output data is processed by clusterProfiler, or from fgsea. It turns out that I could reproduce the issue when using fgsea directly, hence this post.
Please note that the OP reported this issue when using R-4.2.2, but I could also reproduce it with the current versions of R (R-4.3.0 and R-4.3.3, respectively) and fgsea on both my Windows and Linux machines.
Also note that the issue occurs when minSize is set to 10; when minSize=11 is used, fgsea runs as expected...
For your convenience I have attached the 2 input files to this post as an RData file (which I compressed into a ZIP archive in order to be able to upload it). See below how these objects were generated, also in case you would like to generate them yourselves.
I would appreciate it if you could have a look at this to see whether this can be fixed.
G
sessionInfo() Windows machine:
sessionInfo() Linux machine:
fgsea.input.zip