Closed (romanhaa closed 3 years ago)
Just wanted to add that I just tried the same with R 3.5.1
and it also gets stuck.
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux buster/sid
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] org.Hs.eg.db_3.6.0 AnnotationDbi_1.42.1 IRanges_2.14.10
[4] S4Vectors_0.18.3 Biobase_2.40.0 BiocGenerics_0.26.0
[7] clusterProfiler_3.8.1 BiocInstaller_1.30.0
loaded via a namespace (and not attached):
[1] ggrepel_0.8.0 Rcpp_0.12.17 lattice_0.20-35
[4] tidyr_0.8.1 GO.db_3.6.0 assertthat_0.2.0
[7] digest_0.6.15 ggforce_0.1.3 R6_2.2.2
[10] plyr_1.8.4 ggridges_0.5.0 RSQLite_2.1.1
[13] ggplot2_3.0.0 pillar_1.2.3 rlang_0.2.1
[16] lazyeval_0.2.1 data.table_1.11.4 blob_1.1.1
[19] Matrix_1.2-14 qvalue_2.12.0 splines_3.5.1
[22] BiocParallel_1.14.2 stringr_1.3.1 igraph_1.2.1
[25] bit_1.1-14 munsell_0.5.0 fgsea_1.6.0
[28] compiler_3.5.1 pkgconfig_2.0.1 tidyselect_0.2.4
[31] tibble_1.4.2 gridExtra_2.3 enrichplot_1.0.2
[34] viridisLite_0.3.0 dplyr_0.7.6 MASS_7.3-50
[37] grid_3.5.1 gtable_0.2.0 DBI_1.0.0
[40] magrittr_1.5 units_0.6-0 scales_0.5.0
[43] stringi_1.2.3 GOSemSim_2.6.0 reshape2_1.4.3
[46] viridis_0.5.1 bindrcpp_0.2.2 DO.db_2.9
[49] rvcheck_0.1.0 cowplot_0.9.2 fastmatch_1.1-0
[52] tools_3.5.1 bit64_0.9-7 glue_1.2.0
[55] tweenr_0.1.5 purrr_0.2.5 ggraph_1.0.2
[58] colorspace_1.3-2 UpSetR_1.3.3 DOSE_3.6.1
[61] memoise_1.1.0 bindr_0.1.1
See the "support many ID types" section at http://guangchuangyu.github.io/2016/01/go-analysis-using-clusterprofiler/.
In order to support different ID types, groupGO, enrichGO and gseGO now use the select interface to get the GO annotation from OrgDb, and that implementation (in the AnnotationDbi package) is quite slow.
However, this should not take more than 30 min on an ordinary PC.
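One way to check whether the select() lookup is really the bottleneck on a given machine is to time it on its own, outside of groupGO/enrichGO/gseGO. The following is only a sketch under assumptions: the helper name time_select, the two example Ensembl IDs and the choice of the "GOALL" column are illustrative and not taken from clusterProfiler's internals. The helper returns NULL if the annotation packages are not installed.

```r
# Sketch: time the AnnotationDbi::select() lookup in isolation.
# `time_select` is a hypothetical helper, not part of any package.
time_select <- function(keys, keytype = "ENSEMBL") {
  have_pkgs <- requireNamespace("AnnotationDbi", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)
  if (!have_pkgs) {
    message("AnnotationDbi / org.Hs.eg.db not installed; skipping")
    return(invisible(NULL))
  }
  # system.time() reports user, system and elapsed time for the lookup
  system.time(
    suppressMessages(
      AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                            keys = keys,
                            columns = "GOALL",
                            keytype = keytype)
    )
  )
}

# Example keys (assumed for illustration): TP53 and BRCA1
timing <- time_select(c("ENSG00000141510", "ENSG00000012048"))
print(timing)
```

If this call returns within seconds while the enrichment function still hangs, the slowdown is probably somewhere other than the annotation lookup.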
Yes, I understand that, and I have experienced myself that select() takes some time. However, I'm not sure this is the reason here, since it works fine with R 3.4.4. I also noticed that the R sessions that get stuck are consuming a lot of CPU resources.
@romanhaa, I experienced the same issue. It now takes ages to do any analysis with R 3.5.1.
@Krzysztof-Piotr I ended up using the enrichR API for my purposes: https://cran.r-project.org/web/packages/enrichR/index.html
Hi,
Thanks so much for developing clusterProfiler. It is a very useful package. However, something has changed since I last used it that makes the gseGO function nearly unusable. I just let gseGO run the entire night and it didn't finish, similar to the issue described above.
The command I used:
GSEA_out <- gseGO(geneList = GSEA_in,
                  OrgDb = org.Hs.eg.db,
                  ont = "BP",
                  nPerm = 1000,
                  pvalueCutoff = 1,  # keep everything; select significant terms later
                  verbose = TRUE,
                  keyType = "ENSEMBL")
The output:
preparing geneSet collections...
GSEA analysis...
There are ties in the preranked stats (4.57% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
Potentially related: The example from the vignette now fails with the error message
Error in data.frame(ID = as.character(tmp_res$pathway), Description = Description, : row names contain missing values
Traceback():
5: stop("row names contain missing values")
4: data.frame(ID = as.character(tmp_res$pathway), Description = Description, setSize = tmp_res$size, enrichmentScore = tmp_res$ES, NES = tmp_res$NES, pvalue = tmp_res$pval, p.adjust = p.adj, qvalues = qvalues, stringsAsFactors = FALSE)
3: .GSEA(geneList = geneList, exponent = exponent, nPerm = nPerm, minGSSize = minGSSize, maxGSSize = maxGSSize, pvalueCutoff = pvalueCutoff, pAdjustMethod = pAdjustMethod, verbose = verbose, seed = seed, USER_DATA = USER_DATA)
2: GSEA_internal(geneList = geneList, exponent = exponent, nPerm = nPerm, minGSSize = minGSSize, maxGSSize = maxGSSize, pvalueCutoff = pvalueCutoff, pAdjustMethod = pAdjustMethod, verbose = verbose, USER_DATA = GO_DATA, seed = seed, by = by)
1: gseGO(geneList = geneList, OrgDb = org.Hs.eg.db, ont = "CC", nPerm = 1000, minGSSize = 100, maxGSSize = 500, pvalueCutoff = 0.05, verbose = FALSE)
However, when I set ont to "BP" it runs through ok.
For my own dataset, the gene list does not seem to be the problem, since gseGO completes the computation with nPerm = 10 in 61 sec.
Session info:
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] org.Hs.eg.db_3.7.0 AnnotationDbi_1.44.0 IRanges_2.16.0
[4] S4Vectors_0.20.1 Biobase_2.40.0 BiocGenerics_0.28.0
[7] cividis_0.2.0 gridExtra_2.3 forcats_0.3.0
[10] stringr_1.3.1 dplyr_0.7.8 purrr_0.3.0
[13] readr_1.3.1 tidyr_0.8.2 tibble_2.0.1
[16] ggplot2_3.1.0.9000 tidyverse_1.2.1 clusterProfiler_3.10.1
[19] reticulate_1.10.0.9001
loaded via a namespace (and not attached):
[1] nlme_3.1-137 enrichplot_1.0.2 lubridate_1.7.4
[4] bit64_0.9-7 httr_1.4.0 UpSetR_1.3.3
[7] tools_3.5.1 backports_1.1.3 R6_2.3.0
[10] DBI_1.0.0 lazyeval_0.2.1 colorspace_1.4-0
[13] withr_2.1.2 tidyselect_0.2.5 bit_1.1-14
[16] compiler_3.5.1 cli_1.0.1 rvest_0.3.2
[19] xml2_1.2.0 scales_1.0.0 ggridges_0.5.1
[22] digest_0.6.18 DOSE_3.8.2 pkgconfig_2.0.2
[25] rlang_0.3.1 readxl_1.2.0 rstudioapi_0.9.0
[28] RSQLite_2.1.1 bindr_0.1.1 generics_0.0.2
[31] jsonlite_1.6 BiocParallel_1.14.2 GOSemSim_2.6.2
[34] magrittr_1.5 GO.db_3.6.0 Matrix_1.2-14
[37] Rcpp_1.0.0 munsell_0.5.0 viridis_0.5.1
[40] stringi_1.2.4 yaml_2.2.0 ggraph_1.0.2
[43] MASS_7.3-50 plyr_1.8.4 qvalue_2.12.0
[46] grid_3.5.1 blob_1.1.1 ggrepel_0.8.0.9000
[49] DO.db_2.9 crayon_1.3.4 lattice_0.20-35
[52] haven_2.0.0 cowplot_0.9.3 splines_3.5.1
[55] hms_0.4.2 knitr_1.21 pillar_1.3.1
[58] fgsea_1.6.0 igraph_1.2.2 reshape2_1.4.3
[61] fastmatch_1.1-0 glue_1.3.0 data.table_1.12.0
[64] modelr_0.1.2 tweenr_0.1.5 cellranger_1.1.0
[67] gtable_0.2.0 assertthat_0.2.0 xfun_0.4
[70] ggforce_0.1.3 broom_0.5.1 viridisLite_0.3.0
[73] rvcheck_0.1.0 memoise_1.1.0 units_0.6-0
[76] bindrcpp_0.2.2
EDIT: updating to the current clusterProfiler version from GitHub (3.11.1) didn't solve the problem.
Best regards Max
I did some further digging. There seem to be two different problems here:
a) When working with ENSEMBL identifiers, fgsea runs forever. It runs very quickly with ENTREZID.
b) The line
Description <- TERM2NAME(tmp_res$pathway, USER_DATA)
within DOSE:::GSEA_fgsea() returns a named vector, which apparently includes NAs in the vector names. This leads to the error message
Error in data.frame(ID = as.character(tmp_res$pathway), Description = Description, : row names contain missing values
when res <- data.frame(...) is assembled. Adding
Description <- unname(Description)
solves this.
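To make problem b) concrete, here is a small base R sketch of the mechanism as described above: data.frame() picks up row names from a named vector, and an NA among those names triggers the error. The IDs and descriptions are made up for illustration; only the unname() call corresponds to the fix suggested in this thread.

```r
# Hypothetical stand-ins for tmp_res$pathway and the TERM2NAME() result
ids <- c("GO:0000001", "GO:0000002")
Description <- c("term one", "term two")
names(Description) <- c("GO:0000001", NA)  # one name is missing

# With the named vector, data.frame() tries to use the names as row names
res_named <- tryCatch(
  data.frame(ID = ids, Description = Description, stringsAsFactors = FALSE),
  error = function(e) e
)
print(conditionMessage(res_named))

# Dropping the names first avoids the row-name check entirely
res_fixed <- data.frame(ID = ids,
                        Description = unname(Description),
                        stringsAsFactors = FALSE)
print(res_fixed)
```

Since the affected line lives inside DOSE:::GSEA_fgsea(), applying the fix to a real analysis presumably means patching the DOSE source (or waiting for a release that includes it); that step is not shown here.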
Best Max
I am facing the same issue. With enrichGO() it's taking forever, even for 10 Ensembl IDs.
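Given the observation earlier in the thread that the analysis runs quickly with ENTREZID, one possible workaround is to re-key the ranked gene list from ENSEMBL to ENTREZID before calling gseGO. The sketch below is an assumption-laden illustration, not the package's method: rekey_gene_list is a hypothetical helper, the IDs are invented, and the rules for ambiguous mappings (drop one-to-many, keep the largest absolute score for many-to-one) are just one reasonable choice. The mapping table is assumed to have the columns ENSEMBL and ENTREZID, as clusterProfiler::bitr() returns for that ID pair.

```r
# Hypothetical helper: convert a named numeric vector keyed by ENSEMBL IDs
# into one keyed by ENTREZID, sorted in decreasing order as gseGO expects.
rekey_gene_list <- function(gene_list, id_map) {
  id_map <- id_map[!is.na(id_map$ENTREZID), ]
  ens <- as.character(id_map$ENSEMBL)
  # drop ENSEMBL IDs that map to more than one ENTREZID
  ambiguous <- unique(ens[duplicated(ens)])
  id_map <- id_map[!ens %in% ambiguous, ]
  # attach the ranking statistic to each remaining mapping
  id_map$score <- unname(gene_list[as.character(id_map$ENSEMBL)])
  id_map <- id_map[!is.na(id_map$score), ]
  # several ENSEMBL IDs for one ENTREZID: keep the largest absolute score
  id_map <- id_map[order(-abs(id_map$score)), ]
  id_map <- id_map[!duplicated(id_map$ENTREZID), ]
  out <- setNames(id_map$score, as.character(id_map$ENTREZID))
  sort(out, decreasing = TRUE)
}

# Toy data (invented IDs); in practice id_map could come from
# clusterProfiler::bitr(names(gene_list), fromType = "ENSEMBL",
#                       toType = "ENTREZID", OrgDb = org.Hs.eg.db)
gene_list <- c(ENSG_A = 2.0, ENSG_B = -1.5, ENSG_C = 0.3, ENSG_D = 1.0)
id_map <- data.frame(
  ENSEMBL  = c("ENSG_A", "ENSG_B", "ENSG_B", "ENSG_C", "ENSG_D"),
  ENTREZID = c("1", "2", "3", "4", "1"),
  stringsAsFactors = FALSE
)
rekeyed <- rekey_gene_list(gene_list, id_map)
print(rekeyed)

# The re-keyed list would then be passed on with the default key type, e.g.
# gseGO(geneList = rekeyed, OrgDb = org.Hs.eg.db, ont = "BP",
#       keyType = "ENTREZID")
```

Genes without a usable mapping are silently dropped here, so it is worth comparing length(gene_list) and length(rekeyed) before interpreting results.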
I'm getting the same problem with GSEA
Hi all, I am facing the same issue and saw this post. I was wondering if you could explain to me how to use this solution:
Description <- unname(Description)
to fix the problem.
Many thanks Best Luca
We optimized the code; it now takes less memory and runs faster in clusterProfiler 4.0.
I'm experiencing a problem running groupGO (doesn't finish) in R 3.5.0. The command runs fine in R 3.4.4, however the clusterProfiler version is different between the two. All packages were freshly installed and therefore should be up-to-date (see sessionInfo() below).
Commands
What happens
In R 3.5.0 I had to abort this command because it didn't finish even after 30 minutes. In R 3.4.4 it is correctly executed within 1 minute.
System
In both containers, I ran:
sessionInfo R 3.4.4
sessionInfo R 3.5.0
Extract of code (R 3.4.4)
Extract of code (R 3.5.0)