YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

compareCluster's enrichGO difference in results when stating ont = "ALL" vs NULL #675

Closed jestlin15 closed 3 months ago

jestlin15 commented 3 months ago

Hi, may I please ask why the results differ when ont = "ALL" is specified versus when ont is not specified at all in compareCluster?

`compareGO <- compareCluster(geneCluster = gene_lists_sorted_names, fun = "enrichGO", OrgDb = org.Hs.eg.db, pvalueCutoff = 0.01, qvalueCutoff = 0.05)`

`compareGO <- simplify(compareGO, cutoff = 0.7, by = "p.adjust", select_fun = min, measure = "Wang", semData = NULL)`

`dotplot1 <- dotplot(compareGO, label_format = 100, showCategory = 40, font.size = 8, title = "GO")`

vs

`compareGO_ALL <- compareCluster(geneCluster = gene_lists_sorted_names, fun = "enrichGO", ont = "ALL", OrgDb = org.Hs.eg.db, pvalueCutoff = 0.01, qvalueCutoff = 0.05)`

`compareGO_ALL <- simplify(compareGO_ALL, cutoff = 0.7, by = "p.adjust", select_fun = min, measure = "Wang", semData = NULL)`

`dotplot2 <- dotplot(compareGO_ALL, label_format = 100, showCategory = 40, font.size = 8, title = "GO_ALL")`

dotplot1: preview.pdf dotplot2: preview.pdf

sessionInfo()

R version 4.2.3 (2023-03-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] ggstar_1.0.4 SingleCellExperiment_1.20.1
[3] SummarizedExperiment_1.28.0 GenomicRanges_1.50.2
[5] GenomeInfoDb_1.34.9 MatrixGenerics_1.10.0
[7] matrixStats_1.2.0 org.Hs.eg.db_3.16.0
[9] AnnotationDbi_1.60.2 IRanges_2.32.0
[11] S4Vectors_0.36.2 Biobase_2.58.0
[13] BiocGenerics_0.44.0 clusterProfiler_4.6.2
[15] ggplot2_3.5.0 stringr_1.5.1
[17] SeuratWrappers_0.3.2 presto_1.0.0
[19] data.table_1.15.2 Rcpp_1.0.12
[21] readxl_1.4.3 patchwork_1.2.0
[23] Seurat_5.0.2 SeuratObject_5.0.1
[25] sp_2.1-3 dplyr_1.1.4

loaded via a namespace (and not attached):
[1] utf8_1.2.4 spatstat.explore_3.2-6 reticulate_1.35.0
[4] R.utils_2.12.3 tidyselect_1.2.1 RSQLite_2.3.5
[7] htmlwidgets_1.6.4 grid_4.2.3 BiocParallel_1.32.6
[10] Rtsne_0.17 scatterpie_0.2.1 munsell_0.5.0
[13] codetools_0.2-19 ica_1.0-3 future_1.33.1
[16] miniUI_0.1.1.1 withr_3.0.0 spatstat.random_3.2-3
[19] colorspace_2.1-0 GOSemSim_2.24.0 progressr_0.14.0
[22] knitr_1.45 rstudioapi_0.15.0 ROCR_1.0-11
[25] tensor_1.5 DOSE_3.24.2 listenv_0.9.1
[28] labeling_0.4.3 GenomeInfoDbData_1.2.9 polyclip_1.10-6
[31] bit64_4.0.5 farver_2.1.1 downloader_0.4
[34] treeio_1.22.0 parallelly_1.37.1 vctrs_0.6.5
[37] generics_0.1.3 gson_0.1.0 xfun_0.42
[40] R6_2.5.1 doParallel_1.0.17 graphlayouts_1.0.1
[43] clue_0.3-65 rsvd_1.0.5 DelayedArray_0.24.0
[46] gridGraphics_0.5-1 fgsea_1.24.0 bitops_1.0-7
[49] spatstat.utils_3.0-4 cachem_1.0.8 promises_1.2.1
[52] scales_1.3.0 ggraph_2.1.0 enrichplot_1.18.4
[55] gtable_0.3.4 globals_0.16.3 goftest_1.2-3
[58] spam_2.10-0 tidygraph_1.3.0 rlang_1.1.3
[61] GlobalOptions_0.1.2 splines_4.2.3 lazyeval_0.2.2
[64] spatstat.geom_3.2-9 BiocManager_1.30.22 yaml_2.3.8
[67] reshape2_1.4.4 abind_1.4-5 httpuv_1.6.14
[70] qvalue_2.30.0 tools_4.2.3 ggplotify_0.1.2
[73] ellipsis_0.3.2 jquerylib_0.1.4 RColorBrewer_1.1-3
[76] ggridges_0.5.6 plyr_1.8.9 zlibbioc_1.44.0
[79] purrr_1.0.2 RCurl_1.98-1.13 deldir_2.0-2
[82] viridis_0.6.5 pbapply_1.7-2 GetoptLong_1.0.5
[85] cowplot_1.1.3 zoo_1.8-12 ggrepel_0.9.5
[88] cluster_2.1.6 fs_1.6.3 magrittr_2.0.3
[91] RSpectra_0.16-1 scattermore_1.2 circlize_0.4.16
[94] lmtest_0.9-40 RANN_2.6.1 fitdistrplus_1.1-11
[97] mime_0.12 evaluate_0.23 xtable_1.8-4
[100] HDO.db_0.99.1 fastDummies_1.7.3 gridExtra_2.3
[103] shape_1.4.6.1 compiler_4.2.3 tibble_3.2.1
[106] shadowtext_0.1.3 KernSmooth_2.23-22 crayon_1.5.2
[109] R.oo_1.26.0 htmltools_0.5.7 ggfun_0.1.4
[112] later_1.3.2 aplot_0.2.2 tidyr_1.3.1
[115] DBI_1.2.2 tweenr_2.0.2 ComplexHeatmap_2.14.0
[118] MASS_7.3-60.0.1 Matrix_1.6-4 cli_3.6.2
[121] R.methodsS3_1.8.2 parallel_4.2.3 dotCall64_1.1-1
[124] igraph_1.5.1 pkgconfig_2.0.3 plotly_4.10.4
[127] spatstat.sparse_3.0-3 foreach_1.5.2 ggtree_3.6.2
[130] bslib_0.6.1 XVector_0.38.0 yulab.utils_0.1.4
[133] digest_0.6.35 sctransform_0.4.1 RcppAnnoy_0.0.22
[136] spatstat.data_3.0-4 Biostrings_2.66.0 fastmatch_1.1-4
[139] rmarkdown_2.26 cellranger_1.1.0 leiden_0.4.3.1
[142] tidytree_0.4.6 uwot_0.1.16 shiny_1.8.0
[145] rjson_0.2.21 lifecycle_1.0.4 nlme_3.1-164
[148] jsonlite_1.8.8 viridisLite_0.4.2 limma_3.54.2
[151] fansi_1.0.6 pillar_1.9.0 lattice_0.22-5
[154] KEGGREST_1.38.0 fastmap_1.1.1 httr_1.4.7
[157] survival_3.5-7 GO.db_3.16.0 glue_1.7.0
[160] remotes_2.4.2.1 png_0.1-8 iterators_1.0.14
[163] bit_4.0.5 sass_0.4.8 ggforce_0.4.1
[166] stringi_1.8.3 blob_1.2.4 RcppHNSW_0.6.0
[169] memoise_2.0.1 ape_5.7-1 irlba_2.3.5.1
[172] future.apply_1.11.1

guidohooiveld commented 3 months ago

If you check the help page of enrichGO (type ?enrichGO), you will see that the default value of the ontology argument is the Molecular Function sub-category (ont = "MF"). Thus, if you don't specify ont yourself, ont = "MF" is used.

By setting ont = "ALL", however, you include the terms of all 3 GO sub-categories (Biological Process [BP] and Cellular Component [CC] in addition to MF) in your analysis, and therefore your results differ.
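
To see this for yourself, a minimal sketch along the following lines may help. It assumes the gcSample example data shipped with clusterProfiler as a stand-in for your own gene_lists_sorted_names (only the first three clusters are used to keep it quick), and the object names cmp_default, cmp_MF and cmp_ALL are just illustrative. It checks the default of ont, runs compareCluster without ont and with ont = "MF" explicitly, and then tabulates which sub-ontologies show up when ont = "ALL" is used.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Inspect the default value of `ont` in enrichGO's signature
# (the help page lists it as "MF").
default_ont <- formals(enrichGO)$ont
print(default_ont)

# Assumption: gcSample (example Entrez ID lists bundled with clusterProfiler)
# stands in for the gene lists from the original post.
data(gcSample)
gene_lists <- gcSample[1:3]

# No `ont` given, so enrichGO falls back to its default.
cmp_default <- compareCluster(geneCluster = gene_lists, fun = "enrichGO",
                              OrgDb = org.Hs.eg.db,
                              pvalueCutoff = 0.01, qvalueCutoff = 0.05)

# The same call with the default stated explicitly.
cmp_MF <- compareCluster(geneCluster = gene_lists, fun = "enrichGO",
                         ont = "MF", OrgDb = org.Hs.eg.db,
                         pvalueCutoff = 0.01, qvalueCutoff = 0.05)

# All three sub-ontologies.
cmp_ALL <- compareCluster(geneCluster = gene_lists, fun = "enrichGO",
                          ont = "ALL", OrgDb = org.Hs.eg.db,
                          pvalueCutoff = 0.01, qvalueCutoff = 0.05)

# The first two calls should report the same set of GO terms.
setequal(as.data.frame(cmp_default)$ID, as.data.frame(cmp_MF)$ID)

# With ont = "ALL" the result table should carry an ONTOLOGY column,
# so the extra BP and CC terms can be counted directly.
table(as.data.frame(cmp_ALL)$ONTOLOGY)
```

If the first two results agree and the last table lists BP and CC terms alongside MF, the difference between your two dotplots comes from those additional sub-ontologies.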