YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

"I performed the gseGO function using clusterProfiler for all ontologies together, but I wasn't able to apply the simplify function to remove redundancy. I tried various options suggested online, but none worked. As a workaround, I ran gseGO for each ontology (BP, MF, CC) separately and then removed redundancy for each ontology using GOsim. Is it correct to combine the results from all ontologies into one file and then combine it with metadata from one ontology (e.g., the output of gseGO for BP) to create enrichment plots?" Or what would be the easiest way to perform that? #707

Open Sidragull57 opened 1 month ago

Sidragull57 commented 1 month ago

Prerequisites

Describe your issue

Ask in the right place

guidohooiveld commented 1 month ago

Next time please properly format your post!

Yet, I cannot reproduce your problem!

> library(clusterProfiler)
> library(org.Hs.eg.db)
> 
> data(geneList, package="DOSE")
> 
> res <- gseGO(geneList     = geneList,
+              OrgDb        = org.Hs.eg.db,
+              ont          = "ALL",
+              eps          = 0,
+              minGSSize    = 15,
+              maxGSSize    = 500,
+              pvalueCutoff = 0.05)
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> res
#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens 
#...@setType     GOALL 
#...@keytype     ENTREZID 
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...833 enriched terms found
'data.frame':   833 obs. of  12 variables:
 $ ONTOLOGY       : chr  "BP" "BP" "BP" "BP" ...
 $ ID             : chr  "GO:0098813" "GO:0007059" "GO:0051276" "GO:0000819" ...
 $ Description    : chr  "nuclear chromosome segregation" "chromosome segregation" "chromosome organization" "sister chromatid segregation" ...
 $ setSize        : int  238 319 473 185 152 327 317 362 224 138 ...
 $ enrichmentScore: num  0.633 0.585 0.52 0.661 0.686 ...
 $ NES            : num  2.88 2.72 2.54 2.93 2.96 ...
 $ pvalue         : num  3.41e-30 3.61e-30 1.75e-30 1.39e-27 1.88e-25 ...
 $ p.adjust       : num  7.11e-27 7.11e-27 7.11e-27 2.06e-24 2.22e-22 ...
 $ qvalue         : num  5.48e-27 5.48e-27 5.48e-27 1.58e-24 1.71e-22 ...
 $ rank           : num  449 1374 1374 449 532 ...
 $ leading_edge   : chr  "tags=23%, list=4%, signal=22%" "tags=27%, list=11%, signal=25%" "tags=24%, list=11%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/220134/51203/22974/10460/4751/983/4085/81930/8"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/55355/220134/51203/22974/10460/4751/79019/5583"| __truncated__ "8318/55143/991/9493/1062/4605/10403/7153/23397/9787/11065/55355/220134/51203/22974/10460/4751/55839/983/4085/98"| __truncated__ "55143/991/9493/1062/4605/10403/7153/23397/9787/11065/220134/51203/22974/10460/4751/983/4085/81930/81620/332/383"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> res.simplify <- simplify(res)
> 
> res.simplify
#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens 
#...@setType     GOALL 
#...@keytype     ENTREZID 
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...326 enriched terms found
'data.frame':   326 obs. of  12 variables:
 $ ONTOLOGY       : chr  "BP" "BP" "BP" "BP" ...
 $ ID             : chr  "GO:0098813" "GO:0007059" "GO:0051276" "GO:0000819" ...
 $ Description    : chr  "nuclear chromosome segregation" "chromosome segregation" "chromosome organization" "sister chromatid segregation" ...
 $ setSize        : int  238 319 473 185 327 491 423 104 197 129 ...
 $ enrichmentScore: num  0.633 0.585 0.52 0.661 0.541 ...
 $ NES            : num  2.88 2.72 2.54 2.93 2.54 ...
 $ pvalue         : num  3.41e-30 3.61e-30 1.75e-30 1.39e-27 1.11e-24 ...
 $ p.adjust       : num  7.11e-27 7.11e-27 7.11e-27 2.06e-24 1.09e-21 ...
 $ qvalue         : num  5.48e-27 5.48e-27 5.48e-27 1.58e-24 8.41e-22 ...
 $ rank           : num  449 1374 1374 449 1246 ...
 $ leading_edge   : chr  "tags=23%, list=4%, signal=22%" "tags=27%, list=11%, signal=25%" "tags=24%, list=11%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/220134/51203/22974/10460/4751/983/4085/81930/8"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/55355/220134/51203/22974/10460/4751/79019/5583"| __truncated__ "8318/55143/991/9493/1062/4605/10403/7153/23397/9787/11065/55355/220134/51203/22974/10460/4751/55839/983/4085/98"| __truncated__ "55143/991/9493/1062/4605/10403/7153/23397/9787/11065/220134/51203/22974/10460/4751/983/4085/81930/81620/332/383"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> packageVersion("clusterProfiler")
[1] ‘4.13.0’
> 
Sidragull57 commented 1 month ago

Sorry for the inconvenience. Actually, I wanted to remove redundancy from the GSEA-GO results and then make an enrichment plot, but I could not find a way to do that.

I followed your instructions and ran:

> res <- gseGO(geneList     = geneList,
+              OrgDb        = org.Mm.eg.db,
+              ont          = "ALL",
+              eps          = 0,
+              minGSSize    = 15,
+              maxGSSize    = 500,
+              pvalueCutoff = 0.05)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
There were 12 warnings (use warnings() to see them)
> res
#
# Gene Set Enrichment Analysis
#
#...@organism    Mus musculus
#...@setType     GOALL
#...@keytype     ENTREZID
#...@geneList    Named num [1:16133] 5.9 4.66 2.89 2.24 2.1 ...
 - attr(*, "names")= chr [1:16133] "26900" "26908" "20592" "20227" ...
#...nPerm
#...pvalues adjusted by 'BH' with cutoff <0.05
#...17 enriched terms found
'data.frame':   17 obs. of  12 variables:
 $ ONTOLOGY       : chr  "MF" "MF" "MF" "BP" ...
 $ ID             : chr  "GO:0019787" "GO:0004842" "GO:0061659" "GO:0043161" ...
 $ Description    : chr  "ubiquitin-like protein transferase activity" "ubiquitin-protein transferase activity" "ubiquitin-like protein ligase activity" "proteasome-mediated ubiquitin-dependent protein catabolic process" ...
 $ setSize        : int  410 387 307 410 294 483 127 202 116 171 ...
 $ enrichmentScore: num  -0.349 -0.346 -0.343 -0.317 -0.335 ...
 $ NES            : num  -1.89 -1.86 -1.8 -1.71 -1.75 ...
 $ pvalue         : num  1.31e-09 4.89e-09 4.59e-07 5.06e-07 2.13e-06 ...
 $ p.adjust       : num  8.37e-06 1.57e-05 8.11e-04 8.11e-04 2.73e-03 ...
 $ qvalue         : num  7.78e-06 1.45e-05 7.53e-04 7.53e-04 2.54e-03 ...
 $ rank           : num  3438 3438 3474 3723 3474 ...
 $ leading_edge   : chr  "tags=34%, list=21%, signal=28%" "tags=34%, list=21%, signal=27%" "tags=36%, list=22%, signal=29%" "tags=34%, list=23%, signal=27%" ...
 $ core_enrichment: chr  "67138/235315/19822/68795/56715/67338/208650/77853/547109/59003/209462/242521/217333/54484/80751/67455/74132/170"| __truncated__ "67138/235315/19822/68795/56715/67338/208650/77853/547109/59003/209462/242521/217333/54484/80751/67455/74132/280"| __truncated__ "78889/67138/19822/68795/56715/67338/208650/77853/547109/59003/209462/54484/80751/74132/28077/22215/53323/672511"| __truncated__ "19173/77891/11651/104318/448987/23805/232566/233040/14198/234684/93687/79043/71765/76375/12387/226144/19822/687"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141

> res.simplify <- simplify(res)
Error in as.list.default(X) : No method to convert this S4 class into a vector

Kindly tell me a way to remove redundancy from the GO results and then plot them as an enrichment plot.

guidohooiveld commented 1 month ago

Again, it is working for me....! Does the exact same code from me also work for you? (copy/paste it)

Note; for your specific dataset:

> library(clusterProfiler)
> library(org.Hs.eg.db)
> 
> data(geneList, package="DOSE")
> 
> res <- gseGO(geneList     = geneList,
+              OrgDb        = org.Hs.eg.db,
+              ont          = "ALL",
+              eps          = 0,
+              minGSSize    = 15,
+              maxGSSize    = 500,
+              pvalueCutoff = 0.05)
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> res
#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens 
#...@setType     GOALL 
#...@keytype     ENTREZID 
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...889 enriched terms found
'data.frame':   889 obs. of  12 variables:
 $ ONTOLOGY       : chr  "BP" "BP" "BP" "BP" ...
 $ ID             : chr  "GO:0007059" "GO:0051276" "GO:0098813" "GO:0000819" ...
 $ Description    : chr  "chromosome segregation" "chromosome organization" "nuclear chromosome segregation" "sister chromatid segregation" ...
 $ setSize        : int  319 473 238 185 152 327 317 138 362 224 ...
 $ enrichmentScore: num  0.585 0.52 0.633 0.661 0.686 ...
 $ NES            : num  2.75 2.52 2.88 2.92 2.94 ...
 $ pvalue         : num  1.43e-31 1.24e-30 2.46e-30 2.26e-27 6.34e-26 ...
 $ p.adjust       : num  8.47e-28 3.67e-27 4.85e-27 3.34e-24 7.50e-23 ...
 $ qvalue         : num  6.39e-28 2.77e-27 3.65e-27 2.52e-24 5.66e-23 ...
 $ rank           : num  1374 1374 449 449 532 ...
 $ leading_edge   : chr  "tags=27%, list=11%, signal=25%" "tags=24%, list=11%, signal=22%" "tags=23%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/55355/220134/51203/22974/10460/4751/79019/5583"| __truncated__ "8318/55143/991/9493/1062/4605/10403/7153/23397/9787/11065/55355/220134/51203/22974/10460/4751/55839/983/4085/98"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/220134/51203/22974/10460/4751/983/4085/81930/8"| __truncated__ "55143/991/9493/1062/4605/10403/7153/23397/9787/11065/220134/51203/22974/10460/4751/983/4085/81930/81620/332/383"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> res <- setReadable(res, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
> 
> res.simplify <- simplify(res)
> 
> res.simplify
#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens 
#...@setType     GOALL 
#...@keytype     ENTREZID 
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...345 enriched terms found
'data.frame':   345 obs. of  12 variables:
 $ ONTOLOGY       : chr  "BP" "BP" "BP" "BP" ...
 $ ID             : chr  "GO:0007059" "GO:0051276" "GO:0098813" "GO:0000819" ...
 $ Description    : chr  "chromosome segregation" "chromosome organization" "nuclear chromosome segregation" "sister chromatid segregation" ...
 $ setSize        : int  319 473 238 185 327 491 423 197 104 129 ...
 $ enrichmentScore: num  0.585 0.52 0.633 0.661 0.541 ...
 $ NES            : num  2.75 2.52 2.88 2.92 2.54 ...
 $ pvalue         : num  1.43e-31 1.24e-30 2.46e-30 2.26e-27 1.20e-24 ...
 $ p.adjust       : num  8.47e-28 3.67e-27 4.85e-27 3.34e-24 1.18e-21 ...
 $ qvalue         : num  6.39e-28 2.77e-27 3.65e-27 2.52e-24 8.89e-22 ...
 $ rank           : num  1374 1374 449 449 1246 ...
 $ leading_edge   : chr  "tags=27%, list=11%, signal=25%" "tags=24%, list=11%, signal=22%" "tags=23%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPM"| __truncated__ "CDC45/CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPN/CDK1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/CDT"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> dotplot(res.simplify)
> 
> cnetplot(res.simplify)
Warning message:
ggrepel: 13 unlabeled data points (too many overlaps). Consider increasing max.overlaps 
> 
> 

dotplot: image

cnetplot: image

Sidragull57 commented 1 month ago

Thank you very much.

Now it worked for me.

But when I applied it to the compareCluster gseGO output, I got an error message.

> ck_TP_VR_GO # output of compareCluster gseGO
#
# Result of Comparing 4 gene clusters
#
#.. @fun         gseGO
#.. @geneClusters       List of 4
 $ d1 : Named num [1:16225] 1.12 1.08 1.07 1 1 ...
  ..- attr(*, "names")= chr [1:16225] "70325" "14450" "14693" "22351" ...
 $ h12: Named num [1:16139] 1.93 1.93 1.67 1.38 1.32 ...
  ..- attr(*, "names")= chr [1:16139] "14605" "74747" "14229" "73825" ...
 $ d5 : Named num [1:16082] 1.251 1.088 1.052 0.991 0.987 ...
  ..- attr(*, "names")= chr [1:16082] "74477" "14632" "207259" "12724" ...
 $ d10: Named num [1:16127] 1.196 1.025 1.018 0.999 0.996 ...
  ..- attr(*, "names")= chr [1:16127] "229004" "74007" "68527" "228846" ...
#...Result      'data.frame':   287 obs. of  13 variables:
 $ Cluster        : Factor w/ 4 levels "d1","h12","d5",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ONTOLOGY       : chr  "CC" "BP" "BP" "CC" ...
 $ ID             : chr  "GO:0022626" "GO:0002181" "GO:0006397" "GO:0005681" ...
 $ Description    : chr  "cytosolic ribosome" "cytoplasmic translation" "mRNA processing" "spliceosomal complex" ...
 $ setSize        : int  101 140 449 189 230 405 321 16 218 248 ...
 $ enrichmentScore: num  -0.474 -0.437 -0.296 -0.375 -0.349 ...
 $ NES            : num  -2.21 -2.16 -1.69 -1.93 -1.84 ...
 $ pvalue         : num  3.15e-08 1.70e-08 1.05e-07 2.38e-07 4.63e-07 ...
 $ p.adjust       : num  0.000128 0.000128 0.000286 0.000486 0.000755 ...
 $ qvalue         : num  0.000125 0.000125 0.000279 0.000474 0.000736 ...
 $ rank           : num  5455 3967 4093 2916 2789 ...
 $ leading_edge   : chr  "tags=66%, list=34%, signal=44%" "tags=44%, list=24%, signal=34%" "tags=35%, list=25%, signal=27%" "tags=42%, list=18%, signal=35%" ...
 $ core_enrichment: chr  "20005/19921/20084/19989/78294/20088/54217/319195/19941/19951/11837/76808/267019/22187/100043787/19946/27176/619"| __truncated__ "20055/56040/67427/98221/20068/13629/27370/116905/100503670/67115/20054/208922/433702/19981/16898/27207/75617/67"| __truncated__ "192170/70465/65105/18747/19655/68955/54614/78688/230257/16549/330216/67040/107686/231769/20624/83701/66899/6649"| __truncated__ "67178/66354/19134/192170/19655/54614/107686/20624/66492/74200/76479/68011/24010/19704/230596/20227/68592/192160"| __truncated__ ...
#.. number of enriched terms found for each gene cluster:
#..   d1: 41
#..   h12: 0
#..   d5: 119
#..   d10: 127
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141

> res_cC <- setReadable(ck_TP_VR_GO, OrgDb = org.Mm.eg.db, keyType="ENTREZID")
> res.simplify_CC <- simplify(res_cC)
Error in match.arg(ont, c("BP", "CC", "MF")) : 'arg' should be one of “BP”, “CC”, “MF”
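
The wording of this error is that of base R's match.arg(), which stops when the supplied value is not one of the allowed choices. A minimal base-R sketch of that behaviour follows. The wrapper name check_ont is hypothetical and only for illustration; where clusterProfiler or its dependencies make this call is not shown in this thread.

```r
# Hypothetical wrapper, for illustration only: validates an ontology label
# against the three choices named in the error message above.
check_ont <- function(ont) {
  match.arg(ont, c("BP", "CC", "MF"))
}

# A single ontology label passes the check and is returned unchanged.
print(check_ont("BP"))

# "ALL" is not among the choices, so match.arg() signals an error;
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(check_ont("ALL"), error = function(e) conditionMessage(e))
print(msg)
```

If this is indeed the failing check, it would be consistent with a result created with ont = "ALL" reaching code that only expects one of the three individual ontologies.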

guidohooiveld commented 1 month ago

Once more, that is working for me as well!

> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
> 
> ## load sample data
> data(geneList, package="DOSE")
> head(geneList)
    4312     8318    10874    55143    55388      991 
4.572613 4.514594 4.418218 4.144075 3.876258 3.677857 
>  
> ## using sample data, create list with 3 comparisons to be used as input for comparCluster
> ## note that 'List3' is the reverse of 'List1' and 'List2'.
> inputList <- list(List1 = geneList, List2 = geneList, List3 = sort(-1*geneList, decreasing = TRUE) )
> 
> ## run gseGO on all input genelists
> res.combined <- compareCluster(geneClusters=inputList,
+                               fun = "gseGO",
+                               OrgDb = org.Hs.eg.db,
+                               keyType = "ENTREZID",
+                               ont = "ALL",
+                               eps = 0,
+                               pvalueCutoff = 0.05,
+                               pAdjustMethod = "BH",
+                               minGSSize = 15,
+                               maxGSSize = 500)
> 
> ## convert entrezids into symbols
> res.combined <- setReadable(res.combined, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
> 
> ## simplify
> res.combined.simplify <- simplify(res.combined)
> res.combined.simplify
#
# Result of Comparing 3 gene clusters 
#
#.. @fun         gseGO 
#.. @geneClusters       List of 3
 $ List1: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ List2: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ List3: Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
  ..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
#...Result      'data.frame':   1004 obs. of  13 variables:
 $ Cluster        : Factor w/ 3 levels "List1","List2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ONTOLOGY       : chr  "BP" "BP" "BP" "BP" ...
 $ ID             : chr  "GO:0098813" "GO:0007059" "GO:0051276" "GO:0000819" ...
 $ Description    : chr  "nuclear chromosome segregation" "chromosome segregation" "chromosome organization" "sister chromatid segregation" ...
 $ setSize        : int  238 319 473 185 327 491 423 104 197 129 ...
 $ enrichmentScore: num  0.633 0.585 0.52 0.661 0.541 ...
 $ NES            : num  2.91 2.78 2.59 2.95 2.56 ...
 $ pvalue         : num  9.67e-31 7.56e-31 4.48e-31 8.94e-27 8.44e-25 ...
 $ p.adjust       : num  1.91e-27 1.91e-27 1.91e-27 1.32e-23 8.32e-22 ...
 $ qvalue         : num  1.46e-27 1.46e-27 1.46e-27 1.01e-23 6.37e-22 ...
 $ rank           : num  449 1374 1374 449 1246 ...
 $ leading_edge   : chr  "tags=23%, list=4%, signal=22%" "tags=27%, list=11%, signal=25%" "tags=24%, list=11%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPM"| __truncated__ "CDC45/CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPN/CDK1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/CDT"| __truncated__ ...
#.. number of enriched terms found for each gene cluster:
#..   List1: 342 
#..   List2: 336 
#..   List3: 326 
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, 
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. 
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. 
The Innovation. 2021, 2(3):100141 

> 
> 
> ## dotplot
> dotplot(res.combined.simplify, font.size=8, showCategory=8, title =("GSEA results"), split=".sign") + facet_grid(.~.sign)
> 
> ## cnetplot
> cnetplot(res.combined.simplify)
>

image

image

Sidragull57 commented 1 month ago

I am using R version 4.3.3. Initially, I unloaded the clusterProfiler package and then reinstalled it. This approach made the simplify function work for me.

# Unload the clusterProfiler package if it is loaded
if ("package:clusterProfiler" %in% search()) {
  detach("package:clusterProfiler", unload = TRUE)
}

# Update all outdated packages
update.packages(ask = FALSE)

# Install BiocManager if not already installed
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Reinstall the latest version of clusterProfiler
BiocManager::install("clusterProfiler", ask = FALSE)

I got this code from ChatGPT.

However, each time before running simplify, I have to unload and reinstall clusterProfiler to get it to work again. I am not sure what is causing this issue.

I re-ran everything:

> library(clusterProfiler)
> library(org.Mm.eg.db)
> library(org.Hs.eg.db)
> data(geneList, package="DOSE")
>
> set.seed(1)
>
> res <- gseGO(gene = geneList_1d_fgsea_VR,
+              OrgDb = org.Mm.eg.db,
+              ont = "All", # All
+              pAdjustMethod = "fdr",
+              pvalueCutoff = 0.05,
+              exponent = 1,
+              minGSSize = 10,
+              maxGSSize = 500,
+              eps = 0,
+              verbose = TRUE,
+              seed = TRUE)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
There were 12 warnings (use warnings() to see them)
> res <- setReadable(res, OrgDb = org.Mm.eg.db, keyType="ENTREZID")
> res.simplify <- simplify(res)
> res.simplify
#
# Gene Set Enrichment Analysis
#
#...@organism    Mus musculus
#...@setType     GOALL
#...@keytype     ENTREZID
#...@geneList    Named num [1:16225] 1.12 1.08 1.07 1 1 ...
 - attr(*, "names")= chr [1:16225] "70325" "14450" "14693" "22351" ...
#...nPerm
#...pvalues adjusted by 'fdr' with cutoff <0.05
#...31 enriched terms found
'data.frame':   31 obs. of  12 variables:
 $ ONTOLOGY       : chr  "CC" "CC" "CC" "CC" ...
 $ ID             : chr  "GO:0022626" "GO:0005681" "GO:0036464" "GO:0016607" ...
 $ Description    : chr  "cytosolic ribosome" "spliceosomal complex" "cytoplasmic ribonucleoprotein granule" "nuclear speck" ...
 $ setSize        : int  101 189 230 338 362 124 25 95 225 140 ...
 $ enrichmentScore: num  -0.474 -0.375 -0.349 -0.299 0.327 ...
 $ NES            : num  -2.21 -1.93 -1.84 -1.66 1.68 ...
 $ pvalue         : num  3.15e-08 2.38e-07 4.63e-07 2.22e-06 5.47e-06 ...
 $ p.adjust       : num  0.000128 0.000486 0.000755 0.00212 0.002627 ...
 $ qvalue         : num  0.000125 0.000474 0.000736 0.002068 0.002563 ...
 $ rank           : num  5455 2916 2789 3252 3696 ...
 $ leading_edge   : chr  "tags=66%, list=34%, signal=44%" "tags=42%, list=18%, signal=35%" "tags=28%, list=17%, signal=24%" "tags=29%, list=20%, signal=24%" ...
 $ core_enrichment: chr  "Rpl9/Rpl19/Rps18/Rpl7/Rps27a/Rps24/Rpl36/Rpl17/Rpl26/Rpl32/Rplp0/Rpl18a/Rps15a/Ubb/Rpl36a-ps1/Rpl30/Rpl7a/Zcchc"| __truncated__ "Zmat5/Snw1/Prpf4b/Eif4a3/Rbmx/Prpf40b/Snrpd2/Eftud2/Zmat2/Khdc4/Smndc1/Snrpg/Ik/Upf1/Prpf38a/Sart1/Syf2/Casc3/A"| __truncated__ "Larp4/Hnrnpu/Rps4x/Xrn1/Rpl28/Ago2/Snrpb2/Rbm20/Mapt/Eif4e/Tnrc6c/Cnot7/Lsm3/Polr2d/Larp1/Grb7/Sqstm1/Dhx36/Ddx"| __truncated__ "Cxxc1/Tcim/Topors/Syf2/Phf7/Aagab/Hipk1/Api5/Prcc/Sgk1/Prpf19/Trip12/Hdac4/Dnaaf1/Srsf3/Kat6a/Cdk12/Hnrnpu/Srpk"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141

But it did not work for my compareCluster data, and I got:

Error in match.arg(ont, c("BP", "CC", "MF")) : 'arg' should be one of “BP”, “CC”, “MF”

I have now repeated the whole analysis using the data you mentioned:

> data(geneList, package="DOSE")
>
> res_Dose <- gseGO(gene = geneList,
+                   OrgDb = org.Hs.eg.db,
+                   ont = "All", # All
+                   pAdjustMethod = "fdr",
+                   pvalueCutoff = 0.05,
+                   exponent = 1,
+                   minGSSize = 10,
+                   maxGSSize = 500,
+                   eps = 0,
+                   verbose = TRUE,
+                   seed = TRUE)
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
Warning messages:
1: In serialize(data, node$con) : 'package:stats' may not be available when loading
2: In serialize(data, node$con) : 'package:stats' may not be available when loading
3: In serialize(data, node$con) : 'package:stats' may not be available when loading
4: In serialize(data, node$con) : 'package:stats' may not be available when loading
5: In serialize(data, node$con) : 'package:stats' may not be available when loading
6: In serialize(data, node$con) : 'package:stats' may not be available when loading
7: In serialize(data, node$con) : 'package:stats' may not be available when loading
8: In serialize(data, node$con) : 'package:stats' may not be available when loading
9: In serialize(data, node$con) : 'package:stats' may not be available when loading
10: In serialize(data, node$con) : 'package:stats' may not be available when loading

> res_Dose
#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens
#...@setType     GOALL
#...@keytype     ENTREZID
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm
#...pvalues adjusted by 'fdr' with cutoff <0.05
#...868 enriched terms found
'data.frame':   868 obs. of  12 variables:
 $ ONTOLOGY       : chr  "BP" "BP" "BP" "BP" ...
 $ ID             : chr  "GO:0051276" "GO:0007059" "GO:0098813" "GO:0000819" ...
 $ Description    : chr  "chromosome organization" "chromosome segregation" "nuclear chromosome segregation" "sister chromatid segregation" ...
 $ setSize        : int  471 319 249 198 332 308 160 139 367 486 ...
 $ enrichmentScore: num  0.523 0.572 0.609 0.633 0.534 ...
 $ NES            : num  2.57 2.7 2.79 2.84 2.53 ...
 $ pvalue         : num  2.20e-31 1.24e-28 3.49e-27 1.31e-24 1.09e-23 ...
 $ p.adjust       : num  1.68e-27 4.73e-25 8.88e-24 2.50e-21 1.66e-20 ...
 $ qvalue         : num  1.34e-27 3.78e-25 7.09e-24 2.00e-21 1.33e-20 ...
 $ rank           : num  1374 1374 449 449 1246 ...
 $ leading_edge   : chr  "tags=24%, list=11%, signal=22%" "tags=26%, list=11%, signal=24%" "tags=21%, list=4%, signal=21%" "tags=23%, list=4%, signal=22%" ...
 $ core_enrichment: chr  "CDC45/CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/HJURP/NUSAP1/TPX2/TACC3/NEK2/CENPN/CDK1/MAD2"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPN"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/NUSAP1/TPX2/TACC3/NEK2/MAD2L1/KIF18A/CD"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/NUSAP1/TPX2/TACC3/NEK2/MAD2L1/KIF18A/CDT1/BIRC5/KI"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141

> res_Dose <- setReadable(res_Dose, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
> res.Dose.simplify <- simplify(res_Dose)
Error in match.arg(ont, c("BP", "CC", "MF")) : 'arg' should be one of “BP”, “CC”, “MF”

I got the same error when I performed compareCluster with that data:

> res.combined <- compareCluster(geneClusters=inputList,
+                               fun = "gseGO",
+                               OrgDb = org.Hs.eg.db,
+                               keyType = "ENTREZID",
+                               ont = "ALL",
+                               eps = 0,
+                               pvalueCutoff = 0.05,
+                               pAdjustMethod = "BH",
+                               minGSSize = 15,
+                               maxGSSize = 500)
There were 30 warnings (use warnings() to see them)
> res.combined <- setReadable(res.combined, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
> res.combined.simplify <- simplify(res.combined)
Error in match.arg(ont, c("BP", "CC", "MF")) : 'arg' should be one of “BP”, “CC”, “MF”

I am extremely sorry, but I do not know what is wrong.

guidohooiveld commented 1 month ago

Some thoughts:

Since you get the error when analyzing the included human dataset and I do not, this points to an issue with your R/Bioconductor installation.

Also, are you running it on a laptop/PC with RStudio, or on a computer cluster? I am asking because of the warning regarding the stats package, which points to RStudio; see https://forum.posit.co/t/error-when-running-parallelized-process-warning-in-serialize-package-stats-may-not-be-available-when-loading/110573

Therefore:

Also:

Sidragull57 commented 1 month ago

I updated R from version 4.3 to 4.4.1 and updated all the packages, but the problem persists for ontology = "ALL". If I change to only one specific ontology like "BP", it works. Below, you can see the details.

> BiocManager::valid()
'getOption("repos")' replaces Bioconductor standard repositories, see
'help("repositories", package = "BiocManager")' for details.
Replacement repositories:
    CRAN: https://cran.rstudio.com/

R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8
[4] LC_NUMERIC=C LC_TIME=German_Germany.utf8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] org.Hs.eg.db_3.19.1 msigdbr_7.5.1.9001 org.Mm.eg.db_3.19.1 AnnotationDbi_1.66.0
[5] IRanges_2.38.1 S4Vectors_0.42.1 Biobase_2.64.0 BiocGenerics_0.50.0
[9] clusterProfiler_4.12.0 dplyr_1.1.4 igraph_2.0.3

loaded via a namespace (and not attached):
[1] DBI_1.2.3 shadowtext_0.1.4 gson_0.1.0 gridExtra_2.3
[5] remotes_2.5.0 rlang_1.1.4 magrittr_2.0.3 DOSE_3.30.1
[9] compiler_4.4.1 RSQLite_2.3.7 png_0.1-8 vctrs_0.6.5
[13] reshape2_1.4.4 stringr_1.5.1 pkgconfig_2.0.3 crayon_1.5.3
[17] fastmap_1.2.0 XVector_0.44.0 ggraph_2.2.1 utf8_1.2.4
[21] HDO.db_0.99.1 enrichplot_1.24.0 UCSC.utils_1.0.0 purrr_1.0.2
[25] bit_4.0.5 zlibbioc_1.50.0 cachem_1.1.0 aplot_0.2.3
[29] GenomeInfoDb_1.40.1 jsonlite_1.8.8 blob_1.2.4 BiocParallel_1.38.0
[33] tweenr_2.0.3 parallel_4.4.1 R6_2.5.1 stringi_1.8.4
[37] RColorBrewer_1.1-3 GOSemSim_2.30.0 Rcpp_1.0.12 snow_0.4-4
[41] Matrix_1.7-0 splines_4.4.1 tidyselect_1.2.1 qvalue_2.36.0
[45] viridis_0.6.5 codetools_0.2-20 curl_5.2.1 lattice_0.22-6
[49] tibble_3.2.1 plyr_1.8.9 treeio_1.28.0 withr_3.0.0
[53] KEGGREST_1.44.1 gridGraphics_0.5-1 scatterpie_0.2.3 polyclip_1.10-6
[57] Biostrings_2.72.1 BiocManager_1.30.23 pillar_1.9.0 ggtree_3.12.0
[61] ggfun_0.1.5 generics_0.1.3 ggplot2_3.5.1 munsell_0.5.1
[65] scales_1.3.0 tidytree_0.4.6 glue_1.7.0 lazyeval_0.2.2
[69] tools_4.4.1 ggnewscale_0.4.10 data.table_1.15.4 fgsea_1.30.0
[73] babelgene_22.9 fs_1.6.4 graphlayouts_1.1.1 fastmatch_1.1-4
[77] tidygraph_1.3.1 cowplot_1.1.3 grid_4.4.1 tidyr_1.3.1
[81] ape_5.8 colorspace_2.1-0 nlme_3.1-165 GenomeInfoDbData_1.2.12
[85] patchwork_1.2.0 ggforce_0.4.2 cli_3.6.3 fansi_1.0.6
[89] viridisLite_0.4.2 gtable_0.3.5 yulab.utils_0.1.4 digest_0.6.36
[93] ggrepel_0.9.5 ggplotify_0.1.2 farver_2.1.2 memoise_2.0.1
[97] lifecycle_1.0.4 httr_1.4.7 GO.db_3.19.1 bit64_4.0.5
[101] MASS_7.3-61

Bioconductor version '3.19'

create a valid installation with

BiocManager::install(c( "msigdbr", "Rcpp" ), update = TRUE, ask = FALSE, force = TRUE)

more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date

Warning message:
1 packages out-of-date; 1 packages too new
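
The session information above lists clusterProfiler_4.12.0, whereas the working example earlier in this thread reports version 4.13.0. Whether that difference explains the error is not established here. The following is only a sketch for comparing installed versions between two machines; the helper name installed_versions is hypothetical, and the choice of packages to list is an assumption.

```r
# Hypothetical helper: report the installed version of each package,
# or NA when the package is not installed. system.file() is used so that
# no package namespace has to be loaded just to look up a version.
installed_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Packages that appear in the session information above.
print(installed_versions(c("clusterProfiler", "DOSE", "GOSemSim", "enrichplot")))
```

Running the same call on both machines gives two named vectors that can be compared entry by entry.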

> library(clusterProfiler)
> library(org.Hs.eg.db)
> data(geneList, package="DOSE")
> res <- gseGO(geneList = geneList,
+              OrgDb = org.Hs.eg.db,
+              ont = "ALL",
+              eps = 0,
+              minGSSize = 15,
+              maxGSSize = 500,
+              pvalueCutoff = 0.05)
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
Warning messages:
1: In serialize(data, node$con) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con) :
  'package:stats' may not be available when loading
3: In serialize(data, node$con) :
  'package:stats' may not be available when loading
4: In serialize(data, node$con) :
  'package:stats' may not be available when loading
5: In serialize(data, node$con) :
  'package:stats' may not be available when loading
6: In serialize(data, node$con) :
  'package:stats' may not be available when loading
7: In serialize(data, node$con) :
  'package:stats' may not be available when loading
8: In serialize(data, node$con) :
  'package:stats' may not be available when loading
9: In serialize(data, node$con) :
  'package:stats' may not be available when loading
10: In serialize(data, node$con) :
  'package:stats' may not be available when loading
> res <- setReadable(res, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
> res.simplify <- simplify(res)
Error in match.arg(ont, c("BP", "CC", "MF")) :
  'arg' should be one of '“BP”, “CC”, “MF”'
> res_BP <- gseGO(geneList = geneList,
+                 OrgDb = org.Hs.eg.db,
+                 ont = "BP",
+                 eps = 0,
+                 minGSSize = 15,
+                 maxGSSize = 500,
+                 pvalueCutoff = 0.05)
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
Warning messages:
1: In serialize(data, node$con) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con) :
  'package:stats' may not be available when loading
3: In serialize(data, node$con) :
  'package:stats' may not be available when loading
4: In serialize(data, node$con) :
  'package:stats' may not be available when loading
5: In serialize(data, node$con) :
  'package:stats' may not be available when loading
6: In serialize(data, node$con) :
  'package:stats' may not be available when loading
7: In serialize(data, node$con) :
  'package:stats' may not be available when loading
8: In serialize(data, node$con) :
  'package:stats' may not be available when loading
9: In serialize(data, node$con) :
  'package:stats' may not be available when loading
10: In serialize(data, node$con) :
  'package:stats' may not be available when loading
> res_BP <- setReadable(res_BP, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
> res_BP.simplify <- simplify(res_BP)
> res_BP
#

# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens
#...@setType     BP
#...@keytype     ENTREZID
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm
#...pvalues adjusted by 'BH' with cutoff <0.05
#...681 enriched terms found
'data.frame':   681 obs. of  11 variables:
 $ ID             : chr  "GO:0007059" "GO:0051276" "GO:0098813" "GO:0000819" ...
 $ Description    : chr  "chromosome segregation" "chromosome organization" "nuclear chromosome segregation" "sister chromatid segregation" ...
 $ setSize        : int  319 473 238 185 152 327 224 362 491 423 ...
 $ enrichmentScore: num  0.585 0.52 0.633 0.661 0.686 ...
 $ NES            : num  2.76 2.58 2.91 2.94 2.97 ...
 $ pvalue         : num  4.55e-31 4.52e-31 8.49e-30 2.12e-27 1.59e-25 ...
 $ p.adjust       : num  1.04e-27 1.04e-27 1.30e-26 2.43e-24 1.46e-22 ...
 $ qvalue         : num  7.90e-28 7.90e-28 9.83e-27 1.84e-24 1.10e-22 ...
 $ rank           : num  1374 1374 449 449 532 ...
 $ leading_edge   : chr  "tags=27%, list=11%, signal=25%" "tags=24%, list=11%, signal=22%" "tags=23%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPM"| __truncated__ "CDC45/CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPN/CDK1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/CDT"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141

> res_BP.simplify
#
# Gene Set Enrichment Analysis
#
#...@organism    Homo sapiens
#...@setType     BP
#...@keytype     ENTREZID
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm
#...pvalues adjusted by 'BH' with cutoff <0.05
#...249 enriched terms found
'data.frame':   249 obs. of  11 variables:
 $ ID             : chr  "GO:0007059" "GO:0051276" "GO:0098813" "GO:0000819" ...
 $ Description    : chr  "chromosome segregation" "chromosome organization" "nuclear chromosome segregation" "sister chromatid segregation" ...
 $ setSize        : int  319 473 238 185 327 491 423 197 104 129 ...
 $ enrichmentScore: num  0.585 0.52 0.633 0.661 0.541 ...
 $ NES            : num  2.76 2.58 2.91 2.94 2.56 ...
 $ pvalue         : num  4.55e-31 4.52e-31 8.49e-30 2.12e-27 2.75e-24 ...
 $ p.adjust       : num  1.04e-27 1.04e-27 1.30e-26 2.43e-24 2.10e-21 ...
 $ qvalue         : num  7.90e-28 7.90e-28 9.83e-27 1.84e-24 1.59e-21 ...
 $ rank           : num  1374 1374 449 449 1246 ...
 $ leading_edge   : chr  "tags=27%, list=11%, signal=25%" "tags=24%, list=11%, signal=22%" "tags=23%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPM"| __truncated__ "CDC45/CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPN/CDK1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/CDT"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141

> inputList <- list(List1 = geneList, List2 = geneList, List3 = sort(-1*geneList, decreasing = TRUE) )
> res.combined <- compareCluster(geneClusters=inputList,
+                                fun = "gseGO",
+                                OrgDb = org.Hs.eg.db,
+                                keyType = "ENTREZID",
+                                ont = "ALL",
+                                eps = 0,
+                                pvalueCutoff = 0.05,
+                                pAdjustMethod = "BH",
+                                minGSSize = 15,
+                                maxGSSize = 500)
There were 30 warnings (use warnings() to see them)
> res.combined
#
# Result of Comparing 3 gene clusters
#
#.. @fun         gseGO
#.. @geneClusters        List of 3
 $ List1: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ List2: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ List3: Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
  ..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
#...Result       'data.frame':   2657 obs. of  13 variables:
 $ Cluster        : Factor w/ 3 levels "List1","List2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ONTOLOGY       : chr  "BP" "BP" "BP" "BP" ...
 $ ID             : chr  "GO:0051276" "GO:0007059" "GO:0098813" "GO:0000819" ...
 $ Description    : chr  "chromosome organization" "chromosome segregation" "nuclear chromosome segregation" "sister chromatid segregation" ...
 $ setSize        : int  473 319 238 185 152 327 317 138 224 362 ...
 $ enrichmentScore: num  0.52 0.585 0.633 0.661 0.686 ...
 $ NES            : num  2.56 2.76 2.9 2.96 2.93 ...
 $ pvalue         : num  3.42e-31 1.05e-30 5.32e-30 5.45e-27 7.08e-26 ...
 $ p.adjust       : num  2.02e-27 3.09e-27 1.05e-26 8.06e-24 8.37e-23 ...
 $ qvalue         : num  1.55e-27 2.37e-27 8.03e-27 6.17e-24 6.40e-23 ...
 $ rank           : num  1374 1374 449 449 532 ...
 $ leading_edge   : chr  "tags=24%, list=11%, signal=22%" "tags=27%, list=11%, signal=25%" "tags=23%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "8318/55143/991/9493/1062/4605/10403/7153/23397/9787/11065/55355/220134/51203/22974/10460/4751/55839/983/4085/98"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/55355/220134/51203/22974/10460/4751/79019/5583"| __truncated__ "55143/991/9493/1062/4605/9133/10403/7153/23397/259266/9787/11065/220134/51203/22974/10460/4751/983/4085/81930/8"| __truncated__ "55143/991/9493/1062/4605/10403/7153/23397/9787/11065/220134/51203/22974/10460/4751/983/4085/81930/81620/332/383"| __truncated__ ...
#.. number of enriched terms found for each gene cluster:
#..   List1: 869
#..   List2: 846
#..   List3: 942
#
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141

> res.combined <- setReadable(res.combined, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
> res.combined.simplify <- simplify(res.combined)
Error in match.arg(ont, c("BP", "CC", "MF")) :
  'arg' should be one of '“BP”, “CC”, “MF”'
> res_BP.combined <- compareCluster(geneClusters=inputList,
+                                   fun = "gseGO",
+                                   OrgDb = org.Hs.eg.db,
+                                   keyType = "ENTREZID",
+                                   ont = "BP",
+                                   eps = 0,
+                                   pvalueCutoff = 0.05,
+                                   pAdjustMethod = "BH",
+                                   minGSSize = 15,
+                                   maxGSSize = 500)
There were 30 warnings (use warnings() to see them)
> res_BP.combined <- setReadable(res_BP.combined, OrgDb = org.Hs.eg.db, keyType="ENTREZID")
> res_BP.combined
#
# Result of Comparing 3 gene clusters
#
#.. @fun         gseGO
#.. @geneClusters        List of 3
 $ List1: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ List2: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ List3: Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
  ..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
#...Result       'data.frame':   2275 obs. of  12 variables:
 $ Cluster        : Factor w/ 3 levels "List1","List2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID             : chr  "GO:0007059" "GO:0051276" "GO:0098813" "GO:0000819" ...
 $ Description    : chr  "chromosome segregation" "chromosome organization" "nuclear chromosome segregation" "sister chromatid segregation" ...
 $ setSize        : int  319 473 238 185 152 327 224 362 491 423 ...
 $ enrichmentScore: num  0.585 0.52 0.633 0.661 0.686 ...
 $ NES            : num  2.75 2.53 2.91 2.97 3.01 ...
 $ pvalue         : num  4.55e-31 7.56e-31 1.38e-30 1.60e-27 6.91e-26 ...
 $ p.adjust       : num  1.73e-27 1.73e-27 2.11e-27 1.83e-24 6.33e-23 ...
 $ qvalue         : num  1.29e-27 1.29e-27 1.57e-27 1.37e-24 4.73e-23 ...
 $ rank           : num  1374 1374 449 449 532 ...
 $ leading_edge   : chr  "tags=27%, list=11%, signal=25%" "tags=24%, list=11%, signal=22%" "tags=23%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPM"| __truncated__ "CDC45/CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPN/CDK1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/CDT"| __truncated__ ...
#.. number of enriched terms found for each gene cluster:
#..   List1: 790
#..   List2: 760
#..   List3: 725
#
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141

> res_BP.combined.simplify <- simplify(res_BP.combined)
> res_BP.combined.simplify
#
# Result of Comparing 3 gene clusters
#
#.. @fun         gseGO
#.. @geneClusters        List of 3
 $ List1: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ List2: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ List3: Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
  ..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
#...Result       'data.frame':   820 obs. of  12 variables:
 $ Cluster        : Factor w/ 3 levels "List1","List2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID             : chr  "GO:0007059" "GO:0051276" "GO:0098813" "GO:0000819" ...
 $ Description    : chr  "chromosome segregation" "chromosome organization" "nuclear chromosome segregation" "sister chromatid segregation" ...
 $ setSize        : int  319 473 238 185 327 491 423 197 104 129 ...
 $ enrichmentScore: num  0.585 0.52 0.633 0.661 0.541 ...
 $ NES            : num  2.75 2.53 2.91 2.97 2.55 ...
 $ pvalue         : num  4.55e-31 7.56e-31 1.38e-30 1.60e-27 1.53e-24 ...
 $ p.adjust       : num  1.73e-27 1.73e-27 2.11e-27 1.83e-24 1.16e-21 ...
 $ qvalue         : num  1.29e-27 1.29e-27 1.57e-27 1.37e-24 8.70e-22 ...
 $ rank           : num  1374 1374 449 449 1246 ...
 $ leading_edge   : chr  "tags=27%, list=11%, signal=25%" "tags=24%, list=11%, signal=22%" "tags=23%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPM"| __truncated__ "CDC45/CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPN/CDK1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/CDT"| __truncated__ ...
#.. number of enriched terms found for each gene cluster:
#..   List1: 282
#..   List2: 274
#..   List3: 264
#
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141

> res.combined.simplify <- clusterProfiler::simplify(res.combined)
Error in match.arg(ont, c("BP", "CC", "MF")) :
  'arg' should be one of '“BP”, “CC”, “MF”'
> res_BP.combined.simplify <- clusterProfiler::simplify(res_BP.combined)
> res_BP.combined.simplify
#
# Result of Comparing 3 gene clusters
#
#.. @fun         gseGO
#.. @geneClusters        List of 3
 $ List1: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ List2: Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
  ..- attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
 $ List3: Named num [1:12495] 4.3 3.95 3.6 3.46 3.42 ...
  ..- attr(*, "names")= chr [1:12495] "4969" "57758" "79901" "79838" ...
#...Result       'data.frame':   820 obs. of  12 variables:
 $ Cluster        : Factor w/ 3 levels "List1","List2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ID             : chr  "GO:0007059" "GO:0051276" "GO:0098813" "GO:0000819" ...
 $ Description    : chr  "chromosome segregation" "chromosome organization" "nuclear chromosome segregation" "sister chromatid segregation" ...
 $ setSize        : int  319 473 238 185 327 491 423 197 104 129 ...
 $ enrichmentScore: num  0.585 0.52 0.633 0.661 0.541 ...
 $ NES            : num  2.75 2.53 2.91 2.97 2.55 ...
 $ pvalue         : num  4.55e-31 7.56e-31 1.38e-30 1.60e-27 1.53e-24 ...
 $ p.adjust       : num  1.73e-27 1.73e-27 2.11e-27 1.83e-24 1.16e-21 ...
 $ qvalue         : num  1.29e-27 1.29e-27 1.57e-27 1.37e-24 8.70e-22 ...
 $ rank           : num  1374 1374 449 449 1246 ...
 $ leading_edge   : chr  "tags=27%, list=11%, signal=25%" "tags=24%, list=11%, signal=22%" "tags=23%, list=4%, signal=22%" "tags=25%, list=4%, signal=24%" ...
 $ core_enrichment: chr  "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPM"| __truncated__ "CDC45/CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/HJURP/SKA1/NUSAP1/TPX2/TACC3/NEK2/CENPN/CDK1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/CCNB2/NDC80/TOP2A/NCAPH/ASPM/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1"| __truncated__ "CDCA8/CDC20/KIF23/CENPE/MYBL2/NDC80/TOP2A/NCAPH/DLGAP5/UBE2C/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/CDT"| __truncated__ ...
#.. number of enriched terms found for each gene cluster:
#..   List1: 282
#..   List2: 274
#..   List3: 264
#
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141

guidohooiveld commented 1 month ago

I am lost... I have the same version of clusterProfiler as you have installed, and it works fine for me...

> packageVersion("clusterProfiler")
[1] ‘4.12.0’
> 

Yet, I am still puzzled why you get the warnings, but I don't...

Warning messages:
1: In serialize(data, node$con) :
  'package:stats' may not be available when loading

Also, when I check the source code of simplify I do see that GOALL is supported, but you get the error...

GOALL was indeed initially not supported, but since v4.2.0 it is.... check this commit (from 27 Oct 2021): https://github.com/YuLab-SMU/clusterProfiler/commit/b75f09ae22278394076fa7da0db5e09479383919

Below is what I see when I check the source code in R. Note the explicit mention of GOALL. What about your installation?

> library(clusterProfiler)
> selectMethod(simplify, signature="gseaResult")
Method Definition:

function (x, ...) 
{
    .local <- function (x, cutoff = 0.7, by = "p.adjust", select_fun = min, 
        measure = "Wang", semData = NULL) 
    {
        if (!x@setType %in% c("BP", "MF", "CC", "GOALL")) 
            stop("simplify only applied to output from gseGO and enrichGO...")
        res <- as.data.frame(x)
        if (x@setType == "GOALL") {
            x@result <- simplify_ALL(res = res, cutoff = cutoff, 
                by = by, select_fun = select_fun, measure = measure, 
                semData = semData)
        }
        else {
            x@result <- simplify_internal(res = res, cutoff = cutoff, 
                by = by, select_fun = select_fun, measure = measure, 
                ontology = x@setType, semData = semData)
        }
        return(x)
    }
    .local(x, ...)
}
<bytecode: 0x000002d8d342b808>
<environment: namespace:clusterProfiler>

Signatures:
        x           
target  "gseaResult"
defined "gseaResult"
> 
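If the method definition matches but the error persists, it may help to look at which package versions are actually loaded and where `simplify` is found on the search path. The following is only a minimal diagnostic sketch: the package list is an assumption (packages that plausibly take part in the `simplify()` call chain), not something confirmed in this thread, and packages that are not installed are skipped.

```r
# Assumed list of packages worth checking; adjust as needed.
pkgs <- c("clusterProfiler", "GOSemSim", "DOSE", "enrichplot")

# Keep only the packages that can be loaded in this session.
installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Installed version of each of them, as a named character vector.
versions <- vapply(installed,
                   function(p) as.character(packageVersion(p)),
                   character(1))
print(versions)

# Search-path entries that provide an object called "simplify".
# An empty result means that no attached package exports that name.
simplify_locations <- utils::find("simplify")
print(simplify_locations)
```

Directly after the failing `simplify()` call, running `traceback()` shows the chain of calls that ended in `match.arg()`, which should indicate which function received `ont = "ALL"`.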
Sidragull57 commented 1 month ago

I checked and get the same as you

> library(clusterProfiler)
> selectMethod(simplify, signature="gseaResult")
Method Definition:

function (x, ...) 
{
    .local <- function (x, cutoff = 0.7, by = "p.adjust", select_fun = min, 
        measure = "Wang", semData = NULL) 
    {
        if (!x@setType %in% c("BP", "MF", "CC", "GOALL")) 
            stop("simplify only applied to output from gseGO and enrichGO...")
        res <- as.data.frame(x)
        if (x@setType == "GOALL") {
            x@result <- simplify_ALL(res = res, cutoff = cutoff, 
                by = by, select_fun = select_fun, measure = measure, 
                semData = semData)
        }
        else {
            x@result <- simplify_internal(res = res, cutoff = cutoff, 
                by = by, select_fun = select_fun, measure = measure, 
                ontology = x@setType, semData = semData)
        }
        return(x)
    }
    .local(x, ...)
}

Signatures:
        x           
target  "gseaResult"
defined "gseaResult"
>
guidohooiveld commented 1 month ago

So:

--> I am out of suggestions... Sorry! The last thing to suggest is to use another PC or laptop...

Sidragull57 commented 1 month ago

Thank you very much for your support. I am now at least able to remove redundancy from the GSEA output for a specific ontology. Your help is greatly appreciated.
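For reference, the per-ontology workaround can be sketched roughly as below. It reuses the `gseGO()` arguments from the example earlier in this thread; `gse_one_ontology` is a hypothetical helper name, not a clusterProfiler function, and the `ONTOLOGY` column is added by hand so that the rows can still be told apart after combining.

```r
library(clusterProfiler)
library(org.Hs.eg.db)
data(geneList, package = "DOSE")

# Hypothetical helper: run gseGO() for a single ontology, remove redundant
# terms with simplify(), and return the result table tagged with its ontology.
gse_one_ontology <- function(ont) {
  res <- gseGO(geneList     = geneList,
               OrgDb        = org.Hs.eg.db,
               ont          = ont,
               eps          = 0,
               minGSSize    = 15,
               maxGSSize    = 500,
               pvalueCutoff = 0.05)
  df <- as.data.frame(clusterProfiler::simplify(res))
  df$ONTOLOGY <- rep(ont, nrow(df))  # rep() also works for a zero-row table
  df
}

onts <- c("BP", "MF", "CC")

# One plain data.frame holding the simplified terms of all three ontologies.
combined <- do.call(rbind, lapply(onts, gse_one_ontology))
table(combined$ONTOLOGY)
```

Two caveats, as far as I understand it: the adjusted p-values are computed within each ontology here, so they will likely differ from a run with `ont = "ALL"`; and `combined` is an ordinary table rather than a `gseaResult` object, so plotting functions that expect the latter are probably safer to call on each per-ontology result.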

guidohooiveld commented 1 month ago

One last remark:

Again, be sure to also run the code in plain R only (and not through RStudio)!

Another user just reported that this solved his/her problem, although that problem was not at all related to your issue... but you never know...

https://github.com/YuLab-SMU/clusterProfiler/issues/708#issuecomment-2239235571
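To confirm which front end a session is actually running in, a small check like the one below can be put at the top of the script. It is a sketch that assumes RStudio sets the environment variable `RSTUDIO` to `"1"` in its sessions; please verify that on your own setup.

```r
# Assumption: RStudio sessions have the environment variable RSTUDIO set to "1".
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
message(if (in_rstudio) "Running inside RStudio" else "Running in plain R")

# Attached packages and environments, in lookup order. Comparing this between
# a plain R session and an RStudio session can reveal masking differences.
print(search())
```

Starting R from a terminal with `R --vanilla` additionally skips the saved workspace and the user profile, which rules out leftovers from earlier sessions.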