YuLab-SMU / enrichplot

Visualization of Functional Enrichment Result
https://yulab-smu.top/biomedical-knowledge-mining-book/
232 stars 65 forks source link

merge_result not compatible with pairwise_termsim #291

Closed nina-hahn closed 2 months ago

nina-hahn commented 2 months ago

Dear all,

I am using the merge_result function but finally pairwise_termsim is not working mentioning that gcsize has not been found.

mkk.up <- enrichMKEGG(gene         = DEGs.up$ENTREZID,
                    organism     = 'mmu',
                    pvalueCutoff = 0.05,
                    qvalueCutoff = 0.05,
                    pAdjustMethod = "BH")
mkk.up<-setReadable(mkk.up, organism, "ENTREZID")
mkk.up<-pairwise_termsim(mkk.up, showCategory = 30) #default 30

mkk.down <- enrichMKEGG(gene         = DEGs.down$ENTREZID,
                      organism     = 'mmu',
                      pvalueCutoff = 0.05,
                      qvalueCutoff = 0.05,
                      pAdjustMethod = "BH")
mkk.down<-setReadable(mkk.down, organism, "ENTREZID")
mkk.down<-pairwise_termsim(mkk.down, showCategory = 30)

mkk.list<-list("activated"=mkk.up, "suppressed"=mkk.down)

mkk.combined<-merge_result(mkk.list)

mkk.combined<-pairwise_termsim(mkk.combined, showCategory=30)

==> Error in fortify.compareClusterResult(x, showCategory = showCategory, : Objekt 'gcsize' not found

sessionInfo() R version 4.4.0 (2024-04-24 ucrt) Platform: x86_64-w64-mingw32/x64 Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale: [1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8 [4] LC_NUMERIC=C LC_TIME=German_Germany.utf8

time zone: Europe/Berlin tzcode source: internal

attached base packages: [1] stats4 stats graphics grDevices datasets utils methods base

other attached packages: [1] europepmc_0.4.3 ggupset_0.4.0 org.Mm.eg.db_3.19.1 AnnotationDbi_1.66.0
[5] IRanges_2.38.1 S4Vectors_0.42.1 Biobase_2.64.0 BiocGenerics_0.50.0
[9] stringr_1.5.1 DOSE_3.31.3 pathview_1.44.0 enrichplot_1.25.1.001 [13] ggplot2_3.5.1 renv_1.0.7 openxlsx_4.2.6.1 clusterProfiler_4.12.3 [17] SeuratObject_5.0.2 sp_2.1-4

loaded via a namespace (and not attached): [1] RColorBrewer_1.1-3 rstudioapi_0.16.0 jsonlite_1.8.8 magrittr_2.0.3
[5] farver_2.1.2 fs_1.6.4 zlibbioc_1.50.0 vctrs_0.6.5
[9] memoise_2.0.1 RCurl_1.98-1.16 ggtree_3.12.0 progress_1.2.3
[13] curl_5.2.1 gridGraphics_0.5-1 parallelly_1.38.0 desc_1.4.3
[17] plyr_1.8.9 httr2_1.0.2 cachem_1.1.0 igraph_2.0.3
[21] lifecycle_1.0.4 pkgconfig_2.0.3 Matrix_1.7-0 R6_2.5.1
[25] fastmap_1.2.0 gson_0.1.0 GenomeInfoDbData_1.2.12 future_1.34.0
[29] digest_0.6.36 aplot_0.2.3 ggnewscale_0.5.0 colorspace_2.1-1
[33] patchwork_1.2.0 ps_1.7.7 RSQLite_2.3.7 org.Hs.eg.db_3.19.1
[37] labeling_0.4.3 progressr_0.14.0 urltools_1.7.3 fansi_1.0.6
[41] httr_1.4.7 polyclip_1.10-7 compiler_4.4.0 remotes_2.5.0
[45] bit64_4.0.5 withr_3.0.1 BiocParallel_1.38.0 viridis_0.6.5
[49] DBI_1.2.3 pkgbuild_1.4.4 ggforce_0.4.2 MASS_7.3-61
[53] rappdirs_0.3.3 tools_4.4.0 ape_5.8 scatterpie_0.2.3
[57] zip_2.3.1 future.apply_1.11.2 glue_1.7.0 callr_3.7.6
[61] nlme_3.1-166 GOSemSim_2.31.1 grid_4.4.0 shadowtext_0.1.4
[65] cluster_2.1.6 reshape2_1.4.4 snow_0.4-4 fgsea_1.30.0
[69] generics_0.1.3 gtable_0.3.5 tidyr_1.3.1 hms_1.1.3
[73] data.table_1.15.4 xml2_1.3.6 tidygraph_1.3.1 utf8_1.2.4
[77] XVector_0.44.0 ggrepel_0.9.5 pillar_1.9.0 yulab.utils_0.1.6
[81] spam_2.10-0 splines_4.4.0 dplyr_1.1.4 tweenr_2.0.3
[85] treeio_1.28.0 lattice_0.22-6 bit_4.0.5 tidyselect_1.2.1
[89] GO.db_3.19.1 Biostrings_2.72.1 gitcreds_0.1.2 gridExtra_2.3
[93] graphlayouts_1.1.1 KEGGgraph_1.64.0 stringi_1.8.4 UCSC.utils_1.0.0
[97] lazyeval_0.2.2 ggfun_0.1.5 codetools_0.2-20 ggraph_2.2.1
[101] tibble_3.2.1 qvalue_2.36.0 Rgraphviz_2.48.0 graph_1.82.0
[105] BiocManager_1.30.23 ggplotify_0.1.2 cli_3.6.3 munsell_0.5.1
[109] processx_3.8.4 Rcpp_1.0.13 GenomeInfoDb_1.40.1 globals_0.16.3
[113] triebeard_0.4.1 png_0.1-8 XML_3.99-0.17 parallel_4.4.0
[117] blob_1.2.4 prettyunits_1.2.0 dotCall64_1.1-1 bitops_1.0-8
[121] listenv_0.9.1 viridisLite_0.4.2 tidytree_0.4.6 ggridges_0.5.6
[125] scales_1.3.0 purrr_1.0.2 crayon_1.5.3 rlang_1.1.4
[129] cowplot_1.1.3 fastmatch_1.1-4 KEGGREST_1.44.1

Best, Nina

guidohooiveld commented 2 months ago

Could you please provide a bit more context?

Most important, how were mkk.up and mkk.down generated, and what do you want to achieve?

nina-hahn commented 2 months ago

Dear Guido,

sorry, I added how mkk.up and mkk.down were generated.

Finally, I aim to plot an emapplot from mkk.combined. There I get the error "Error in has_pairsim(x) : Term similarity matrix not available. Please use pairwise_termsim function to deal with the results of enrichment analysis." why I aim to conduct the pairwise_termsim function for mkk.combined.

Best

guidohooiveld commented 2 months ago

Mmm, using the example data set included in DOSE your code is working in my hands...


> ## Load required libaries
> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
> 
> ## Using the example dataset.
> data(geneList, package="DOSE")
> genes.up <- names(geneList)[geneList > 1.5]
> genes.down <- names(geneList)[geneList < -1.5]
> 
> 
> ## Note: to illustrate the code no cutoff is applied.
> ## still only few modules are enriched!
> 
> ## Upregulated genes
> mkk.up <- enrichMKEGG(gene             = genes.up,
+                       organism         = 'hsa',
+                       pvalueCutoff     = 1,
+                       qvalueCutoff     = 1,
+                       pAdjustMethod    = "BH")
> 
> ## Convert ids into gene symbols
> mkk.up <- setReadable(mkk.up, OrgDb=org.Hs.eg.db, keyType="ENTREZID")
> 
> ## Caclulate similarity matrix.
> mkk.up <- pairwise_termsim(mkk.up, showCategory = 30)
> 
> ## Check; note that 7 enriched terms are found
> mkk.up
#
# over-representation test
#
#...@organism    hsa 
#...@ontology    MKEGG 
#...@keytype     ENTREZID 
#...@gene        chr [1:228] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
#...pvalues adjusted by 'BH' with cutoff <1 
#...7 enriched terms found
'data.frame':   7 obs. of  9 variables:
 $ ID         : chr  "M00912" "M00053" "M00976" "M00035" ...
 $ Description: chr  "NAD biosynthesis, tryptophan => quinolinate => NAD" "Deoxyribonucleotide biosynthesis, ADP/GDP/CDP/UDP => dATP/dGTP/dCTP/dUTP" "C19-Steroid hormone biosynthesis, pregnenolone => testosterone => dihydrotestosterone" "Methionine degradation" ...
 $ GeneRatio  : chr  "3/12" "1/12" "1/12" "1/12" ...
 $ BgRatio    : chr  "12/871" "11/871" "11/871" "12/871" ...
 $ pvalue     : num  0.000411 0.142293 0.142293 0.154261 0.154261 ...
 $ p.adjust   : num  0.00288 0.20736 0.20736 0.20736 0.20736 ...
 $ qvalue     : num  0.0026 0.1871 0.1871 0.1871 0.1871 ...
 $ geneID     : chr  "QPRT/IDO1/TDO2" "RRM2" "HSD17B2" "CBS" ...
 $ Count      : int  3 1 1 1 1 1 1
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> 
> 
> 
> ## Idem for the downregulated genes
> mkk.down <- enrichMKEGG(gene            = genes.down,
+                        organism         = 'hsa',
+                        pvalueCutoff     = 1,
+                        qvalueCutoff     = 1,
+                        pAdjustMethod    = "BH")
> 
> ## Convert ids into gene symbols
> mkk.down <- setReadable(mkk.down, OrgDb=org.Hs.eg.db, keyType="ENTREZID")
> 
> ## Caclulate similarity matrix.
> mkk.down <- pairwise_termsim(mkk.down, showCategory = 30)
> 
> ## Check; note that 10 enriched terms are found
> mkk.down
#
# over-representation test
#
#...@organism    hsa 
#...@ontology    MKEGG 
#...@keytype     ENTREZID 
#...@gene        chr [1:285] "2151" "2331" "55800" "4223" "4485" "55638" "23541" "55888" ...
#...pvalues adjusted by 'BH' with cutoff <1 
#...10 enriched terms found
'data.frame':   10 obs. of  9 variables:
 $ ID         : chr  "M00095" "M00032" "M00034" "M00094" ...
 $ Description: chr  "C5 isoprenoid biosynthesis, mevalonate pathway" "Lysine degradation, lysine => saccharopine => acetoacetyl-CoA" "Methionine salvage pathway" "Ceramide biosynthesis" ...
 $ GeneRatio  : chr  "1/15" "1/15" "1/15" "1/15" ...
 $ BgRatio    : chr  "10/871" "11/871" "12/871" "12/871" ...
 $ pvalue     : num  0.16 0.175 0.189 0.189 0.203 ...
 $ p.adjust   : num  0.355 0.355 0.355 0.355 0.355 ...
 $ qvalue     : num  0.355 0.355 0.355 0.355 0.355 ...
 $ geneID     : chr  "HMGCS2" "AASS" "TAT" "CERS4" ...
 $ Count      : int  1 1 1 1 1 1 1 1 1 1
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> 
> 
> 
> ## Combine the 2 results
> mkk.list <- list("activated"=mkk.up, "suppressed"=mkk.down)
> 
> mkk.combined <- merge_result(mkk.list)
> 
> mkk.combined <- pairwise_termsim(mkk.combined, showCategory=30)
> 
> 
> ## Check output; note that the number of activated and supressed gene sets.
> ## (Still) is 7 resp. 10.
> mkk.combined
#
# Result of Comparing 0 gene clusters 
#
#.. @fun          
#.. @geneClusters        list()
#...Result      'data.frame':   17 obs. of  10 variables:
 $ Cluster    : Factor w/ 2 levels "activated","suppressed": 1 1 1 1 1 1 1 2 2 2 ...
 $ ID         : chr  "M00912" "M00053" "M00976" "M00035" ...
 $ Description: chr  "NAD biosynthesis, tryptophan => quinolinate => NAD" "Deoxyribonucleotide biosynthesis, ADP/GDP/CDP/UDP => dATP/dGTP/dCTP/dUTP" "C19-Steroid hormone biosynthesis, pregnenolone => testosterone => dihydrotestosterone" "Methionine degradation" ...
 $ GeneRatio  : chr  "3/12" "1/12" "1/12" "1/12" ...
 $ BgRatio    : chr  "12/871" "11/871" "11/871" "12/871" ...
 $ pvalue     : num  0.000411 0.142293 0.142293 0.154261 0.154261 ...
 $ p.adjust   : num  0.00288 0.20736 0.20736 0.20736 0.20736 ...
 $ qvalue     : num  0.0026 0.1871 0.1871 0.1871 0.1871 ...
 $ geneID     : chr  "QPRT/IDO1/TDO2" "RRM2" "HSD17B2" "CBS" ...
 $ Count      : int  3 1 1 1 1 1 1 1 1 1 ...
#.. number of enriched terms found for each gene cluster:
#..   activated: 7 
#..   suppressed: 10 
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, 
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. 
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. 
The Innovation. 2021, 2(3):100141 

> 
> 
> ## Also note that gene set M00056 (O-glycan biosynthesis, mucin type core)
> ## is enriched in both activated and suppressed genes
> as.data.frame(mkk.combined)
      Cluster     ID
1   activated M00912
2   activated M00053
3   activated M00976
4   activated M00035
5   activated M00052
6   activated M00938
7   activated M00056
8  suppressed M00095
9  suppressed M00032
10 suppressed M00034
11 suppressed M00094
12 suppressed M00415
13 suppressed M00099
14 suppressed M00003
15 suppressed M00049
16 suppressed M00154
17 suppressed M00056
                                                                             Description
1                                     NAD biosynthesis, tryptophan => quinolinate => NAD
2               Deoxyribonucleotide biosynthesis, ADP/GDP/CDP/UDP => dATP/dGTP/dCTP/dUTP
3  C19-Steroid hormone biosynthesis, pregnenolone => testosterone => dihydrotestosterone
4                                                                 Methionine degradation
5                         Pyrimidine ribonucleotide biosynthesis, UMP => UDP/UTP,CDP/CTP
6                               Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP
7                                                 O-glycan biosynthesis, mucin type core
8                                         C5 isoprenoid biosynthesis, mevalonate pathway
9                          Lysine degradation, lysine => saccharopine => acetoacetyl-CoA
10                                                            Methionine salvage pathway
11                                                                 Ceramide biosynthesis
12                                        Fatty acid elongation in endoplasmic reticulum
13                                                              Sphingosine biosynthesis
14                                          Gluconeogenesis, oxaloacetate => fructose-6P
15                                   Adenine ribonucleotide biosynthesis, IMP => ADP,ATP
16                                                                  Cytochrome c oxidase
17                                                O-glycan biosynthesis, mucin type core
   GeneRatio BgRatio      pvalue    p.adjust      qvalue         geneID Count
1       3/12  12/871 0.000411033 0.002877231 0.002595998 QPRT/IDO1/TDO2     3
2       1/12  11/871 0.142293179 0.207362399 0.187093894           RRM2     1
3       1/12  11/871 0.142293179 0.207362399 0.187093894        HSD17B2     1
4       1/12  12/871 0.154261181 0.207362399 0.187093894            CBS     1
5       1/12  12/871 0.154261181 0.207362399 0.187093894          CTPS1     1
6       1/12  14/871 0.177739200 0.207362399 0.187093894           RRM2     1
7       1/12  28/871 0.326077271 0.326077271 0.294205057        GALNT14     1
8       1/15  10/871 0.160230679 0.354530484 0.354530484         HMGCS2     1
9       1/15  11/871 0.174860807 0.354530484 0.354530484           AASS     1
10      1/15  12/871 0.189252770 0.354530484 0.354530484            TAT     1
11      1/15  12/871 0.189252770 0.354530484 0.354530484          CERS4     1
12      1/15  13/871 0.203410172 0.354530484 0.354530484         ELOVL2     1
13      1/15  16/871 0.244510307 0.354530484 0.354530484          CERS4     1
14      1/15  18/871 0.270801435 0.354530484 0.354530484           PCK1     1
15      1/15  19/871 0.283624387 0.354530484 0.354530484            AK5     1
16      1/15  24/871 0.344642260 0.382935845 0.382935845         COX7A1     1
17      1/15  28/871 0.389926056 0.389926056 0.389926056        GALNT10     1
> 
> ## Display in dotplot.
> ## Note that number of sets plotted equal 7 resp. 10,
> ## despite setting showCategory=15.
> ## Also note set M00056 present in both activated and supressed
> p1 <- dotplot(mkk.combined, split="Cluster", showCategory=15, x="GeneRatio",
+                  size= "count", font.size=7, label_format=10) +
+              facet_grid(.~Cluster) +
+              ggtitle("MKEGG ORA results, displaying ratios")
> 
> print(p1)
>

image

> p2 <- dotplot(mkk.combined, split="Cluster", showCategory=15, x="Count",
+                  size= "count", font.size=7, label_format=10) +
+              facet_grid(.~Cluster) +
+              ggtitle("MKEGG ORA results, displaying # of hits in gene set")
> 
> print(p2)
> 

image

> ## Display in Enrichment Map
> p3 <- emapplot(mkk.combined) +
+                ggtitle("MKEGG ORA results, emapp")
> 
> print(p3)
>

image

> packageVersion("clusterProfiler")
[1] ‘4.12.3’
> 
> packageVersion("enrichplot")
[1] ‘1.24.2’
> 
> packageVersion("DOSE")
[1] ‘3.30.2’
> 
> sessionInfo()
R version 4.4.0 Patched (2024-05-21 r86580 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Europe/Amsterdam
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] org.Hs.eg.db_3.19.1    AnnotationDbi_1.66.0   IRanges_2.38.1        
[4] S4Vectors_0.42.1       Biobase_2.64.0         BiocGenerics_0.50.0   
[7] enrichplot_1.24.2      clusterProfiler_4.12.3

loaded via a namespace (and not attached):
  [1] DBI_1.2.3               gson_0.1.0              shadowtext_0.1.4       
  [4] gridExtra_2.3           httr2_1.0.2             rlang_1.1.4            
  [7] magrittr_2.0.3          DOSE_3.30.2             compiler_4.4.0         
 [10] RSQLite_2.3.7           png_0.1-8               vctrs_0.6.5            
 [13] reshape2_1.4.4          stringr_1.5.1           pkgconfig_2.0.3        
 [16] crayon_1.5.3            fastmap_1.2.0           XVector_0.44.0         
 [19] labeling_0.4.3          ggraph_2.2.1            utf8_1.2.4             
 [22] HDO.db_0.99.1           UCSC.utils_1.0.0        purrr_1.0.2            
 [25] bit_4.0.5               zlibbioc_1.50.0         cachem_1.1.0           
 [28] aplot_0.2.3             GenomeInfoDb_1.40.1     jsonlite_1.8.8         
 [31] blob_1.2.4              BiocParallel_1.38.0     tweenr_2.0.3           
 [34] parallel_4.4.0          R6_2.5.1                stringi_1.8.4          
 [37] RColorBrewer_1.1-3      GOSemSim_2.30.1         Rcpp_1.0.13            
 [40] Matrix_1.7-0            splines_4.4.0           igraph_2.0.3           
 [43] tidyselect_1.2.1        qvalue_2.36.0           viridis_0.6.5          
 [46] codetools_0.2-20        lattice_0.22-6          tibble_3.2.1           
 [49] plyr_1.8.9              treeio_1.28.0           withr_3.0.1            
 [52] KEGGREST_1.44.1         gridGraphics_0.5-1      scatterpie_0.2.3       
 [55] polyclip_1.10-7         Biostrings_2.72.1       pillar_1.9.0           
 [58] ggtree_3.12.0           ggfun_0.1.5             generics_0.1.3         
 [61] ggplot2_3.5.1           munsell_0.5.1           scales_1.3.0           
 [64] tidytree_0.4.6          glue_1.7.0              lazyeval_0.2.2         
 [67] tools_4.4.0             ggnewscale_0.5.0        data.table_1.15.4      
 [70] fgsea_1.30.0            fs_1.6.4                graphlayouts_1.1.1     
 [73] fastmatch_1.1-4         tidygraph_1.3.1         cowplot_1.1.3          
 [76] grid_4.4.0              tidyr_1.3.1             ape_5.8                
 [79] colorspace_2.1-1        nlme_3.1-166            GenomeInfoDbData_1.2.12
 [82] patchwork_1.2.0         ggforce_0.4.2           cli_3.6.3              
 [85] rappdirs_0.3.3          fansi_1.0.6             viridisLite_0.4.2      
 [88] dplyr_1.1.4             gtable_0.3.5            yulab.utils_0.1.6      
 [91] digest_0.6.37           ggrepel_0.9.5           ggplotify_0.1.2        
 [94] farver_2.1.2            memoise_2.0.1           lifecycle_1.0.4        
 [97] httr_1.4.7              GO.db_3.19.1            bit64_4.0.5            
[100] MASS_7.3-61            
> 
> 
nina-hahn commented 2 months ago

Hi Guido,

many thinks for testing! I compared to my results. I checked in the .up and .down data sets and in the .comibined one the number of terms found. In each the number is >0 and the sum of the individual ones and the .combined data sets is equal (3+10=13). In addition, the clusters were correctly assigned.

A difference I noticed is that I am using the mmu data base while you used the hsa data base. May this lead to the error? In addition, I am using the version enrichplot_1.25.1.001 while you are using enrichplot_1.24.2. ClusterProfiler is the identical version.

Best, Nina

guidohooiveld commented 2 months ago

Aha, I now recall that you have installed the GitHub / development versions of GOSemSim, DOSE an enrichplot in order to fix another problem reported here: https://github.com/YuLab-SMU/clusterProfiler/issues/715.

When I update these 3 packages to the GitHub version I also got an error.

To solve your problem you can downgrade to enrichpot v1.25.0.2 by doing: > BiocManager::install(c('YuLab-SMU/enrichplot@af1f329aefadf03b8d206458ff9eb18434bedf7a'), force=TRUE)

This is commit af1f329 (link), dated July 26, 2024.

@GuangchuangYu when using the commit made after af1f329, thus 17e42c4 (link) dated Aug 16, 2024, somehow an bug has been introduced. This results in an error (i.e. object 'gcsize' not found) with the function pairwise_termsim, also reported in the first post of this thread by @nina-hahn.

> ## install commit `17e42c4`
> BiocManager::install(c('YuLab-SMU/enrichplot@17e42c43cc51e6b657e9a7428f9b3bc569094ae9'), force=TRUE)

> packageVersion("enrichplot")
[1] ‘1.25.0.4’

## using the code from my / 2nd post in this thread, and then this line:
> mkk.combined <- pairwise_termsim(mkk.combined, showCategory=30)
Error in fortify.compareClusterResult(x, showCategory = showCategory,  : 
  object 'gcsize' not found
> 
>
> traceback()
6: paste(as.character(result$Cluster), "\n", "(", gcsize, ")", sep = "")
5: fortify.compareClusterResult(x, showCategory = showCategory, 
       includeAll = TRUE, split = NULL)
4: fortify(x, showCategory = showCategory, includeAll = TRUE, split = NULL)
3: pairwise_termsim.compareClusterResult(x, method = method, semData = semData, 
       showCategory = showCategory)
2: pairwise_termsim(mkk.combined, showCategory = 30)
1: pairwise_termsim(mkk.combined, showCategory = 30)
> 

You made this commit to solve issue 715 reported on the clusterProfiler GitHub. https://github.com/YuLab-SMU/clusterProfiler/issues/715

Could you please have a look at this?

nina-hahn commented 2 months ago

Thanks for identifying the main problem! I guess downgrading would result in my previous issue #715. Maybe I have to wait for the next fix to analyze my data?

Best

GuangchuangYu commented 2 months ago

Current version should be OK, see https://github.com/YuLab-SMU/enrichplot/commit/3ebaca27f6b5663110005b3eac677e5543e6d3e9#diff-c05d427d5e8b25e3402cbf72eeed5ac2371bd3d4d851138f87a8cf67fe00636b.

guidohooiveld commented 2 months ago

Thanks, but unfortunately now another issue pops up...

Using my code above:

> emapplot(mkk.combined)
Error in `[<-.data.frame`(`*tmp*`, pie_data[i, 2], pie_data[i, 1], value = 1) : 
  missing values are not allowed in subscripted assignments of data frames
>
> traceback()
10: stop("missing values are not allowed in subscripted assignments of data frames")
9: `[<-.data.frame`(`*tmp*`, pie_data[i, 2], pie_data[i, 1], value = 1)
8: `[<-`(`*tmp*`, pie_data[i, 2], pie_data[i, 1], value = 1)
7: prepare_pie_data(pie_data, pie = pie)
6: prepare_pie_category(enrichDf = enrichDf, pie = pie)
5: get_pie_data(enrichDf = y, pie = pie, mergedEnrichDf = mergedEnrichDf, 
       cex_pie2axis = cex_pie2axis, p = p, cex_category = cex_category)
4: emapplot.compareClusterResult(x, showCategory = showCategory, 
       ...)
3: .local(x, ...)
2: emapplot(mkk.combined)
1: emapplot(mkk.combined)
> 
> mkk.combined
#
# Result of Comparing 0 gene clusters 
#
#.. @fun          
#.. @geneClusters        list()
#...Result      'data.frame':   17 obs. of  13 variables:
 $ Cluster       : Factor w/ 2 levels "activated","suppressed": 1 1 1 1 1 1 1 2 2 2 ...
 $ ID            : chr  "M00912" "M00053" "M00976" "M00035" ...
 $ Description   : chr  "NAD biosynthesis, tryptophan => quinolinate => NAD" "Deoxyribonucleotide biosynthesis, ADP/GDP/CDP/UDP => dATP/dGTP/dCTP/dUTP" "C19-Steroid hormone biosynthesis, pregnenolone => testosterone => dihydrotestosterone" "Methionine degradation" ...
 $ GeneRatio     : chr  "3/12" "1/12" "1/12" "1/12" ...
 $ BgRatio       : chr  "12/871" "11/871" "11/871" "12/871" ...
 $ RichFactor    : num  0.25 0.0909 0.0909 0.0833 0.0833 ...
 $ FoldEnrichment: num  18.15 6.6 6.6 6.05 6.05 ...
 $ zScore        : num  7.06 2.21 2.21 2.08 2.08 ...
 $ pvalue        : num  0.000411 0.142293 0.142293 0.154261 0.154261 ...
 $ p.adjust      : num  0.00288 0.20736 0.20736 0.20736 0.20736 ...
 $ qvalue        : num  0.0026 0.1871 0.1871 0.1871 0.1871 ...
 $ geneID        : chr  "QPRT/IDO1/TDO2" "RRM2" "HSD17B2" "CBS" ...
 $ Count         : int  3 1 1 1 1 1 1 1 1 1 ...
#.. number of enriched terms found for each gene cluster:
#..   activated: 7 
#..   suppressed: 10 
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, 
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. 
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. 
The Innovation. 2021, 2(3):100141 

>
> as.data.frame(mkk.combined)
      Cluster     ID
1   activated M00912
2   activated M00053
3   activated M00976
4   activated M00035
5   activated M00052
6   activated M00938
7   activated M00056
8  suppressed M00095
9  suppressed M00032
10 suppressed M00034
11 suppressed M00094
12 suppressed M00415
13 suppressed M00099
14 suppressed M00003
15 suppressed M00049
16 suppressed M00154
17 suppressed M00056
                                                                             Description
1                                     NAD biosynthesis, tryptophan => quinolinate => NAD
2               Deoxyribonucleotide biosynthesis, ADP/GDP/CDP/UDP => dATP/dGTP/dCTP/dUTP
3  C19-Steroid hormone biosynthesis, pregnenolone => testosterone => dihydrotestosterone
4                                                                 Methionine degradation
5                         Pyrimidine ribonucleotide biosynthesis, UMP => UDP/UTP,CDP/CTP
6                               Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP
7                                                 O-glycan biosynthesis, mucin type core
8                                         C5 isoprenoid biosynthesis, mevalonate pathway
9                          Lysine degradation, lysine => saccharopine => acetoacetyl-CoA
10                                                            Methionine salvage pathway
11                                                                 Ceramide biosynthesis
12                                        Fatty acid elongation in endoplasmic reticulum
13                                                              Sphingosine biosynthesis
14                                          Gluconeogenesis, oxaloacetate => fructose-6P
15                                   Adenine ribonucleotide biosynthesis, IMP => ADP,ATP
16                                                                  Cytochrome c oxidase
17                                                O-glycan biosynthesis, mucin type core
   GeneRatio BgRatio RichFactor FoldEnrichment    zScore      pvalue
1       3/12  12/871 0.25000000      18.145833 7.0649047 0.000411033
2       1/12  11/871 0.09090909       6.598485 2.2073505 0.142293179
3       1/12  11/871 0.09090909       6.598485 2.2073505 0.142293179
4       1/12  12/871 0.08333333       6.048611 2.0802696 0.154261181
5       1/12  12/871 0.08333333       6.048611 2.0802696 0.154261181
6       1/12  14/871 0.07142857       5.184524 1.8645469 0.177739200
7       1/12  28/871 0.03571429       2.592262 1.0116573 0.326077271
8       1/15  10/871 0.10000000       5.806667 2.0226037 0.160230679
9       1/15  11/871 0.09090909       5.278788 1.8894537 0.174860807
10      1/15  12/871 0.08333333       4.838889 1.7716093 0.189252770
11      1/15  12/871 0.08333333       4.838889 1.7716093 0.189252770
12      1/15  13/871 0.07692308       4.466667 1.6661285 0.203410172
13      1/15  16/871 0.06250000       3.629167 1.4043114 0.244510307
14      1/15  18/871 0.05555556       3.225926 1.2625274 0.270801435
15      1/15  19/871 0.05263158       3.056140 1.1988867 0.283624387
16      1/15  24/871 0.04166667       2.419444 0.9329335 0.344642260
17      1/15  28/871 0.03571429       2.073810 0.7641187 0.389926056
      p.adjust      qvalue         geneID Count
1  0.002877231 0.002595998 QPRT/IDO1/TDO2     3
2  0.207362399 0.187093894           RRM2     1
3  0.207362399 0.187093894        HSD17B2     1
4  0.207362399 0.187093894            CBS     1
5  0.207362399 0.187093894          CTPS1     1
6  0.207362399 0.187093894           RRM2     1
7  0.326077271 0.294205057        GALNT14     1
8  0.354530484 0.354530484         HMGCS2     1
9  0.354530484 0.354530484           AASS     1
10 0.354530484 0.354530484            TAT     1
11 0.354530484 0.354530484          CERS4     1
12 0.354530484 0.354530484         ELOVL2     1
13 0.354530484 0.354530484          CERS4     1
14 0.354530484 0.354530484           PCK1     1
15 0.354530484 0.354530484            AK5     1
16 0.382935845 0.382935845         COX7A1     1
17 0.389926056 0.389926056        GALNT10     1
> emapplot(mkk.combined)
Error in `[<-.data.frame`(`*tmp*`, pie_data[i, 2], pie_data[i, 1], value = 1) : 
  missing values are not allowed in subscripted assignments of data frames
> 
> packageVersion("clusterProfiler")
[1] ‘4.12.6’
> 
> packageVersion("enrichplot")
[1] ‘1.25.2.1’
> 
> packageVersion("DOSE")
[1] ‘3.99.1’
> 
guidohooiveld commented 2 months ago

Debugging by manually running the function emapplot.compareClusterResult: Starting at line 449 https://github.com/YuLab-SMU/enrichplot/blob/41ce0a08d19a14f9b52206bacd4b992b22817f1c/R/emapplot.R#L449

When I execute the function line-by-line, it goes fine until line 656, where fortify is called. https://github.com/YuLab-SMU/enrichplot/blob/41ce0a08d19a14f9b52206bacd4b992b22817f1c/R/emapplot.R#L656-L657

The output is unexpected, because the cluster named suppressed seems to be gone... NA are introduced...

> y <- fortify(x, showCategory = showCategory,
+                  includeAll = TRUE, split = split)
> y
                           Cluster     ID
1                activated\n(0.25) M00912
2  activated\n(0.0833333333333333) M00053
3  activated\n(0.0833333333333333) M00976
4  activated\n(0.0833333333333333) M00035
5  activated\n(0.0833333333333333) M00052
6  activated\n(0.0833333333333333) M00938
7  activated\n(0.0833333333333333) M00056
8                             <NA> M00095
9                             <NA> M00032
10                            <NA> M00034
11                            <NA> M00094
12                            <NA> M00415
13                            <NA> M00099
14                            <NA> M00003
15                            <NA> M00049
16                            <NA> M00154
17                            <NA> M00056
                                                                             Description
1                                     NAD biosynthesis, tryptophan => quinolinate => NAD
2               Deoxyribonucleotide biosynthesis, ADP/GDP/CDP/UDP => dATP/dGTP/dCTP/dUTP
3  C19-Steroid hormone biosynthesis, pregnenolone => testosterone => dihydrotestosterone
4                                                                 Methionine degradation
5                         Pyrimidine ribonucleotide biosynthesis, UMP => UDP/UTP,CDP/CTP
6                               Pyrimidine deoxyribonucleotide biosynthesis, UDP => dTTP
7                                                 O-glycan biosynthesis, mucin type core
8                                         C5 isoprenoid biosynthesis, mevalonate pathway
9                          Lysine degradation, lysine => saccharopine => acetoacetyl-CoA
10                                                            Methionine salvage pathway
11                                                                 Ceramide biosynthesis
12                                        Fatty acid elongation in endoplasmic reticulum
13                                                              Sphingosine biosynthesis
14                                          Gluconeogenesis, oxaloacetate => fructose-6P
15                                   Adenine ribonucleotide biosynthesis, IMP => ADP,ATP
16                                                                  Cytochrome c oxidase
17                                                O-glycan biosynthesis, mucin type core
    GeneRatio BgRatio RichFactor FoldEnrichment    zScore      pvalue
1  0.25000000  12/871 0.25000000      18.145833 7.0649047 0.000411033
2  0.08333333  11/871 0.09090909       6.598485 2.2073505 0.142293179
3  0.08333333  11/871 0.09090909       6.598485 2.2073505 0.142293179
4  0.08333333  12/871 0.08333333       6.048611 2.0802696 0.154261181
5  0.08333333  12/871 0.08333333       6.048611 2.0802696 0.154261181
6  0.08333333  14/871 0.07142857       5.184524 1.8645469 0.177739200
7  0.08333333  28/871 0.03571429       2.592262 1.0116573 0.326077271
8  0.06666667  10/871 0.10000000       5.806667 2.0226037 0.160230679
9  0.06666667  11/871 0.09090909       5.278788 1.8894537 0.174860807
10 0.06666667  12/871 0.08333333       4.838889 1.7716093 0.189252770
11 0.06666667  12/871 0.08333333       4.838889 1.7716093 0.189252770
12 0.06666667  13/871 0.07692308       4.466667 1.6661285 0.203410172
13 0.06666667  16/871 0.06250000       3.629167 1.4043114 0.244510307
14 0.06666667  18/871 0.05555556       3.225926 1.2625274 0.270801435
15 0.06666667  19/871 0.05263158       3.056140 1.1988867 0.283624387
16 0.06666667  24/871 0.04166667       2.419444 0.9329335 0.344642260
17 0.06666667  28/871 0.03571429       2.073810 0.7641187 0.389926056
      p.adjust      qvalue         geneID Count
1  0.002877231 0.002595998 QPRT/IDO1/TDO2     3
2  0.207362399 0.187093894           RRM2     1
3  0.207362399 0.187093894        HSD17B2     1
4  0.207362399 0.187093894            CBS     1
5  0.207362399 0.187093894          CTPS1     1
6  0.207362399 0.187093894           RRM2     1
7  0.326077271 0.294205057        GALNT14     1
8  0.354530484 0.354530484         HMGCS2     1
9  0.354530484 0.354530484           AASS     1
10 0.354530484 0.354530484            TAT     1
11 0.354530484 0.354530484          CERS4     1
12 0.354530484 0.354530484         ELOVL2     1
13 0.354530484 0.354530484          CERS4     1
14 0.354530484 0.354530484           PCK1     1
15 0.354530484 0.354530484            AK5     1
16 0.382935845 0.382935845         COX7A1     1
17 0.389926056 0.389926056        GALNT10     1
> 
> sub("\n.*", "", y$Cluster)
 [1] "activated" "activated" "activated" "activated" "activated" "activated" "activated" NA          NA          NA          NA          NA          NA         
[14] NA          NA          NA          NA         
>

I believe (not sure, though) this is the cause of the underlying problem with the current version.

guidohooiveld commented 2 months ago

Yep, I can confirm it is working now. Thanks for the quick fix!

@nina-hahn : install this fixed version through: BiocManager::install(c('YuLab-SMU/enrichplot'), force=TRUE)