YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

KEGG annotation for mouse returns no results. #598

Open lizhan96 opened 1 year ago

lizhan96 commented 1 year ago
> kegg_enrich <- enrichKEGG(gene = need_gene_down2$SYMBOL,
+                           organism = "mmu",
+                           keyType = "kegg",
+                           pvalueCutoff = 0.05,
+                           qvalueCutoff = 0.1)
--> No gene can be mapped....
--> Expected input gene ID: 
--> return NULL...

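I tried both SYMBOL and ENTREZID as input and got no result either way. The "Expected input gene ID: " line also does not show what input format is expected. Could this be caused by a problem downloading the relevant database?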

huerqiang commented 1 year ago

Please make sure the genes you pass in are ENTREZID, and use the latest version of clusterProfiler.
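A quick way to check the first half of that advice is to test whether every ID is a purely numeric string, which is what Entrez gene IDs look like. The sketch below is only an illustration: `looks_like_entrez` is a hypothetical helper (not part of clusterProfiler), the mixed input vector is made up from two IDs in this thread plus one gene symbol, and the conversion step assumes the non-numeric entries are mouse gene symbols.

```r
# Sketch: sanity-check gene IDs before calling enrichKEGG().
# `looks_like_entrez` is a hypothetical helper, not a clusterProfiler function.
looks_like_entrez <- function(ids) {
  ids <- as.character(ids)              # factors and numerics become strings
  !is.na(ids) & grepl("^[0-9]+$", ids)  # TRUE only for all-digit strings
}

# Hypothetical mixed input: two IDs from this thread, one symbol, one NA.
ids <- c("394430", "227394", "Cyp2e1", NA)
ok <- looks_like_entrez(ids)
print(data.frame(id = ids, entrez_like = ok))

# Convert the non-numeric entries, assuming they are mouse gene symbols.
# This part runs only if both packages are installed.
if (requireNamespace("clusterProfiler", quietly = TRUE) &&
    requireNamespace("org.Mm.eg.db", quietly = TRUE)) {
  symbols <- ids[!ok & !is.na(ids)]
  converted <- clusterProfiler::bitr(symbols,
                                     fromType = "SYMBOL",
                                     toType   = "ENTREZID",
                                     OrgDb    = org.Mm.eg.db::org.Mm.eg.db)
  print(converted)
}
```

If every entry already passes the check, the ID type is probably not the cause and the problem lies elsewhere.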

lizhan96 commented 1 year ago
>   kegg.dn = enrichKEGG(gene.dn2$ENTREZID, organism = 'mmu', keyType = 'kegg', pvalueCutoff = 0.05, qvalueCutoff = 0.2)
--> No gene can be mapped....
--> Expected input gene ID: 
--> return NULL...
> gene.up2$ENTREZID
  [1] "394430"    "227394"    "74137"     "116847"    "14264"     "12628"    
  [7] "545370"    "240873"    "474332"    "15483"     "16425"     "77794"    
 [13] "23928"     "20378"     "20495"     "228598"    "56264"     "252864"   
 [19] "73173"     "108927"    "60596"     "15360"     "99543"     "118449"   
 [25] "12722"     "171166"    "229949"    "64817"     "329872"    "230558"   
 [31] "13119"     "13117"     "68180"     "14807"     "14077"     "17150"    
 [37] "269633"    "53419"     "231633"    "15445"     "269701"    "109648"   
 [43] "66873"     "58229"     "243537"    "66277"     "109978"    "232441"   
 [49] "28250"     "20928"     "22776"     "26367"     "100039239" "100503386"
 [55] "101488"    "20887"     "71007"     "24099"     "15446"     "16596"    
 [61] "16949"     "19378"     "15551"     "333424"    "76257"     "235636"   
 [67] "235674"    "102570"    "16773"     "13179"     "70574"     "11568"    
 [73] "16006"     "69183"     "237761"    "19049"     "17534"     "11818"    
 [79] "217258"    "328035"    "114886"    "66042"     "26380"     "23876"    
 [85] "12401"     "75512"     "66695"     "18295"     "320736"    "268729"   
 [91] "14560"     "58809"     "11727"     "219026"    "19752"     "66214"    
 [97] "109828"    "268780"    "19116"     "22762"     "12818"     "207911"   
[103] "13105"     "223726"    "58200"     "16644"     "85031"     "100503040"
[109] "64074"     "81877"     "18188"     "13078"     "18596"     "19309"    
[115] "207151"    "236149"    "51795"     "14396"     "12111"     "68854"    
> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] clusterProfiler_4.9.0.002   org.Mm.eg.db_3.16.0        
 [3] AnnotationDbi_1.60.2        DESeq2_1.36.0              
 [5] SummarizedExperiment_1.26.1 Biobase_2.58.0             
 [7] MatrixGenerics_1.8.1        matrixStats_1.0.0          
 [9] GenomicRanges_1.48.0        GenomeInfoDb_1.34.9        
[11] IRanges_2.32.0              S4Vectors_0.36.2           
[13] BiocGenerics_0.44.0         gridExtra_2.3              
[15] jpeg_0.1-10                 openxlsx_4.2.5.2           
[17] factoextra_1.0.7            dplyr_1.1.2                
[19] Rtsne_0.16                  data.table_1.14.8          
[21] ggplot2_3.4.2               SNPRelate_1.30.1           
[23] gdsfmt_1.32.0               snpStats_1.46.0            
[25] Matrix_1.5-4.1              survival_3.5-5             

loaded via a namespace (and not attached):
  [1] fgsea_1.24.0           colorspace_2.1-0       ggtree_3.6.2          
  [4] gson_0.1.0             qvalue_2.30.0          XVector_0.38.0        
  [7] fs_1.6.2               aplot_0.1.10           rstudioapi_0.14       
 [10] farver_2.1.1           graphlayouts_1.0.0     ggrepel_0.9.3         
 [13] bit64_4.0.5            fansi_1.0.4            scatterpie_0.2.1      
 [16] codetools_0.2-19       splines_4.2.2          cachem_1.0.8          
 [19] GOSemSim_2.24.0        geneplotter_1.74.0     polyclip_1.10-4       
 [22] jsonlite_1.8.5         annotate_1.74.0        GO.db_3.16.0          
 [25] png_0.1-8              ggforce_0.4.1          compiler_4.2.2        
 [28] httr_1.4.6             fastmap_1.1.1          lazyeval_0.2.2        
 [31] cli_3.6.1              tweenr_2.0.2           tools_4.2.2           
 [34] igraph_1.4.3           gtable_0.3.3           glue_1.6.2            
 [37] GenomeInfoDbData_1.2.9 reshape2_1.4.4         fastmatch_1.1-3       
 [40] Rcpp_1.0.10            enrichplot_1.18.4      vctrs_0.6.2           
 [43] Biostrings_2.66.0      ape_5.7-1              nlme_3.1-162          
 [46] ggraph_2.1.0           stringr_1.5.0          lifecycle_1.0.3       
 [49] XML_3.99-0.14          DOSE_3.24.2            zlibbioc_1.44.0       
 [52] MASS_7.3-60            scales_1.2.1           tidygraph_1.2.3       
 [55] parallel_4.2.2         RColorBrewer_1.1-3     memoise_2.0.1         
 [58] downloader_0.4         ggfun_0.0.9            HDO.db_0.99.1         
 [61] yulab.utils_0.0.6      stringi_1.7.12         RSQLite_2.3.1         
 [64] genefilter_1.78.0      tidytree_0.4.2         zip_2.3.0             
 [67] BiocParallel_1.32.6    rlang_1.1.1            pkgconfig_2.0.3       
 [70] bitops_1.0-7           lattice_0.21-8         purrr_1.0.1           
 [73] treeio_1.22.0          patchwork_1.1.2        cowplot_1.1.1         
 [76] shadowtext_0.1.2       bit_4.0.5              tidyselect_1.2.0      
 [79] plyr_1.8.8             magrittr_2.0.3         R6_2.5.1              
 [82] generics_0.1.3         DelayedArray_0.22.0    DBI_1.1.3             
 [85] pillar_1.9.0           withr_2.5.0            KEGGREST_1.38.0       
 [88] RCurl_1.98-1.12        tibble_3.2.1           crayon_1.5.2          
 [91] utf8_1.2.3             viridis_0.6.3          locfit_1.5-9.8        
 [94] blob_1.2.4             digest_0.6.31          xtable_1.8-4          
 [97] tidyr_1.3.0            gridGraphics_0.5-1     munsell_0.5.0         
[100] viridisLite_0.4.2      ggplotify_0.1.0 

The input is ENTREZID and clusterProfiler has been updated to the latest version, but the same problem still occurs.
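Since the input above is already Entrez IDs, the empty "Expected input gene ID: " line may mean that no KEGG annotation was retrieved at all, for example because the KEGG server could not be reached from this machine. That is only a guess. One way to probe it is to query the KEGG REST service directly with KEGGREST (listed in the session info above) and count how many query genes appear in the result. `count_mapped` is a hypothetical helper, and this may not be the same code path that clusterProfiler uses internally, so it only tests reachability of KEGG.

```r
# Sketch: check whether KEGG gene-pathway links for mouse can be retrieved.
# `count_mapped` is a hypothetical helper written for this illustration.
count_mapped <- function(query, kegg_genes) {
  # KEGG gene identifiers carry an organism prefix such as "mmu:"; strip it.
  kegg_ids <- unique(sub("^[a-z]+:", "", kegg_genes))
  sum(unique(as.character(query)) %in% kegg_ids)
}

# First six IDs from the vector printed above.
query <- c("394430", "227394", "74137", "116847", "14264", "12628")

# The network call is limited to interactive sessions with KEGGREST installed.
if (interactive() && requireNamespace("KEGGREST", quietly = TRUE)) {
  links <- tryCatch(KEGGREST::keggLink("pathway", "mmu"),
                    error = function(e) character(0))
  message("gene-pathway links retrieved: ", length(links))
  message("query genes with a pathway:   ", count_mapped(query, names(links)))
}
```

Zero links retrieved would point to a connection or download problem rather than to the gene list.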

yzJiang9 commented 1 year ago

Same issue here. I get no result even when running the demo code from the book: http://yulab-smu.top/biomedical-knowledge-mining-book/clusterprofiler-kegg.html

guidohooiveld commented 1 year ago

FYI: it is working fine for me when using the latest version of R/Bioconductor:

@lizhan96 :

> library(clusterProfiler)
> 
> ## use your first 18 genes as sample input
> gene.up2 <- c("394430", "227394", "74137", "116847", "14264", "12628",
+               "545370", "240873", "474332", "15483", "16425", "77794",
+               "23928", "20378", "20495", "228598", "56264", "252864")
> 
> ## confirm input is a character vector
> class(gene.up2)
[1] "character"
> 
> ## run enrichKEGG
> kegg.dn = enrichKEGG(gene.up2, organism = 'mmu', keyType = 'kegg', pvalueCutoff = 0.05, qvalueCutoff = 0.2)
> 
> ## check results
> kegg.dn 
#
# over-representation test
#
#...@organism    mmu 
#...@ontology    KEGG 
#...@keytype     kegg 
#...@gene        chr [1:18] "394430" "227394" "74137" "116847" "14264" "12628" "545370" ...
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...3 enriched terms found
'data.frame':   3 obs. of  9 variables:
 $ ID         : chr  "mmu00980" "mmu05204" "mmu00140"
 $ Description: chr  "Metabolism of xenobiotics by cytochrome P450 - Mus musculus (house mouse)" "Chemical carcinogenesis - DNA adducts - Mus musculus (house mouse)" "Steroid hormone biosynthesis - Mus musculus (house mouse)"
 $ GeneRatio  : chr  "2/7" "2/7" "2/7"
 $ BgRatio    : chr  "73/9214" "84/9214" "93/9214"
 $ pvalue     : num  0.00127 0.00167 0.00205
 $ p.adjust   : num  0.0164 0.0164 0.0164
 $ qvalue     : num  0.0129 0.0129 0.0129
 $ geneID     : chr  "394430/15483" "394430/15483" "394430/15483"
 $ Count      : int  2 2 2
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> as.data.frame(kegg.dn)
               ID
mmu00980 mmu00980
mmu05204 mmu05204
mmu00140 mmu00140
                                                                       Description
mmu00980 Metabolism of xenobiotics by cytochrome P450 - Mus musculus (house mouse)
mmu05204        Chemical carcinogenesis - DNA adducts - Mus musculus (house mouse)
mmu00140                 Steroid hormone biosynthesis - Mus musculus (house mouse)
         GeneRatio BgRatio      pvalue   p.adjust     qvalue       geneID Count
mmu00980       2/7 73/9214 0.001267219 0.01638342 0.01293428 394430/15483     2
mmu05204       2/7 84/9214 0.001674249 0.01638342 0.01293428 394430/15483     2
mmu00140       2/7 93/9214 0.002047928 0.01638342 0.01293428 394430/15483     2
> 

@yzJiang9:

> ## 7.2 KEGG pathway over-representation analysis
> 
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
> 
> kk <- enrichKEGG(gene         = gene,
+                  organism     = 'hsa',
+                  pvalueCutoff = 0.05)
> head(kk)
               ID                                                   Description
hsa04110 hsa04110                                                    Cell cycle
hsa04114 hsa04114                                                Oocyte meiosis
hsa04218 hsa04218                                           Cellular senescence
hsa04061 hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320 hsa03320                                        PPAR signaling pathway
hsa04814 hsa04814                                                Motor proteins
         GeneRatio  BgRatio       pvalue     p.adjust       qvalue
hsa04110    15/106 157/8465 8.177242e-10 1.717221e-07 1.695702e-07
hsa04114    10/106 131/8465 5.049610e-06 5.302091e-04 5.235648e-04
hsa04218    10/106 156/8465 2.366003e-05 1.639157e-03 1.618617e-03
hsa04061     8/106 100/8465 3.326461e-05 1.639157e-03 1.618617e-03
hsa03320     7/106  75/8465 3.902756e-05 1.639157e-03 1.618617e-03
hsa04814    10/106 193/8465 1.433387e-04 5.016856e-03 4.953988e-03
                                                                           geneID
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232
hsa04114                          991/9133/983/4085/51806/6790/891/9232/3708/5241
hsa04218                           2305/4605/9133/890/983/51806/1111/891/776/3708
hsa04061                                 3627/10563/6373/4283/6362/6355/9547/1524
hsa03320                                       4312/9415/9370/5105/2167/3158/5346
hsa04814                   9493/1062/81930/3832/3833/146909/10112/24137/4629/7802
         Count
hsa04110    15
hsa04114    10
hsa04218    10
hsa04061     8
hsa03320     7
hsa04814    10
> 
> ## version packages
> packageVersion("clusterProfiler")
[1] ‘4.8.1’
> 
> sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Europe/Amsterdam
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] clusterProfiler_4.8.1

loaded via a namespace (and not attached):
  [1] DBI_1.1.3               bitops_1.0-7            gson_0.1.0             
  [4] shadowtext_0.1.2        gridExtra_2.3           rlang_1.1.1            
  [7] magrittr_2.0.3          DOSE_3.26.1             compiler_4.3.0         
 [10] RSQLite_2.3.1           png_0.1-8               vctrs_0.6.3            
 [13] reshape2_1.4.4          stringr_1.5.0           pkgconfig_2.0.3        
 [16] crayon_1.5.2            fastmap_1.1.1           XVector_0.40.0         
 [19] ggraph_2.1.0            utf8_1.2.3              HDO.db_0.99.1          
 [22] enrichplot_1.20.0       purrr_1.0.1             bit_4.0.5              
 [25] zlibbioc_1.46.0         cachem_1.0.8            aplot_0.1.10           
 [28] GenomeInfoDb_1.36.0     jsonlite_1.8.5          blob_1.2.4             
 [31] BiocParallel_1.34.2     tweenr_2.0.2            parallel_4.3.0         
 [34] R6_2.5.1                stringi_1.7.12          RColorBrewer_1.1-3     
 [37] GOSemSim_2.26.0         Rcpp_1.0.10             downloader_0.4         
 [40] IRanges_2.34.0          Matrix_1.5-4.1          splines_4.3.0          
 [43] igraph_1.5.0            tidyselect_1.2.0        qvalue_2.32.0          
 [46] viridis_0.6.3           codetools_0.2-19        lattice_0.21-8         
 [49] tibble_3.2.1            plyr_1.8.8              Biobase_2.60.0         
 [52] treeio_1.24.1           withr_2.5.0             KEGGREST_1.40.0        
 [55] gridGraphics_0.5-1      scatterpie_0.2.1        polyclip_1.10-4        
 [58] Biostrings_2.68.1       pillar_1.9.0            ggtree_3.8.0           
 [61] stats4_4.3.0            ggfun_0.0.9             generics_0.1.3         
 [64] RCurl_1.98-1.12         S4Vectors_0.38.1        ggplot2_3.4.2          
 [67] munsell_0.5.0           scales_1.2.1            tidytree_0.4.2         
 [70] glue_1.6.2              lazyeval_0.2.2          tools_4.3.0            
 [73] data.table_1.14.8       fgsea_1.26.0            graphlayouts_1.0.0     
 [76] fastmatch_1.1-3         tidygraph_1.2.3         cowplot_1.1.1          
 [79] grid_4.3.0              tidyr_1.3.0             ape_5.7-1              
 [82] AnnotationDbi_1.62.1    colorspace_2.1-0        nlme_3.1-162           
 [85] GenomeInfoDbData_1.2.10 patchwork_1.1.2         ggforce_0.4.1          
 [88] cli_3.6.1               fansi_1.0.4             viridisLite_0.4.2      
 [91] dplyr_1.1.2             gtable_0.3.3            yulab.utils_0.0.6      
 [94] digest_0.6.31           BiocGenerics_0.46.0     ggrepel_0.9.3          
 [97] ggplotify_0.1.0         farver_2.1.1            memoise_2.0.1          
[100] lifecycle_1.0.3         httr_1.4.6              GO.db_3.17.0           
[103] bit64_4.0.5             MASS_7.3-60            
> 
> 
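To compare a failing setup with the working one above, it can help to print the installed versions of the packages involved and, in an interactive session, ask BiocManager whether the installation is consistent with a single Bioconductor release. This is a minimal sketch: the package list is just a selection taken from the session info in this thread, and `installed_version` is a hypothetical helper.

```r
# Sketch: report installed versions of packages relevant to enrichKEGG().
# The package selection is an assumption based on the sessionInfo() output.
pkgs <- c("clusterProfiler", "DOSE", "KEGGREST", "AnnotationDbi")

# Hypothetical helper: version as a string, or NA if the package is missing.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- vapply(pkgs, installed_version, character(1))
print(versions)

# BiocManager::valid() needs network access, so it is limited to interactive use.
# It reports packages that are out of date or too new for the Bioconductor
# release in use.
if (interactive() && requireNamespace("BiocManager", quietly = TRUE)) {
  print(BiocManager::version())
  print(tryCatch(BiocManager::valid(),
                 error = function(e) conditionMessage(e)))
}
```

The working run above used clusterProfiler 4.8.1 under R 4.3.0, which gives a reference point for comparison.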
Tsebifera commented 1 year ago

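"Please make sure the genes you pass in are ENTREZID, and use the latest version of clusterProfiler": that did not help, and updating to the latest version does not fix it either.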