YuLab-SMU / clusterProfiler

A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

enrichKegg Bug #646

Open zellerivo opened 9 months ago

zellerivo commented 9 months ago

Dear Prof. Guangchuang Yu,

I am an avid user of your package and I want to express my sincere appreciation for all the work and effort you put into it. I particularly like your creativity when it comes to visualisation and the ease of use of your packages. So thank you very much for that!

The enrichKEGG function does not work on my system. From my debugging, the problem seems to involve the USER_DATA object.

Example:

data(geneList, package='DOSE')
de <- names(geneList)[1:100]
yy <- enrichKEGG(de, pvalueCutoff=0.01)
head(yy)

It throws the following error: [image]

Junyan1996 commented 9 months ago

you can try pvalueCutoff=1

guidohooiveld commented 9 months ago

You should provide more information on your R/Bioconductor installation! Are you sure it is up to date, that is, using R-4.3.x and Bioconductor 3.18? There have been changes in the KEGG API over the last year, and this may explain why it no longer works for you. It does work for me, using the current versions of R/Bioconductor:

> library(clusterProfiler)
> data(geneList, package='DOSE')
> de <- names(geneList)[1:100]
> yy <- enrichKEGG(de, pvalueCutoff=0.01)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
> head(yy)
                   category           subcategory       ID
hsa04110 Cellular Processes Cell growth and death hsa04110
hsa04218 Cellular Processes Cell growth and death hsa04218
hsa04114 Cellular Processes Cell growth and death hsa04114
hsa04814 Cellular Processes         Cell motility hsa04814
hsa04657 Organismal Systems         Immune system hsa04657
                     Description GeneRatio  BgRatio       pvalue     p.adjust
hsa04110              Cell cycle     12/58 157/8644 3.667200e-10 4.547329e-08
hsa04218     Cellular senescence      7/58 156/8644 7.570813e-05 4.693904e-03
hsa04114          Oocyte meiosis      6/58 131/8644 2.292076e-04 8.823322e-03
hsa04814          Motor proteins      7/58 193/8644 2.846233e-04 8.823322e-03
hsa04657 IL-17 signaling pathway      5/58  94/8644 3.972218e-04 9.851100e-03
               qvalue
hsa04110 4.207630e-08
hsa04218 4.343256e-03
hsa04114 8.164194e-03
hsa04814 8.164194e-03
hsa04657 9.115195e-03
                                                             geneID Count
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319    12
hsa04218                          2305/4605/9133/890/983/51806/1111     7
hsa04114                               991/9133/983/4085/51806/6790     6
hsa04814                     9493/1062/81930/3832/3833/146909/10112     7
hsa04657                                   4312/6280/6279/6278/3627     5
> 
> packageVersion("clusterProfiler")
[1] ‘4.10.0’
> BiocManager::version()
[1] ‘3.18’
> R.Version()$version.string 
[1] "R version 4.3.0 (2023-04-21 ucrt)"

zellerivo commented 8 months ago

Thanks for your answers. Setting a higher p-value threshold still yields no enriched KEGG terms, whereas enrichGO works normally.


> packageVersion("clusterProfiler")
[1] ‘4.4.4’
> BiocManager::version()
[1] ‘3.15’
> R.Version()$version.string
[1] "R version 4.2.3 (2023-03-15 ucrt)"

guidohooiveld commented 8 months ago

As said, AFAIK recently (a couple of months ago) there have been some issues with connecting to the KEGG API. These have been addressed, so I strongly recommend you update your R/Bioconductor/clusterProfiler installations to the latest ones.
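For reporting the installation details asked for in this thread, a small helper along these lines may be convenient. It is only a sketch: `report_versions` is a hypothetical name, not part of clusterProfiler or BiocManager, and it relies solely on base R's `requireNamespace()` and `utils::packageVersion()`.

```r
# Hypothetical helper: collect installed versions of the given packages.
# Returns a named character vector; packages that are not installed yield NA.
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

# Example usage: print the R version string and the package versions
# relevant to this issue.
print(R.version.string)
print(report_versions(c("clusterProfiler", "DOSE", "BiocManager")))
```

Pasting this output together with `sessionInfo()` gives others the full picture of the installation in one go.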

Based on the behavior you experience (GO analysis is working, KEGG is not), it seems it is specific to KEGG, and this can only be the step in which the gene sets are retrieved (because under the hood enrichGO and enrichKEGG converge to the same internal function).
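To test the retrieval step in isolation, one option is to read the KEGG REST URL shown in the console output above directly with base R, bypassing clusterProfiler entirely. The sketch below makes some assumptions: `parse_kegg_link` and `fetch_kegg_link` are hypothetical helper names, and the response is assumed to be tab-separated lines of the form `<pathway id>\t<gene id>`.

```r
# Hypothetical helper: turn raw response lines into a two-column data frame.
# Assumption: each non-empty line looks like "path:hsa04110\thsa:983".
parse_kegg_link <- function(lines) {
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0L) {
    return(data.frame(pathway = character(0), gene = character(0),
                      stringsAsFactors = FALSE))
  }
  parts <- strsplit(lines, "\t", fixed = TRUE)
  data.frame(
    pathway = vapply(parts, `[`, character(1), 1L),
    gene    = vapply(parts, `[`, character(1), 2L),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: fetch the pathway-to-gene links over HTTPS.
# On a connection failure it prints the error and returns an empty table.
fetch_kegg_link <- function(url = "https://rest.kegg.jp/link/hsa/pathway") {
  lines <- tryCatch(
    readLines(url, warn = FALSE),
    error = function(e) {
      message("Could not read ", url, ": ", conditionMessage(e))
      character(0)
    }
  )
  parse_kegg_link(lines)
}

# Usage (requires network access):
# path2gene <- fetch_kegg_link()
# nrow(path2gene)   # zero rows would point to a download problem
```

If this returns rows while enrichKEGG still fails, the problem is more likely in how the package downloads the data on that system than in the KEGG server itself.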

zellerivo commented 8 months ago

I did some updating:

> packageVersion("clusterProfiler")
[1] ‘4.10.0’
> BiocManager::version()
[1] ‘3.18’
> R.Version()$version.string
[1] "R version 4.3.2 (2023-10-31 ucrt)"

However, the problem persists. From debugging, it seems to me that something goes wrong when building the KEGG_DATA object. The path2gene that goes into build_anno appears to be empty, but the path2name parameter looks legit.
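The observation above can be checked without a debugger by looking at what `download_KEGG` returns. The sketch below assumes that the exported `clusterProfiler::download_KEGG("hsa")` returns a list holding a pathway-to-gene table and a pathway-to-name table; `summarise_kegg_tables` is a hypothetical helper that only counts rows in whatever list it is given.

```r
# Hypothetical helper: report the number of rows of each table in a list.
# NULL entries are counted as zero rows.
summarise_kegg_tables <- function(kegg) {
  vapply(kegg, function(x) if (is.null(x)) 0L else NROW(x), integer(1))
}

# Usage (requires clusterProfiler and network access, so it only runs
# in an interactive session):
if (interactive() && requireNamespace("clusterProfiler", quietly = TRUE)) {
  kegg <- clusterProfiler::download_KEGG("hsa")
  print(summarise_kegg_tables(kegg))
}
```

A zero count for the pathway-to-gene table alongside a non-zero count for the pathway-to-name table would match the empty path2gene described here.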

zellerivo commented 8 months ago

This workaround fixed it for me: https://github.com/YuLab-SMU/clusterProfiler/issues/561#issuecomment-1467266614

guidohooiveld commented 8 months ago

Nice to hear it is working for you now! However, the KEGG_DATA object would normally not be required, except if there are problems connecting to the online KEGG site/database.

Could you therefore, in a fresh session of R, run the code in the 3rd post, and paste the full code and output here?

zellerivo commented 8 months ago

Still the same error. I don't understand how you get the message

Reading KEGG annotation online:

It should be from this function clusterProfiler:::kegg_rest. Where does the call happen? I don't see it in download_KEGG
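One way to find where a function is called is to scan every function in a namespace for a reference to its name. The sketch below uses only base R; `find_callers` is a hypothetical helper, and it matches on the symbol name alone, so it can return false positives (for example a local variable with the same name).

```r
# Hypothetical helper: list the functions in `env` whose body mentions `target`.
find_callers <- function(target, env) {
  nms <- ls(env, all.names = TRUE)
  # Keep only names bound to functions.
  fns <- Filter(function(nm) is.function(get(nm, envir = env)), nms)
  # Keep functions whose body contains the target symbol anywhere.
  Filter(function(nm) {
    target %in% all.names(body(get(nm, envir = env)))
  }, fns)
}

# Usage against the installed package, if available:
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  print(find_callers("kegg_rest", asNamespace("clusterProfiler")))
}
```

Each reported function can then be inspected with `getAnywhere()` to follow the call chain from `download_KEGG` down to the REST request.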