YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

something wrong of setReadable #683

Open williambi1 opened 2 months ago

williambi1 commented 2 months ago

Hello Professor, I ran into an error when running setReadable.

The code is as follows:

GSEA_KEGG.R1 <- DOSE::setReadable(GSEA_KEGG, OrgDb = 'org.Hs.eg.db', keyType = 'ENTREZID')

The error is as follows:

Error: near "7006": syntax error

It runs fine here:

KEGG <- enrichKEGG(En_id,  # online KEGG enrichment analysis
                   organism = KEGG_database,
                   pvalueCutoff = 0.05,
                   qvalueCutoff = 0.05)
KEGG <- DOSE::setReadable(KEGG, OrgDb = "org.Hs.eg.db", keyType = 'ENTREZID')  # convert IDs back to symbols

It does not run here:

GSEA_KEGG <- gseKEGG(
  GSEAgenelist.KEGG,
  organism = 'hsa',
  pvalueCutoff = 0.05,
  verbose = T,
)
GSEA_KEGG.R1 <- DOSE::setReadable(GSEA_KEGG, OrgDb = 'org.Hs.eg.db', keyType = 'ENTREZID')  # convert IDs back to symbols

guidohooiveld commented 2 months ago

What is the output when you type GSEA_KEGG and as.data.frame(GSEA_KEGG)[1:5,]?

williambi1 commented 2 months ago

@guidohooiveld [screenshots attached]

guidohooiveld commented 2 months ago

So (but correct me if I am wrong):

You work with a set of human genes, and DOSE::setReadable works when using the input from a KEGG-based overrepresentation analysis (ORA; using function enrichKEGG), but it fails when trying to convert results from a gene set enrichment analysis (GSEA; function gseKEGG).

Based on your screenshot it is clear that your results object GSEA_KEGG, i.e. the object before conversion, looks as expected!

Your ids are indeed human entrez ids.
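
This can also be checked programmatically rather than by eye. The following is a minimal base-R sketch, not a diagnosis of the reported error. The helper name `looks_like_entrez` and the two malformed entries are made up for illustration; the other IDs and values are copied from the example output in this thread. Entrez gene IDs are plain integers, so any name that is not all digits deserves a closer look.

```r
# Hypothetical helper (not part of clusterProfiler or DOSE):
# TRUE for strings that consist of digits only, as Entrez gene IDs do.
looks_like_entrez <- function(ids) grepl("^[0-9]+$", ids)

# Toy ranked gene list; the last two names are deliberately malformed
# (a gene symbol, and an ID with a trailing space).
gene_list <- c("4312" = 4.57, "8318" = 4.51, "TP53" = 1.20, "983 " = 0.80)
bad_names <- names(gene_list)[!looks_like_entrez(names(gene_list))]
print(bad_names)

# Same check on a core_enrichment string, which joins IDs with "/".
core <- "8318/991/9133/10403/890/983"
core_ids <- strsplit(core, "/", fixed = TRUE)[[1]]
stopifnot(all(looks_like_entrez(core_ids)))
```

On real data, the same helper could be applied to names(GSEAgenelist.KEGG) before calling gseKEGG.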

So... to be honest, I don't know what is causing the issue...

Yet, please try to run my code below in a fresh R session. If that fails, it is something specific to your installation or your R session. I guess that is the case. Moreover, it seems that you are not using the latest version of clusterProfiler! In case the example code gives a problem, I strongly recommend updating to the latest versions of R and Bioconductor/clusterProfiler. Note that a new release of Bioconductor is planned for May 1st, 2024. See: https://bioconductor.org/developers/release-schedule/

> library(clusterProfiler)
> library(enrichplot)
>  
> library(org.Hs.eg.db)
> 
> data(geneList, package="DOSE") ## load example data
> 
> ## copy/pasting your exact code
> GSEA_KEGG <- gseKEGG(geneList,
+                      organism = 'hsa',
+                      pvalueCutoff = 0.05,
+                      verbose = T,
+                      )
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
Warning message:
In fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize,  :
  For some pathways, in reality P-values are less than 1e-10. You can set the `eps` argument to zero for better estimation.
> 
> 
> ## check
> GSEA_KEGG
#
# Gene Set Enrichment Analysis
#
#...@organism    hsa 
#...@setType     KEGG 
#...@keytype     kegg 
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...62 enriched terms found
'data.frame':   62 obs. of  11 variables:
 $ ID             : chr  "hsa04110" "hsa03050" "hsa04657" "hsa05169" ...
 $ Description    : chr  "Cell cycle" "Proteasome" "IL-17 signaling pathway" "Epstein-Barr virus infection" ...
 $ setSize        : int  139 43 85 193 33 62 130 86 202 55 ...
 $ enrichmentScore: num  0.664 0.709 0.562 0.434 0.723 ...
 $ NES            : num  2.87 2.48 2.25 1.94 2.4 ...
 $ pvalue         : num  1.00e-10 1.05e-08 8.16e-08 1.06e-07 3.01e-07 ...
 $ p.adjust       : num  3.41e-08 1.79e-06 9.05e-06 9.05e-06 2.05e-05 ...
 $ qvalue         : num  2.48e-08 1.30e-06 6.59e-06 6.59e-06 1.50e-05 ...
 $ rank           : num  1155 2516 2880 2820 1905 ...
 $ leading_edge   : chr  "tags=36%, list=9%, signal=33%" "tags=65%, list=20%, signal=52%" "tags=49%, list=23%, signal=38%" "tags=39%, list=23%, signal=31%" ...
 $ core_enrichment: chr  "8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4"| __truncated__ "5688/5709/5698/5693/3458/5713/11047/5721/5691/5685/5690/5684/5686/5695/10213/23198/7979/5699/5714/5702/5708/569"| __truncated__ "4312/6280/6279/6278/3627/2921/6364/8061/4318/3576/3934/6347/727897/1051/6354/3458/6361/6374/2919/9618/5603/7128"| __truncated__ "3627/890/6890/9636/898/9134/6502/6772/3126/3112/4609/917/5709/1869/3654/919/915/4067/4938/864/4940/5713/5336/11"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> as.data.frame(GSEA_KEGG)[1:5,]
               ID                  Description setSize enrichmentScore      NES
hsa04110 hsa04110                   Cell cycle     139       0.6637551 2.866380
hsa03050 hsa03050                   Proteasome      43       0.7094784 2.481016
hsa04657 hsa04657      IL-17 signaling pathway      85       0.5622094 2.253750
hsa05169 hsa05169 Epstein-Barr virus infection     193       0.4335010 1.941121
hsa03030 hsa03030              DNA replication      33       0.7227680 2.397985
               pvalue     p.adjust       qvalue rank
hsa04110 1.000000e-10 3.410000e-08 2.484211e-08 1155
hsa03050 1.049737e-08 1.789801e-06 1.303883e-06 2516
hsa04657 8.156045e-08 9.051761e-06 6.594275e-06 2880
hsa05169 1.061790e-07 9.051761e-06 6.594275e-06 2820
hsa03030 3.010166e-07 2.052933e-05 1.495577e-05 1905
                           leading_edge
hsa04110  tags=36%, list=9%, signal=33%
hsa03050 tags=65%, list=20%, signal=52%
hsa04657 tags=49%, list=23%, signal=38%
hsa05169 tags=39%, list=23%, signal=31%
hsa03030 tags=64%, list=15%, signal=54%
                                                                                                                                                                                                                                                                                                                                                                               core_enrichment
hsa04110                                                                                                                                  8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/10926/6502/994/699/4609/5111/26271/1869/1029/8317/4176/2810/3066/1871/1031/9088/995/1019/4172/5885/11200/7027/1875
hsa03050                                                                                                                                                                                                                                       5688/5709/5698/5693/3458/5713/11047/5721/5691/5685/5690/5684/5686/5695/10213/23198/7979/5699/5714/5702/5708/5692/5704/5683/5694/5718/51371/5682
hsa04657                                                                                                                                                                 4312/6280/6279/6278/3627/2921/6364/8061/4318/3576/3934/6347/727897/1051/6354/3458/6361/6374/2919/9618/5603/7128/1994/7124/3569/8772/5743/7186/3596/6356/5594/4792/9641/1147/2932/6300/5597/27190/1432/7184/64806/3326
hsa05169 3627/890/6890/9636/898/9134/6502/6772/3126/3112/4609/917/5709/1869/3654/919/915/4067/4938/864/4940/5713/5336/11047/3066/54205/1871/578/1019/637/916/3383/4939/10213/23586/4793/5603/7979/7128/6891/930/5714/3452/6850/5702/4794/7124/3569/7097/5708/2208/8772/3119/5704/7186/5971/3135/1380/958/5610/4792/10018/8819/3134/10379/9641/1147/5718/6300/3109/811/5606/2923/3108/5707/1432
hsa03030                                                                                                                                                                                                                                                                           4174/4171/4175/4173/2237/5984/5111/10535/1763/5427/23649/4176/5982/5557/5558/4172/5424/5983/5425/54107/6119
> 
> 
> ## convert entrezid into symbol
> ## WORKS!
> GSEA_KEGG.R1 <- DOSE::setReadable(GSEA_KEGG,
+                 OrgDb = 'org.Hs.eg.db',
+                 keyType = 'ENTREZID') 
> 
> # check
> GSEA_KEGG.R1
#
# Gene Set Enrichment Analysis
#
#...@organism    hsa 
#...@setType     KEGG 
#...@keytype     ENTREZID 
#...@geneList    Named num [1:12495] 4.57 4.51 4.42 4.14 3.88 ...
 - attr(*, "names")= chr [1:12495] "4312" "8318" "10874" "55143" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...62 enriched terms found
'data.frame':   62 obs. of  11 variables:
 $ ID             : chr  "hsa04110" "hsa03050" "hsa04657" "hsa05169" ...
 $ Description    : chr  "Cell cycle" "Proteasome" "IL-17 signaling pathway" "Epstein-Barr virus infection" ...
 $ setSize        : int  139 43 85 193 33 62 130 86 202 55 ...
 $ enrichmentScore: num  0.664 0.709 0.562 0.434 0.723 ...
 $ NES            : num  2.87 2.48 2.25 1.94 2.4 ...
 $ pvalue         : num  1.00e-10 1.05e-08 8.16e-08 1.06e-07 3.01e-07 ...
 $ p.adjust       : num  3.41e-08 1.79e-06 9.05e-06 9.05e-06 2.05e-05 ...
 $ qvalue         : num  2.48e-08 1.30e-06 6.59e-06 6.59e-06 1.50e-05 ...
 $ rank           : num  1155 2516 2880 2820 1905 ...
 $ leading_edge   : chr  "tags=36%, list=9%, signal=33%" "tags=65%, list=20%, signal=52%" "tags=49%, list=23%, signal=38%" "tags=39%, list=23%, signal=31%" ...
 $ core_enrichment: chr  "CDC45/CDC20/CCNB2/NDC80/CCNA2/CDK1/MAD2L1/CDT1/TTK/AURKB/CHEK1/TRIP13/CCNB1/MCM5/PTTG1/MCM2/CDC25A/CDC6/PLK1/BU"| __truncated__ "PSMA7/PSMD3/PSMB9/PSMB5/IFNG/PSMD7/ADRM1/PSME2/PSMB3/PSMA4/PSMB2/PSMA3/PSMA5/PSMB7/PSMD14/PSME4/SEM1/PSMB10/PSM"| __truncated__ "MMP1/S100A9/S100A8/S100A7/CXCL10/CXCL3/CCL20/FOSL1/MMP9/CXCL8/LCN2/CCL2/MUC5B/CEBPB/CCL7/IFNG/CCL17/CXCL5/CXCL1"| __truncated__ "CXCL10/CCNA2/TAP1/ISG15/CCNE1/CCNE2/SKP2/STAT1/HLA-DRB4/HLA-DOB/MYC/CD3G/PSMD3/E2F1/IRAK1/CD247/CD3D/LYN/OAS1/R"| __truncated__ ...
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

> 
> as.data.frame(GSEA_KEGG.R1)[1:5,]
               ID                  Description setSize enrichmentScore      NES
hsa04110 hsa04110                   Cell cycle     139       0.6637551 2.866380
hsa03050 hsa03050                   Proteasome      43       0.7094784 2.481016
hsa04657 hsa04657      IL-17 signaling pathway      85       0.5622094 2.253750
hsa05169 hsa05169 Epstein-Barr virus infection     193       0.4335010 1.941121
hsa03030 hsa03030              DNA replication      33       0.7227680 2.397985
               pvalue     p.adjust       qvalue rank
hsa04110 1.000000e-10 3.410000e-08 2.484211e-08 1155
hsa03050 1.049737e-08 1.789801e-06 1.303883e-06 2516
hsa04657 8.156045e-08 9.051761e-06 6.594275e-06 2880
hsa05169 1.061790e-07 9.051761e-06 6.594275e-06 2820
hsa03030 3.010166e-07 2.052933e-05 1.495577e-05 1905
                           leading_edge
hsa04110  tags=36%, list=9%, signal=33%
hsa03050 tags=65%, list=20%, signal=52%
hsa04657 tags=49%, list=23%, signal=38%
hsa05169 tags=39%, list=23%, signal=31%
hsa03030 tags=64%, list=15%, signal=54%
                                                                                                                                                                                                                                                                                                                                                                                                                                                        core_enrichment
hsa04110                                                                                                                                                                       CDC45/CDC20/CCNB2/NDC80/CCNA2/CDK1/MAD2L1/CDT1/TTK/AURKB/CHEK1/TRIP13/CCNB1/MCM5/PTTG1/MCM2/CDC25A/CDC6/PLK1/BUB1B/ESPL1/CCNE1/ORC6/ORC1/CCNE2/MCM6/MCM4/DBF4/SKP2/CDC25B/BUB1/MYC/PCNA/FBXO5/E2F1/CDKN2A/CDC7/MCM7/SFN/HDAC2/E2F3/CDKN2C/PKMYT1/CDC25C/CDK4/MCM3/RAD21/CHEK2/TFDP1/E2F5
hsa03050                                                                                                                                                                                                                                                                                        PSMA7/PSMD3/PSMB9/PSMB5/IFNG/PSMD7/ADRM1/PSME2/PSMB3/PSMA4/PSMB2/PSMA3/PSMA5/PSMB7/PSMD14/PSME4/SEM1/PSMB10/PSMD8/PSMC3/PSMD2/PSMB4/PSMC4/PSMA2/PSMB6/PSMD12/POMP/PSMA1
hsa04657                                                                                                                                                                                                  MMP1/S100A9/S100A8/S100A7/CXCL10/CXCL3/CCL20/FOSL1/MMP9/CXCL8/LCN2/CCL2/MUC5B/CEBPB/CCL7/IFNG/CCL17/CXCL5/CXCL1/TRAF4/MAPK13/TNFAIP3/ELAVL1/TNF/IL6/FADD/PTGS2/TRAF2/IL13/CCL11/MAPK1/NFKBIA/IKBKE/CHUK/GSK3B/MAPK12/MAPK6/IL17B/MAPK14/HSP90B1/IL25/HSP90AB1
hsa05169 CXCL10/CCNA2/TAP1/ISG15/CCNE1/CCNE2/SKP2/STAT1/HLA-DRB4/HLA-DOB/MYC/CD3G/PSMD3/E2F1/IRAK1/CD247/CD3D/LYN/OAS1/RUNX3/OAS3/PSMD7/PLCG2/ADRM1/HDAC2/CYCS/E2F3/BAK1/CDK4/BID/CD3E/ICAM1/OAS2/PSMD14/RIGI/NFKBIB/MAPK13/SEM1/TNFAIP3/TAP2/CD19/PSMD8/IFNA21/SYK/PSMC3/NFKBIE/TNF/IL6/TLR2/PSMD2/FCER2/FADD/HLA-DQB1/PSMC4/TRAF2/RELB/HLA-G/CR2/CD40/EIF2AK2/NFKBIA/BCL2L11/SAP30/HLA-F/IRF9/IKBKE/CHUK/PSMD12/MAPK12/HLA-DMB/CALR/MAP2K3/PDIA3/HLA-DMA/PSMD1/MAPK14
hsa03030                                                                                                                                                                                                                                                                                                                                            MCM5/MCM2/MCM6/MCM4/FEN1/RFC4/PCNA/RNASEH2A/DNA2/POLE2/POLA2/MCM7/RFC2/PRIM1/PRIM2/MCM3/POLD1/RFC3/POLD2/POLE3/RPA3
> 
> packageVersion("clusterProfiler")
[1] ‘4.10.1’
> packageVersion("DOSE")
[1] ‘3.28.2’
>
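
To compare an installation against the versions printed above, a small sketch like the following may help. It only reports what is installed and changes nothing. The package list is an assumption based on what this thread uses, and the update commands are left as comments because they modify the library.

```r
# Packages used in this thread (assumed list; adjust as needed).
pkgs <- c("clusterProfiler", "DOSE", "org.Hs.eg.db")

# Which of them can be loaded in this R installation?
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report versions for the installed ones, plus the R version itself.
versions <- vapply(pkgs[installed],
                   function(p) as.character(packageVersion(p)),
                   character(1))
print(versions)
print(R.version.string)

# To update (not run here), BiocManager is the usual route:
# install.packages("BiocManager")
# BiocManager::install(pkgs)
# BiocManager::valid()   # lists out-of-date or mismatched packages
```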
williambi1 commented 2 months ago

@guidohooiveld Thank you, bro. You are so great! I admire you very much.

williambi1 commented 2 months ago

@guidohooiveld I switched from the function mget to bitr for converting SYMBOL IDs to ENTREZID, and that fixed the problem.
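
For the archive, here is a minimal sketch of what such a conversion with bitr might look like. The poster's actual code was not shown, so the input vector `stats_by_symbol` and its values are made up, and the de-duplication step is one possible choice rather than the required one. It assumes clusterProfiler and org.Hs.eg.db are installed.

```r
library(clusterProfiler)
library(org.Hs.eg.db)

# Hypothetical input: ranking statistics named by gene symbol.
stats_by_symbol <- c(CDC20 = 2.3, CDK1 = 1.8, CCNB2 = -0.6, NOT_A_GENE = 0.4)

# Map SYMBOL -> ENTREZID; symbols that cannot be mapped are dropped
# (bitr reports this with a warning).
id_map <- bitr(names(stats_by_symbol),
               fromType = "SYMBOL",
               toType   = "ENTREZID",
               OrgDb    = org.Hs.eg.db)

# Keep one row per symbol and per Entrez ID so the names stay unique.
id_map <- id_map[!duplicated(id_map$SYMBOL) & !duplicated(id_map$ENTREZID), ]

# Rebuild the list with Entrez IDs as names, sorted in decreasing order
# as gseKEGG expects.
gene_list <- stats_by_symbol[id_map$SYMBOL]
names(gene_list) <- id_map$ENTREZID
gene_list <- sort(gene_list, decreasing = TRUE)
print(gene_list)
```

The resulting gene_list could then be passed to gseKEGG with organism = 'hsa', as in the code earlier in this thread.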

guidohooiveld commented 2 months ago

@williambi1 Good to hear you were able to solve your problem, but for the archive: could you please let us know whether the example code I posted gave you the anticipated results? Also, if you are willing to attach/upload your gene list (= the input for gseKEGG), then it can be used for troubleshooting.
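
One possible way to prepare a gene list for sharing is sketched below, using base R only. The toy vector stands in for the real input (its names and values are copied from the example output above), and the file name is arbitrary. GitHub restricts attachment file types, so the saved file may need to be zipped before uploading.

```r
# Toy stand-in for the real input to gseKEGG.
gene_list <- c("4312" = 4.57, "8318" = 4.51, "10874" = 4.42)

# Compact summary that can be pasted into a comment.
str(gene_list)
print(head(names(gene_list)))
print(anyDuplicated(names(gene_list)))  # 0 means no duplicated names

# Save the exact object to a file that can be attached to the issue.
out_file <- file.path(tempdir(), "gene_list.rds")
saveRDS(gene_list, out_file)

# Round trip: whoever downloads the file gets an identical object back.
restored <- readRDS(out_file)
stopifnot(identical(restored, gene_list))

# Session details are worth posting alongside the file.
print(sessionInfo())
```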