YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

enrichKEGG returns different results in different R sessions #146

Open YiweiNiu opened 6 years ago

YiweiNiu commented 6 years ago

Hi Guangchuang,

I found that enrichKEGG returns different results in different R sessions for the same input, and I can't work out why.

Test 1, no enrichment was found:

> head(DE_genes2)
          ENSEMBL ENTREZID     SYMBOL
1 ENSG00000188976    26155      NOC2L
2 ENSG00000179840   644997 PIK3CD-AS1
3 ENSG00000054523    23095      KIF1B
4 ENSG00000011021     1185      CLCN6
5 ENSG00000080947   114819    CROCCP3
6 ENSG00000162542   255104      TMCO4
> dim(DE_genes2)
[1] 592   3
> kk2 = enrichKEGG(gene=DE_genes2[,2], organism='hsa', pvalueCutoff = 0.05)
> kk2
#
# over-representation test
#
#...@organism    hsa 
#...@ontology    KEGG 
#...@keytype     kegg 
#...@gene    chr [1:592] "26155" "644997" "23095" "1185" "114819" "255104" "8672" "2268" "3932" "79647" "23499" "643314" "10487" "5538" "60313" "55624" "8569" "148932" "2060" ...
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...0 enriched terms found
'data.frame':   0 obs. of  9 variables:
 $ ID         : chr 
 $ Description: chr 
 $ GeneRatio  : chr 
 $ BgRatio    : chr 
 $ pvalue     : num 
 $ p.adjust   : num 
 $ qvalue     : num 
 $ geneID     : chr 
 $ Count      : int 
#...Citation
  Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
  clusterProfiler: an R package for comparing biological themes among
  gene clusters. OMICS: A Journal of Integrative Biology
  2012, 16(5):284-287

Then I saved DE_genes2 to a new 'test.RData' file and loaded it into a new R session. This time enrichKEGG found 17 enriched terms:

> kk2 = enrichKEGG(gene=DE_genes2[,2], organism='hsa', pvalueCutoff = 0.05)
> kk2
#
# over-representation test
#
#...@organism    hsa 
#...@ontology    KEGG 
#...@keytype     kegg 
#...@gene    chr [1:592] "26155" "644997" "23095" "1185" "114819" "255104" "8672" "2268" ...
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...17 enriched terms found
'data.frame':   17 obs. of  9 variables:
 $ ID         : chr  "hsa04658" "hsa04659" "hsa04660" "hsa05169" ...
 $ Description: chr  "Th1 and Th2 cell differentiation" "Th17 cell differentiation" "T cell receptor signaling pathway" "Epstein-Barr virus infection" ...
 $ GeneRatio  : chr  "17/244" "18/244" "15/244" "19/244" ...
 $ BgRatio    : chr  "92/7403" "107/7403" "101/7403" "203/7403" ...
 $ pvalue     : num  5.68e-09 9.53e-09 9.07e-07 3.66e-05 1.14e-04 ...
 $ p.adjust   : num  1.28e-06 1.28e-06 8.13e-05 2.46e-03 6.13e-03 ...
 $ qvalue     : num  1.07e-06 1.07e-06 6.78e-05 2.05e-03 5.11e-03 ...
 $ geneID     : chr  "3932/919/1147/84441/917/3566/3718/5534/7535/5335/4773/3560/3516/5602/55534/3123/1432" "3932/919/1147/917/4088/3566/50615/6774/3718/5534/7535/5335/4773/3560/3556/5602/3123/1432" "3932/10451/919/1147/917/10125/5534/926/7535/940/5335/4773/387/1432/84433" "2268/965/953/1147/6693/105369247/6774/3718/5610/1386/9759/5335/3588/171568/3516/5602/3123/1432/6850" ...
 $ Count      : int  17 18 15 19 14 13 13 11 10 6 ...
#...Citation
  Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
  clusterProfiler: an R package for comparing biological themes among
  gene clusters. OMICS: A Journal of Integrative Biology
  2012, 16(5):284-287

The two sessions had the same packages loaded:

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 
[2] LC_CTYPE=Chinese (Simplified)_China.936   
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C                              
[5] LC_TIME=Chinese (Simplified)_China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] org.Hs.eg.db_3.5.0    AnnotationDbi_1.40.0  IRanges_2.12.0       
 [4] S4Vectors_0.16.0      Biobase_2.38.0        BiocGenerics_0.24.0  
 [7] clusterProfiler_3.7.1 edgeR_3.20.9          limma_3.34.9         
[10] ImpulseDE2_1.2.0     

loaded via a namespace (and not attached):
  [1] fgsea_1.4.1                colorspace_1.3-2           rjson_0.2.18              
  [4] ggridges_0.5.0             circlize_0.4.3             qvalue_2.10.0             
  [7] htmlTable_1.11.2           XVector_0.18.0             GenomicRanges_1.30.3      
 [10] GlobalOptions_0.0.13       base64enc_0.1-3            rstudioapi_0.7            
 [13] ggrepel_0.8.0              bit64_0.9-7                splines_3.4.1             
 [16] GOSemSim_2.4.1             geneplotter_1.56.0         knitr_1.20                
 [19] Formula_1.2-3              annotate_1.56.2            cluster_2.0.7-1           
 [22] GO.db_3.5.0                ggforce_0.1.1              compiler_3.4.1            
 [25] rvcheck_0.0.9              backports_1.1.2            assertthat_0.2.0          
 [28] Matrix_1.2-14              lazyeval_0.2.1             tweenr_0.1.5              
 [31] acepack_1.4.1              htmltools_0.3.6            tools_3.4.1               
 [34] bindrcpp_0.2.2             igraph_1.2.1               gtable_0.2.0              
 [37] glue_1.2.0                 GenomeInfoDbData_1.0.0     reshape2_1.4.3            
 [40] DO.db_2.9                  dplyr_0.7.4                fastmatch_1.1-0           
 [43] Rcpp_0.12.16               enrichplot_0.99.14         udunits2_0.13             
 [46] ggraph_1.0.1               stringr_1.3.1              XML_3.98-1.11             
 [49] DOSE_3.5.1                 zlibbioc_1.24.0            MASS_7.3-50               
 [52] scales_0.5.0               SummarizedExperiment_1.8.1 RColorBrewer_1.1-2        
 [55] ComplexHeatmap_1.17.1      yaml_2.1.19                memoise_1.1.0             
 [58] gridExtra_2.3              ggplot2_2.2.1.9000         UpSetR_1.3.3              
 [61] rpart_4.1-13               latticeExtra_0.6-28        stringi_1.1.7             
 [64] RSQLite_2.1.1              genefilter_1.60.0          checkmate_1.8.5           
 [67] BiocParallel_1.12.0        shape_1.4.4                GenomeInfoDb_1.14.0       
 [70] rlang_0.2.0                pkgconfig_2.0.1            matrixStats_0.53.1        
 [73] bitops_1.0-6               lattice_0.20-35            purrr_0.2.4               
 [76] bindr_0.1.1                htmlwidgets_1.2            cowplot_0.9.2             
 [79] bit_1.1-12                 plyr_1.8.4                 magrittr_1.5              
 [82] DESeq2_1.18.1              R6_2.2.2                   Hmisc_4.1-1               
 [85] DelayedArray_0.4.1         DBI_1.0.0                  pillar_1.2.2              
 [88] foreign_0.8-70             units_0.5-1                survival_2.42-3           
 [91] RCurl_1.95-4.10            nnet_7.3-12                tibble_1.4.2              
 [94] viridis_0.5.1              GetoptLong_0.1.6           locfit_1.5-9.1            
 [97] grid_3.4.1                 data.table_1.11.2          blob_1.1.1                
[100] digest_0.6.15              xtable_1.8-2               tidyr_0.8.0               
[103] munsell_0.4.3              viridisLite_0.3.0

Thank you in advance!

GuangchuangYu commented 6 years ago

The first result may have been caused by a network issue: the pathway annotation was probably not downloaded completely.
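
A rough way to check whether two sessions worked from the same annotation is to compare the denominators of the GeneRatio and BgRatio columns (244 and 7403 in the successful run above). This is only a sketch: the helper names `ratio_denominator` and `annotation_fingerprint` are hypothetical, and it is an assumption that an incomplete download shows up as different denominators. It also needs at least one row in each result, so it cannot inspect a run with 0 enriched terms.

```r
# Extract the denominator from ratio strings such as "92/7403".
ratio_denominator <- function(ratio) {
  as.integer(sub("^[0-9]+/", "", ratio))
}

# Summarise which annotation a result table was computed against:
# the number of input genes that mapped, and the background size.
annotation_fingerprint <- function(result_df) {
  c(genes_mapped = unique(ratio_denominator(result_df$GeneRatio)),
    background   = unique(ratio_denominator(result_df$BgRatio)))
}

# Example using the first two rows quoted in this thread.
example_df <- data.frame(
  GeneRatio = c("17/244", "18/244"),
  BgRatio   = c("92/7403", "107/7403"),
  stringsAsFactors = FALSE
)
fingerprint <- annotation_fingerprint(example_df)
# genes_mapped is 244 and background is 7403 for these rows
```

With a real result, `annotation_fingerprint(as.data.frame(kk2))` could be run in each session and the two outputs compared.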

YiweiNiu commented 6 years ago

Thank you for your reply! Any solutions? Can I download the KEGG database and use it locally?

GuangchuangYu commented 6 years ago

This may become a new feature in a future release.
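
Until then, one possible workaround is to download the annotation once, cache it on disk, and run the generic `enricher()` against the cached tables so that every session sees identical data. The following is a minimal sketch, not a built-in feature: `cached_fetch`, `run_local_kegg`, and the file name `kegg_hsa.rds` are hypothetical, and it assumes the installed clusterProfiler version exports `download_KEGG()` and returns a list with `KEGGPATHID2EXTID` and `KEGGPATHID2NAME`.

```r
# Cache the result of an arbitrary fetch function on disk. Later calls
# read the saved copy instead of fetching again.
cached_fetch <- function(cache_file, fetch) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  value <- fetch()
  saveRDS(value, cache_file)
  value
}

# Hypothetical wrapper: needs clusterProfiler, and network access only
# the first time, when the cache file does not exist yet.
run_local_kegg <- function(genes, cache_file = "kegg_hsa.rds") {
  kegg <- cached_fetch(cache_file, function() {
    clusterProfiler::download_KEGG("hsa")
  })
  clusterProfiler::enricher(
    gene         = genes,
    TERM2GENE    = kegg$KEGGPATHID2EXTID,  # pathway ID to gene ID
    TERM2NAME    = kegg$KEGGPATHID2NAME,   # pathway ID to description
    pvalueCutoff = 0.05
  )
}
```

For the data in this thread, that would be `kk2 <- run_local_kegg(DE_genes2[, 2])`. The cache is only as good as the download that produced it, so it is worth checking the number of pathways in the saved tables before relying on it.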

alexyfyf commented 4 years ago

Hi @GuangchuangYu

I have the same issue. On the same machine, enrichKEGG returns slightly different numbers of matched genes and enriched pathways in different sessions. Do you have a solution now?

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /mnt/software/apps/R/3.6.0/lib/R/lib/libRblas.so
LAPACK: /mnt/software/apps/R/3.6.0/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8       
 [4] LC_COLLATE=en_AU.UTF-8     LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] clusterProfiler_3.14.3 forcats_0.5.0          stringr_1.4.0         
 [4] dplyr_0.8.5            purrr_0.3.4            readr_1.3.1           
 [7] tidyr_1.0.2            tibble_3.0.1           ggplot2_3.3.0         
[10] tidyverse_1.3.0        ChIPseeker_1.22.1      methylKit_1.12.0      
[13] GenomicRanges_1.38.0   GenomeInfoDb_1.22.1    IRanges_2.20.2        
[16] S4Vectors_0.24.4       BiocGenerics_0.32.0