YuLab-SMU / clusterProfiler

A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

Differences between clusterProfiler and agriGO #171

Open xie186 opened 5 years ago

xie186 commented 5 years ago

Hi Guangchuang,

When using clusterProfiler, I get different results from agriGO (http://bioinfo.cau.edu.cn/agriGO/analysis.php) for the same set of genes.

For example, for GO:0006412, the clusterProfiler result shows 94 out of 2691 genes belonging to this term, while the agriGO result shows 170 out of 2614. Do you have any idea why this happens? Thanks. See the results below: test_agriGO_results.txt test_clusterprofier.BP.txt

I attached the files and code, as well as the session information here.

Files used for clusterProfiler: gene.list.txt test.BP.GENE.txt test.BP.NAME.txt

Files for agriGO: test4agrigo.txt forTest.gitb.BP.txt

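library(clusterProfiler)

# File prefix and ontology label; "BP" matches the attached test.BP.*.txt files
PREFIX <- "test"
i <- "BP"
term2GENE <- paste(PREFIX, i, "GENE.txt", sep = ".")
term2NAME <- paste(PREFIX, i, "NAME.txt", sep = ".")
cat(term2GENE, "\n")
cat(term2NAME, "\n")

# TERM2GENE: term-to-gene mapping; TERM2NAME: term-to-name mapping
term2gene <- read.table(term2GENE, sep = "\t")
term2name <- read.delim(term2NAME, sep = "\t")

# Query gene IDs are taken from the first column of gene.list.txt
gene_list <- read.table("gene.list.txt")
geneid_sig <- as.vector(gene_list$V1)

# Cutoffs of 1 so that no term is dropped for lack of significance
res <- enricher(geneid_sig, TERM2GENE = term2gene, TERM2NAME = term2name,
                pvalueCutoff = 1, qvalueCutoff = 1)
res_df <- as.data.frame(res)
write.table(res_df, "test_clusterprofier.BP.txt", sep = "\t", quote = FALSE,
            col.names = TRUE, row.names = FALSE)
sessionInfo()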
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocInstaller_1.30.0   gridExtra_2.3          ggplot2_3.1.0          clusterProfiler_3.10.0

loaded via a namespace (and not attached):
 [1] Biobase_2.42.0       viridis_0.5.1        httr_1.3.1           tidyr_0.8.2          bit64_0.9-7         
 [6] jsonlite_1.5         viridisLite_0.3.0    splines_3.5.1        ggraph_1.0.2         assertthat_0.2.0    
[11] DO.db_2.9            rvcheck_0.1.1        triebeard_0.3.0      urltools_1.7.1       stats4_3.5.1        
[16] blob_1.1.1           progress_1.2.0       ggrepel_0.8.0        pillar_1.3.0         RSQLite_2.1.1       
[21] lattice_0.20-35      glue_1.3.0           digest_0.6.18        RColorBrewer_1.1-2   qvalue_2.14.0       
[26] colorspace_1.3-2     cowplot_0.9.3        Matrix_1.2-14        plyr_1.8.4           pkgconfig_2.0.2     
[31] purrr_0.2.5          GO.db_3.7.0          scales_1.0.0         ggplotify_0.0.3      europepmc_0.3       
[36] tweenr_1.0.0         enrichplot_1.2.0     BiocParallel_1.16.0  ggforce_0.1.3        tibble_1.4.2        
[41] farver_1.0           IRanges_2.16.0       withr_2.1.2          UpSetR_1.3.3         BiocGenerics_0.28.0 
[46] lazyeval_0.2.1       magrittr_1.5         crayon_1.3.4         memoise_1.1.0        DOSE_3.8.0          
[51] MASS_7.3-50          xml2_1.2.0           prettyunits_1.0.2    tools_3.5.1          data.table_1.11.8   
[56] hms_0.4.2            stringr_1.3.1        S4Vectors_0.20.0     munsell_0.5.0        AnnotationDbi_1.44.0
[61] bindrcpp_0.2.2       compiler_3.5.1       gridGraphics_0.3-0   rlang_0.3.0.1        ggridges_0.5.1      
[66] units_0.6-1          igraph_1.2.2         labeling_0.3         gtable_0.2.0         DBI_1.0.0           
[71] reshape2_1.4.3       R6_2.3.0             dplyr_0.7.7          bit_1.1-14           bindr_0.1.1         
[76] fastmatch_1.1-0      fgsea_1.8.0          GOSemSim_2.8.0       stringi_1.2.4        parallel_3.5.1      
[81] Rcpp_0.12.19         tidyselect_0.2.5    

GuangchuangYu commented 5 years ago

> For example GO:0006412, in the clusterProfiler result 94 out of 2691 belong to this term. In agriGO results, 170 out of 2614 belong to this term.

Take the genes among the 170 reported by agriGO that are not among the 94 reported by clusterProfiler, and check whether they appear in your test.BP.GENE.txt file. If they are not there (they should be), how can you expect clusterProfiler to report them?
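
The check suggested above could look roughly like the following. This is only a sketch on invented toy data: the gene IDs and the vectors `agrigo_genes` and `cluster_genes` are hypothetical placeholders for the 170 and 94 genes, and `term2gene` stands in for the contents of test.BP.GENE.txt, assumed here to have the GO term in the first column and the gene ID in the second.

```r
# Hypothetical toy data; the gene IDs are invented for illustration only.
# term2gene mimics test.BP.GENE.txt: column 1 = GO term, column 2 = gene ID.
term2gene <- data.frame(
  term = c("GO:0006412", "GO:0006412", "GO:0008150"),
  gene = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Genes each tool reports for GO:0006412 (placeholders for the 170 and the 94).
agrigo_genes  <- c("geneA", "geneB", "geneD", "geneE")
cluster_genes <- c("geneA", "geneB")

# Genes reported by agriGO but not by clusterProfiler.
only_agrigo <- setdiff(agrigo_genes, cluster_genes)

# Genes annotated to the term in the TERM2GENE table.
annotated <- term2gene$gene[term2gene$term == "GO:0006412"]

# For each agriGO-only gene: is it annotated to GO:0006412 in TERM2GENE?
check <- data.frame(
  gene = only_agrigo,
  in_term2gene = only_agrigo %in% annotated,
  stringsAsFactors = FALSE
)
print(check)
```

With real results, the clusterProfiler side can probably be taken from the `geneID` column of `res_df`, which holds the IDs separated by "/", for example `strsplit(res_df$geneID[res_df$ID == "GO:0006412"], "/")[[1]]`. How to pull the gene list out of the agriGO output depends on its file format, which is not assumed here. Any gene that shows `FALSE` in `in_term2gene` is simply not annotated to the term in the input mapping, so `enricher` has no way to count it.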