I have been using this wonderful package for a few years, and it works great for my analyses of mouse and human data. Recently I started working on Gallus gallus (chicken) and ran into some issues with enrichGO.

As shown below, the number of background genes is the same for every term, although it should differ between terms. I am using the latest versions of R and clusterProfiler. Could you please let me know how I can fix this issue?



I found a similar issue, #527. The background gene count in GeneRatio should differ between terms; however, clusterProfiler uses 100 for all of them. Is there an issue with the new version of clusterProfiler?

Could you please provide some reproducible code? What does your input look like? Do you analyze all GO categories, or only a subset? Etc.

> library(clusterProfiler)
> data(geneList, package = "DOSE")
> de <- names(geneList)[1:750]
> yy <- enrichGO(de, 'org.Hs.eg.db', ont="BP", pvalueCutoff=1)

> dim(yy)
[1] 778   9
> as.data.frame(yy)[1:25,1:4]
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] clusterProfiler_4.6.1

loaded via a namespace (and not attached):
  [1] nlme_3.1-162           bitops_1.0-7           ggtree_3.6.2          
  [4] enrichplot_1.18.3      bit64_4.0.5            HDO.db_0.99.1         
  [7] RColorBrewer_1.1-3     httr_1.4.5             GenomeInfoDb_1.34.9   
 [10] tools_4.2.2            utf8_1.2.3             R6_2.5.1              
 [13] lazyeval_0.2.2         DBI_1.1.3              BiocGenerics_0.44.0   
 [16] colorspace_2.1-0       withr_2.5.0            tidyselect_1.2.0      
 [19] gridExtra_2.3          bit_4.0.5              compiler_4.2.2        
 [22] cli_3.6.0              Biobase_2.58.0         scatterpie_0.1.8      
 [25] shadowtext_0.1.2       scales_1.2.1           stringr_1.5.0         
 [28] digest_0.6.31          yulab.utils_0.0.6      gson_0.0.9            
 [31] DOSE_3.24.2            XVector_0.38.0         pkgconfig_2.0.3       
 [34] fastmap_1.1.1          rlang_1.0.6            RSQLite_2.3.0         
 [37] gridGraphics_0.5-1     farver_2.1.1           generics_0.1.3        
 [40] jsonlite_1.8.4         BiocParallel_1.32.5    GOSemSim_2.24.0       
 [43] dplyr_1.1.0            RCurl_1.98-1.10        magrittr_2.0.3        
 [46] ggplotify_0.1.0        GO.db_3.16.0           GenomeInfoDbData_1.2.9
 [49] patchwork_1.1.2        Matrix_1.5-3           Rcpp_1.0.10           
 [52] munsell_0.5.0          S4Vectors_0.36.2       fansi_1.0.4           
 [55] ape_5.7                viridis_0.6.2          lifecycle_1.0.3       
 [58] stringi_1.7.12         ggraph_2.1.0           MASS_7.3-58.2         
 [61] zlibbioc_1.44.0        org.Hs.eg.db_3.16.0    plyr_1.8.8            
 [64] qvalue_2.30.0          grid_4.2.2             blob_1.2.3            
 [67] parallel_4.2.2         ggrepel_0.9.3          crayon_1.5.2          
 [70] lattice_0.20-45        graphlayouts_0.8.4     Biostrings_2.66.0     
 [73] cowplot_1.1.1          splines_4.2.2          KEGGREST_1.38.0       
 [76] pillar_1.8.1           fgsea_1.24.0           igraph_1.4.1          
 [79] reshape2_1.4.4         codetools_0.2-19       stats4_4.2.2          
 [82] fastmatch_1.1-3        glue_1.6.2             ggfun_0.0.9           
 [85] downloader_0.4         data.table_1.14.8      treeio_1.22.0         
 [88] png_0.1-8              vctrs_0.5.2            tweenr_2.0.2          
 [91] gtable_0.3.1           purrr_1.0.1            polyclip_1.10-4       
 [94] tidyr_1.3.0            cachem_1.0.7           ggplot2_3.4.1         
 [97] ggforce_0.4.1          tidygraph_1.2.3        tidytree_0.4.2        
[100] viridisLite_0.4.1      tibble_3.1.8           aplot_0.1.9           
[103] AnnotationDbi_1.60.0   memoise_2.0.1          IRanges_2.32.0        

Thanks for your response. I probably misunderstood the calculation of GeneRatio, M/N. N is the total number of genes detected in the gene list, not the size of a specific gene set, right? If so, N is the number of genes provided in our gene list, and it should be the same for every term.

In that case, my issue is that the number is much lower than what I provided. I provided more than 600 genes, and the number is only 100. The number of background genes in BgRatio is also quite low, only about 2000.

I tried PANTHER, and it reports more genes than clusterProfiler. Please see the image below.

Aha, I stand corrected!

You are right regarding the calculation of the gene ratio! I got confused, and edited my answer above.

Thus, in the example code from my post above, 720 is the number of genes, out of the 750 selected genes provided, that could be annotated to any GO-BP category (the denominator), and this value should indeed be the same for all categories. The numerator is the number of genes in the provided list that are annotated to a specific GO-BP category.

This ratio is then compared to the corresponding ratio for the whole GO-BP annotation (the BgRatio) to check for statistically significant overrepresentation of a GO-BP category in the list of selected genes.
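To make the comparison of the two ratios concrete, here is a minimal base-R sketch of an over-representation test for a single term. It uses the standard hypergeometric formulation, which I assume corresponds to what enrichGO computes; the two denominators are taken from this thread, while the two per-term counts are hypothetical values chosen only for illustration.

```r
# Toy over-representation test for one GO term (base R only).
n_list <- 720    # input genes with any GO-BP annotation (GeneRatio denominator, from the thread)
n_bg   <- 18903  # background genes with any GO-BP annotation (BgRatio denominator, from the thread)
k_term <- 30     # HYPOTHETICAL: input genes annotated to this term (GeneRatio numerator)
m_term <- 300    # HYPOTHETICAL: background genes annotated to this term (BgRatio numerator)

gene_ratio <- k_term / n_list
bg_ratio   <- m_term / n_bg

# P(X >= k_term) when drawing n_list genes from a background of n_bg genes,
# of which m_term belong to the term.
p_value <- phyper(k_term - 1, m_term, n_bg - m_term, n_list, lower.tail = FALSE)

cat(sprintf("GeneRatio = %d/%d, BgRatio = %d/%d, p = %.3g\n",
            k_term, n_list, m_term, n_bg, p_value))
```

Note that only the numerators change from term to term; both denominators stay fixed for a given gene list and ontology.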

Thanks, but do you know why:

1) I provided more than 600 genes in my gene list, and only 100 of them were used in the GeneRatio? It seems the same thing happened in https://github.com/YuLab-SMU/clusterProfiler/issues/527 (98 or 100 genes).

2) The number of background genes in BgRatio is also quite low, only about 2000? You have about 18903 genes in the human data above.

Is there an issue with enrichGO, or could it be an issue with the chicken OrgDb package?
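One way to narrow this down is to count how many of the input IDs carry any GO-BP annotation in the annotation table itself, independently of enrichGO. The sketch below assumes the input consists of Entrez IDs (the default keyType of enrichGO) and that the mapping table has the ENTREZID and ONTOLOGYALL columns returned by AnnotationDbi::select(); go_coverage is a hypothetical helper written for this post, not part of clusterProfiler.

```r
# Count how many input IDs carry at least one GO annotation in a mapping table.
# `go_map` has one row per (gene, GO term) pair; the column names are ASSUMED
# to match what AnnotationDbi::select() returns for an OrgDb package.
go_coverage <- function(genes, go_map, ont = "BP",
                        id_col = "ENTREZID", ont_col = "ONTOLOGYALL") {
  in_ont    <- go_map[!is.na(go_map[[ont_col]]) & go_map[[ont_col]] == ont, ]
  annotated <- unique(as.character(in_ont[[id_col]]))
  genes     <- unique(as.character(genes))
  c(input                = length(genes),
    input_annotated      = sum(genes %in% annotated),
    background_annotated = length(annotated))
}

# Toy mapping table (made-up IDs) so the helper runs without Bioconductor.
toy_map <- data.frame(
  ENTREZID    = c("1", "1", "2", "3", "4"),
  ONTOLOGYALL = c("BP", "MF", "BP", "CC", NA),
  stringsAsFactors = FALSE
)
toy_cov <- go_coverage(c("1", "2", "3", "99"), toy_map)
print(toy_cov)

# With the real chicken annotation package (only runs if it is installed):
if (requireNamespace("org.Gg.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  db  <- org.Gg.eg.db::org.Gg.eg.db
  ids <- AnnotationDbi::keys(db, keytype = "ENTREZID")
  go_map <- suppressMessages(
    AnnotationDbi::select(db, keys = ids,
                          columns = c("GOALL", "ONTOLOGYALL"),
                          keytype = "ENTREZID"))
  # Coverage of the whole database; replace `ids` with your own gene vector.
  print(go_coverage(ids, go_map))
}
```

If input_annotated for your own gene list comes out close to 100, the low denominator would most likely reflect the annotation coverage or the ID type of the input rather than enrichGO itself.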