Open Leonrunning opened 1 year ago
I found a similar issue, #527. The number of background genes in GeneRatio should be different for each term; however, with clusterProfiler they are all 100. Is there an issue with the new version of clusterProfiler?
Could you please provide some reproducible code? What does your input look like? Do you analyze all GO categories, or only a subset? Etc...
@Leonrunning :
[[edited; removed my answer since I was wrong...]]
> library(clusterProfiler)
>
> data(geneList, package = "DOSE")
> de <- names(geneList)[1:750]
>
> yy <- enrichGO(de, 'org.Hs.eg.db', ont="BP", pvalueCutoff=1)
>
> dim(yy)
[1] 778 9
>
> as.data.frame(yy)[1:25,1:4]
           ID         Description                                                 GeneRatio BgRatio
GO:0000070 GO:0000070 mitotic sister chromatid segregation                        45/720    204/18903
GO:0000819 GO:0000819 sister chromatid segregation                                47/720    239/18903
GO:0000280 GO:0000280 nuclear division                                            67/720    481/18903
GO:0140014 GO:0140014 mitotic nuclear division                                    54/720    325/18903
GO:0006261 GO:0006261 DNA-templated DNA replication                               36/720    166/18903
GO:0007059 GO:0007059 chromosome segregation                                      55/720    382/18903
GO:0098813 GO:0098813 nuclear chromosome segregation                              50/720    321/18903
GO:1905818 GO:1905818 regulation of chromosome separation                         29/720    111/18903
GO:0006260 GO:0006260 DNA replication                                             46/720    286/18903
GO:0044786 GO:0044786 cell cycle DNA replication                                  19/720    42/18903
GO:0051983 GO:0051983 regulation of chromosome segregation                        31/720    132/18903
GO:0090329 GO:0090329 regulation of DNA-templated DNA replication                 21/720    57/18903
GO:0033046 GO:0033046 negative regulation of sister chromatid segregation         20/720    51/18903
GO:0033048 GO:0033048 negative regulation of mitotic sister chromatid segregation 20/720    51/18903
GO:2000816 GO:2000816 negative regulation of mitotic sister chromatid separation  20/720    51/18903
GO:0051985 GO:0051985 negative regulation of chromosome segregation               20/720    53/18903
GO:1905819 GO:1905819 negative regulation of chromosome separation                20/720    53/18903
GO:0010965 GO:0010965 regulation of mitotic sister chromatid separation           26/720    98/18903
GO:0006268 GO:0006268 DNA unwinding involved in DNA replication                   14/720    22/18903
GO:0051304 GO:0051304 chromosome separation                                       30/720    135/18903
GO:0051306 GO:0051306 mitotic sister chromatid separation                         26/720    101/18903
GO:0033047 GO:0033047 regulation of mitotic sister chromatid segregation          20/720    56/18903
GO:0033260 GO:0033260 nuclear DNA replication                                     17/720    38/18903
GO:0044772 GO:0044772 mitotic cell cycle phase transition                         57/720    473/18903
GO:0033045 GO:0033045 regulation of sister chromatid segregation                  26/720    107/18903
>
>
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] clusterProfiler_4.6.1
loaded via a namespace (and not attached):
[1] nlme_3.1-162 bitops_1.0-7 ggtree_3.6.2
[4] enrichplot_1.18.3 bit64_4.0.5 HDO.db_0.99.1
[7] RColorBrewer_1.1-3 httr_1.4.5 GenomeInfoDb_1.34.9
[10] tools_4.2.2 utf8_1.2.3 R6_2.5.1
[13] lazyeval_0.2.2 DBI_1.1.3 BiocGenerics_0.44.0
[16] colorspace_2.1-0 withr_2.5.0 tidyselect_1.2.0
[19] gridExtra_2.3 bit_4.0.5 compiler_4.2.2
[22] cli_3.6.0 Biobase_2.58.0 scatterpie_0.1.8
[25] shadowtext_0.1.2 scales_1.2.1 stringr_1.5.0
[28] digest_0.6.31 yulab.utils_0.0.6 gson_0.0.9
[31] DOSE_3.24.2 XVector_0.38.0 pkgconfig_2.0.3
[34] fastmap_1.1.1 rlang_1.0.6 RSQLite_2.3.0
[37] gridGraphics_0.5-1 farver_2.1.1 generics_0.1.3
[40] jsonlite_1.8.4 BiocParallel_1.32.5 GOSemSim_2.24.0
[43] dplyr_1.1.0 RCurl_1.98-1.10 magrittr_2.0.3
[46] ggplotify_0.1.0 GO.db_3.16.0 GenomeInfoDbData_1.2.9
[49] patchwork_1.1.2 Matrix_1.5-3 Rcpp_1.0.10
[52] munsell_0.5.0 S4Vectors_0.36.2 fansi_1.0.4
[55] ape_5.7 viridis_0.6.2 lifecycle_1.0.3
[58] stringi_1.7.12 ggraph_2.1.0 MASS_7.3-58.2
[61] zlibbioc_1.44.0 org.Hs.eg.db_3.16.0 plyr_1.8.8
[64] qvalue_2.30.0 grid_4.2.2 blob_1.2.3
[67] parallel_4.2.2 ggrepel_0.9.3 crayon_1.5.2
[70] lattice_0.20-45 graphlayouts_0.8.4 Biostrings_2.66.0
[73] cowplot_1.1.1 splines_4.2.2 KEGGREST_1.38.0
[76] pillar_1.8.1 fgsea_1.24.0 igraph_1.4.1
[79] reshape2_1.4.4 codetools_0.2-19 stats4_4.2.2
[82] fastmatch_1.1-3 glue_1.6.2 ggfun_0.0.9
[85] downloader_0.4 data.table_1.14.8 treeio_1.22.0
[88] png_0.1-8 vctrs_0.5.2 tweenr_2.0.2
[91] gtable_0.3.1 purrr_1.0.1 polyclip_1.10-4
[94] tidyr_1.3.0 cachem_1.0.7 ggplot2_3.4.1
[97] ggforce_0.4.1 tidygraph_1.2.3 tidytree_0.4.2
[100] viridisLite_0.4.1 tibble_3.1.8 aplot_0.1.9
[103] AnnotationDbi_1.60.0 memoise_2.0.1 IRanges_2.32.0
>
>
Thanks for your response. I probably misunderstood the calculation of GeneRatio (M/N). N is the total number of genes detected in the gene list, not the number in a specific gene set, right? If so, N is the number of genes provided in our gene list, and it should be the same for every term.
Then my issue is that this number is much lower than what I provided: I provided more than 600 genes, but N is only 100. The number of background genes in BgRatio is also quite low, only about 2000.
I tried Panther and it recognizes more genes than clusterProfiler. Please see below.
Aha, I stand corrected!
You are right regarding the calculation of the gene ratio! I got confused, and have edited my answer above.
Thus, in the example code from my post above, the denominator 720 is the number of genes, out of the 750 selected genes provided, that could be annotated to any GO-BP category, and this value should indeed be the same for all categories. The numerator is the number of genes in the provided list that are annotated to a specific GO-BP category.
This ratio (GeneRatio) is then compared to the corresponding ratio for the whole GO-BP background (BgRatio) to test for statistically significant overrepresentation of that GO-BP category in the list of selected genes.
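To make that comparison concrete, here is a minimal sketch in base R using the numbers from the first row of the output above (GO:0000070: GeneRatio 45/720, BgRatio 204/18903). It assumes the overrepresentation test is a one-sided hypergeometric test, which is my understanding of what enrichGO does; the variable names are just for illustration.

```r
# Numbers taken from the GO:0000070 row above.
k <- 45      # selected genes annotated to this term (GeneRatio numerator)
n <- 720     # selected genes annotated to any GO-BP term (GeneRatio denominator)
M <- 204     # background genes annotated to this term (BgRatio numerator)
N <- 18903   # background genes annotated to any GO-BP term (BgRatio denominator)

gene_ratio <- k / n
bg_ratio <- M / N
fold_enrichment <- gene_ratio / bg_ratio

# Assumption: one-sided hypergeometric test, P(X >= k).
# phyper() gives P(X <= q), so use q = k - 1 with lower.tail = FALSE.
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

print(c(GeneRatio = gene_ratio, BgRatio = bg_ratio,
        FoldEnrichment = fold_enrichment, p = p_value))
```

The p-value computed this way is unadjusted; the adjusted values reported by enrichGO depend on all terms tested, so they cannot be reproduced from a single row.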
Thanks, but do you know why:
1) I provided more than 600 genes in my gene list, but only 100 of them were used in the GeneRatio? It seems the same thing happened in https://github.com/YuLab-SMU/clusterProfiler/issues/527 (98 or 100 genes).
2) The number of background genes in BgRatio is also quite low, only about 2000? You have about 18903 genes in the above data for human.
Is there an issue with enrichGO, or could it be an issue with the OrgDb for chicken?
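A toy sketch in base R of why both denominators can be smaller than expected, following the explanation above that only genes annotated to at least one category are counted. The gene and term IDs here are made up (not real GO or chicken data), and the assumption that the background defaults to all annotated genes is mine, not something confirmed in this thread.

```r
# Hypothetical term-to-gene annotation; stands in for what an OrgDb provides.
term2gene <- data.frame(
  term = c("T1", "T1", "T1", "T2", "T2", "T3"),
  gene = c("g1", "g2", "g3", "g2", "g4", "g5"),
  stringsAsFactors = FALSE
)

# Hypothetical input list: g6, g7 and g8 have no annotation at all.
gene_list <- c("g1", "g2", "g6", "g7", "g8")

# Only input genes with at least one annotation contribute to the denominator.
annotated <- intersect(gene_list, unique(term2gene$gene))
n_input <- length(gene_list)
n_used <- length(annotated)

# Assumed background: every gene that has any annotation.
background <- unique(term2gene$gene)

# Per-term counts among the annotated input genes and among the background.
hits_per_term <- as.vector(tapply(term2gene$gene %in% annotated, term2gene$term, sum))
term_size <- as.vector(table(term2gene$term))

gene_ratio <- paste0(hits_per_term, "/", n_used)
bg_ratio <- paste0(term_size, "/", length(background))

print(data.frame(term = sort(unique(term2gene$term)),
                 GeneRatio = gene_ratio, BgRatio = bg_ratio))
```

If this is what is happening with the chicken data, a useful check would be to count how many of the input IDs have any GO annotation in the OrgDb (for example with AnnotationDbi::select), and to confirm that the IDs match the key type enrichGO expects.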
Hi,
I have been using this wonderful package for a few years, and it works great for my analyses of species like mouse and human. Recently, I have been working on Gallus (chicken) data and ran into some issues with enrichGO.
As shown below, the number of background genes is the same for every term, although it should differ. I am using the latest versions of R and clusterProfiler. Could you please let me know how I can fix this issue?
Thanks