Open shuaizh117 opened 2 months ago
I came across the same problem, does anyone know what's going on?
The enrichment result was got; but when I used the genes enriched to look up to their original GO ID, I just couldn't find the GO IDs that the software provided from the original GO IDs of the genes.
Hey bro,
https://www.ebi.ac.uk/QuickGO/. Maybe you can go to this website and search your GO IDs, the hitten GOID might be the parent term of the annotated GOID for your genes. (Although there is some exclusion).
enrichGO()
(and other GO-based analyses) uses the Gene Ontology information present in the Bioconductor package GO.db
. (here).
To see which genes have been annotated to a specific GO category in the GO.db
package you can for example do:
(note the use of keytype = "GOALL"
, and not (just) "GO"
!)
> library(org.Hs.eg.db)
> library(GO.db)
>
> ## query
> results <- AnnotationDbi::select(org.Hs.eg.db, keys=c("GO:0141091"), columns = c('ENTREZID','SYMBOL'), keytype = "GOALL")
'select()' returned 1:many mapping between keys and columns
>
> dim(results)
[1] 592 5
> head(results)
GOALL EVIDENCEALL ONTOLOGYALL ENTREZID SYMBOL
1 GO:0141091 IEA BP 25 ABL1
2 GO:0141091 IBA BP 90 ACVR1
3 GO:0141091 IDA BP 90 ACVR1
4 GO:0141091 IMP BP 90 ACVR1
5 GO:0141091 ISS BP 90 ACVR1
6 GO:0141091 IBA BP 91 ACVR1B
>
> ## remove duplicates
> results <- results[!duplicated(results[,"ENTREZID"]), ]
>
> dim(results)
[1] 388 5
> head(results)
GOALL EVIDENCEALL ONTOLOGYALL ENTREZID SYMBOL
1 GO:0141091 IEA BP 25 ABL1
2 GO:0141091 IBA BP 90 ACVR1
6 GO:0141091 IBA BP 91 ACVR1B
10 GO:0141091 IBA BP 92 ACVR2A
13 GO:0141091 IBA BP 93 ACVR2B
17 GO:0141091 IBA BP 94 ACVRL1
>
So 388 genes have been annotated to this particular GO category.
Manually confirm that all enriched genes are in these as well? Note that last gene is not! (I just added GAPDH
as example).
Answer: YES!
> enriched.genes <- c("HSPA1A","TNFAIP6","FST","ZBTB7A","HES1","ITGB8",
+ "TGFB2","MYOCD","LATS2","THBS1","GREM1","GAPDH")
> enriched.genes %in% results[,"SYMBOL"]
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
>
As said, enrichGO
uses GOALL
, so also taking into account the parent-child relationships, but maybe PANTHER does not? Therefore only directly annotated genes are retrieved?
In this context, see also this recent discussion: https://github.com/YuLab-SMU/clusterProfiler/issues/697#issuecomment-2189017067
Also note that although the last release of PANTHER (=19) is from June 2024, the underlying GO annotations used for this release seem to be from May 2023 (thus a year old). https://www.pantherdb.org/news/news20240620.jsp
Yet, the source of the GO.db
is2024-Mar12
.
> GO.db
GODb object:
| GOSOURCENAME: Gene Ontology
| GOSOURCEURL: http://current.geneontology.org/ontology/go-basic.obo
| GOSOURCEDATE: 2024-01-17
| Db type: GODb
| package: AnnotationDbi
| DBSCHEMA: GO_DB
| GOEGSOURCEDATE: 2024-Mar12
| GOEGSOURCENAME: Entrez Gene
| GOEGSOURCEURL: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA
| DBSCHEMAVERSION: 2.1
Please see: help('select') for usage information
>
Both could explain your observation.
I was confused by GO terms popped out in my GO analysis using enrichGO. For example, "GO:0141091: transforming growth factor beta receptor superfamily signaling pathway" was reported as one of the top overrepresented GO in my sig DEGs. And below is the gene list "HSPA1A/TNFAIP6/FST/ZBTB7A/HES1/ITGB8/TGFB2/MYOCD/LATS2/THBS1/GREM1" enriched to this term. However, when I refer to PANTHER regarding GO:0141091, I only found TGFB2 in this GO term. If the other genes do not belong to this GO:0141091, then how did this GO get enriched?? I think I'm still not very clear about the behind scene story of how enrichGO perform the calculation.
Matrix products: default
locale: [1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: America/Chicago tzcode source: internal
attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] viridis_0.6.5 viridisLite_0.4.2
[3] topGO_2.56.0 SparseM_1.81
[5] GO.db_3.19.1 graph_1.82.0
[7] pathview_1.44.0 enrichplot_1.24.0
[9] clusterProfiler_4.12.3 ggpubr_0.6.0
[11] corrplot_0.92 annotables_0.2.0
[13] RColorBrewer_1.1-3 ggrepel_0.9.5
[15] pheatmap_1.0.12 lubridate_1.9.3
[17] forcats_1.0.0 stringr_1.5.1
[19] dplyr_1.1.4 purrr_1.0.2
[21] readr_2.1.5 tidyr_1.3.1
[23] tibble_3.2.1 tidyverse_2.0.0
[25] ggplot2_3.5.1 org.Hs.eg.db_3.19.1
[27] AnnotationDbi_1.66.0 DESeq2_1.44.0
[29] SummarizedExperiment_1.34.0 Biobase_2.64.0
[31] MatrixGenerics_1.16.0 matrixStats_1.3.0
[33] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
[35] IRanges_2.38.0 S4Vectors_0.42.0
[37] BiocGenerics_0.50.0
loaded via a namespace (and not attached): [1] rstudioapi_0.16.0 jsonlite_1.8.8 magrittr_2.0.3
[4] farver_2.1.1 rmarkdown_2.26 fs_1.6.4
[7] zlibbioc_1.50.0 vctrs_0.6.5 memoise_2.0.1
[10] RCurl_1.98-1.14 ggtree_3.12.0 rstatix_0.7.2
[13] htmltools_0.5.8.1 S4Arrays_1.4.0 broom_1.0.5
[16] SparseArray_1.4.3 gridGraphics_0.5-1 plyr_1.8.9
[19] httr2_1.0.1 cachem_1.0.8 igraph_2.0.3
[22] lifecycle_1.0.4 pkgconfig_2.0.3 Matrix_1.7-0
[25] R6_2.5.1 fastmap_1.1.1 gson_0.1.0
[28] GenomeInfoDbData_1.2.12 digest_0.6.31 aplot_0.2.2
[31] colorspace_2.1-0 patchwork_1.2.0 RSQLite_2.3.6
[34] labeling_0.4.3 timechange_0.3.0 fansi_1.0.4
[37] httr_1.4.7 polyclip_1.10-6 abind_1.4-5
[40] compiler_4.4.0 bit64_4.0.5 withr_3.0.0
[43] backports_1.4.1 BiocParallel_1.38.0 carData_3.0-5
[46] DBI_1.2.2 ggforce_0.4.2 ggsignif_0.6.4
[49] MASS_7.3-60.2 rappdirs_0.3.3 DelayedArray_0.30.1
[52] HDO.db_0.99.1 tools_4.4.0 ape_5.8
[55] scatterpie_0.2.2 glue_1.6.2 nlme_3.1-164
[58] GOSemSim_2.30.0 grid_4.4.0 shadowtext_0.1.3
[61] reshape2_1.4.4 fgsea_1.30.0 generics_0.1.3
[64] gtable_0.3.5 tzdb_0.4.0 hms_1.1.3
[67] data.table_1.15.4 car_3.1-2 tidygraph_1.3.1
[70] utf8_1.2.3 XVector_0.44.0 pillar_1.9.0
[73] yulab.utils_0.1.6 limma_3.60.0 splines_4.4.0
[76] tweenr_2.0.3 treeio_1.28.0 lattice_0.22-6
[79] bit_4.0.5 tidyselect_1.2.1 locfit_1.5-9.9
[82] Biostrings_2.72.0 knitr_1.46 gridExtra_2.3
[85] xfun_0.43 graphlayouts_1.1.1 statmod_1.5.0
[88] KEGGgraph_1.64.0 stringi_1.8.4 UCSC.utils_1.0.0
[91] lazyeval_0.2.2 ggfun_0.1.4 yaml_2.3.7
[94] evaluate_0.23 codetools_0.2-20 ggraph_2.2.1
[97] qvalue_2.36.0 Rgraphviz_2.48.0 ggplotify_0.1.2
[100] cli_3.6.2 munsell_0.5.1 Rcpp_1.0.12
[103] png_0.1-8 XML_3.99-0.16.1 parallel_4.4.0
[106] blob_1.2.4 DOSE_3.30.0 bitops_1.0-7
[109] tidytree_0.4.6 scales_1.3.0 crayon_1.5.2
[112] rlang_1.1.3 cowplot_1.1.3 fastmatch_1.1-4
[115] KEGGREST_1.44.0