YuLab-SMU / clusterProfiler

A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

Retrieve the set of genes with assigned GO #698

Open m-bogaerts opened 2 weeks ago

m-bogaerts commented 2 weeks ago

Hello,

I am using the function compareCluster on three different lists of genes (Drosophila melanogaster; FlyBase FBgn IDs). In the results I see that not all of the genes are used for the enrichment: according to the ratio reported in the results, a set of 182 genes drops to 142. I understand this is because 40 genes have no associated GO term. Is there any way to obtain the identity of the 142 genes that do have an associated GO term?

Thank you very much in advance.

guidohooiveld commented 2 weeks ago

One way of achieving this would be by a 'simple' query of the OrgDb:

> ## load library
> library(org.Dm.eg.db)
> 
> ## extract the 'keys' (= geneid) that can be queried for
> k <- keys(org.Dm.eg.db)
> 
> ## check
> k[1:5]
[1] "30970" "30971" "30972" "30973" "30975"
> 
> ## query for the 1st 50 ids.
> res <- select(org.Dm.eg.db,
+               keys=k[1:50],
+               columns = c("GOALL"),
+               keytype="ENTREZID")
'select()' returned 1:many mapping between keys and columns
> 
> ## of these 50, which geneids do NOT have a GO annotation?
> ## answer: 5 genes
> unique( res[ is.na(res$GOALL), ]$ENTREZID )
[1] "30972" "30979" "30991" "31005" "31026"
> 
> length( unique(res[ is.na(res$GOALL), ]$ENTREZID) )
[1] 5
> 
> ## of these 50, which geneids do HAVE a GO annotation?
> ## answer: 45 genes
> unique( res[ !is.na(res$GOALL), ]$ENTREZID )
 [1] "30970" "30971" "30973" "30975" "30976" "30977" "30978" "30980" "30981"
[10] "30982" "30983" "30984" "30985" "30986" "30988" "30990" "30994" "30995"
[19] "30996" "30998" "31000" "31001" "31002" "31003" "31004" "31006" "31007"
[28] "31009" "31010" "31011" "31012" "31013" "31014" "31015" "31016" "31017"
[37] "31018" "31019" "31020" "31021" "31022" "31023" "31024" "31025" "31027"
> 
> length( unique( res[ !is.na(res$GOALL), ]$ENTREZID ) )
[1] 45
>

Note that you may need to adapt the keytype argument (for example keytype="FLYBASE") when using FlyBase ids.
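As a rough sketch of how the same query could be wrapped up for FlyBase ids: the helper name split_by_go and the toy data frame below are made up for illustration, and the ids in it are placeholders, not real genes. The real query at the end assumes that "FLYBASE" is the right keytype for your ids and only runs if org.Dm.eg.db is installed; replace the example keys with your own gene list.

```r
## Hypothetical helper (not part of clusterProfiler or AnnotationDbi):
## given the data frame returned by select(), split the queried ids into
## those with at least one GO term and those with none.
split_by_go <- function(res, id_col = "FLYBASE", go_col = "GOALL") {
  annotated <- unique(res[[id_col]][!is.na(res[[go_col]])])
  list(
    with_go    = annotated,
    without_go = setdiff(unique(res[[id_col]]), annotated)
  )
}

## Toy stand-in for a select() result; ids and rows are invented
toy <- data.frame(
  FLYBASE = c("FBgn_A", "FBgn_A", "FBgn_B", "FBgn_C"),
  GOALL   = c("GO:0008150", "GO:0003674", NA, "GO:0005575"),
  stringsAsFactors = FALSE
)
out <- split_by_go(toy)
out$with_go     # ids with a GO annotation
out$without_go  # ids without one

## Same idea against the real OrgDb (skipped if the package is missing)
if (requireNamespace("org.Dm.eg.db", quietly = TRUE)) {
  db <- org.Dm.eg.db::org.Dm.eg.db
  ## example keys only: swap in your own vector of FBgn ids here
  my_genes <- head(AnnotationDbi::keys(db, keytype = "FLYBASE"), 50)
  res <- AnnotationDbi::select(db,
                               keys    = my_genes,
                               columns = "GOALL",
                               keytype = "FLYBASE")
  str(split_by_go(res))
}
```

Running this once per gene list passed to compareCluster should give, for each list, the ids that carry a GO annotation in this OrgDb version. The counts may not match the enrichment output exactly if the analysis was restricted to a single ontology.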