Open m-bogaerts opened 2 weeks ago
One way of achieving this would be a 'simple' query of the OrgDb:
> ## load library
> library(org.Dm.eg.db)
>
> ## extract the 'keys' (= geneid) that can be queried for
> k <- keys(org.Dm.eg.db)
>
> ## check
> k[1:5]
[1] "30970" "30971" "30972" "30973" "30975"
>
>
>
> ## query for the 1st 50 ids.
> res <- select(org.Dm.eg.db,
+ keys=k[1:50],
+ columns = c("GOALL"),
+ keytype="ENTREZID")
'select()' returned 1:many mapping between keys and columns
>
> ## of these 50, which geneids do NOT have a GO annotation?
> ## answer: 5 genes
> unique( res[ is.na(res$GOALL), ]$ENTREZID )
[1] "30972" "30979" "30991" "31005" "31026"
>
> length( unique(res[ is.na(res$GOALL), ]$ENTREZID) )
[1] 5
>
> ## of these 50, which geneids do HAVE a GO annotation?
> ## answer: 45 genes
> unique( res[ !is.na(res$GOALL), ]$ENTREZID )
[1] "30970" "30971" "30973" "30975" "30976" "30977" "30978" "30980" "30981"
[10] "30982" "30983" "30984" "30985" "30986" "30988" "30990" "30994" "30995"
[19] "30996" "30998" "31000" "31001" "31002" "31003" "31004" "31006" "31007"
[28] "31009" "31010" "31011" "31012" "31013" "31014" "31015" "31016" "31017"
[37] "31018" "31019" "31020" "31021" "31022" "31023" "31024" "31025" "31027"
>
> length( unique( res[ !is.na(res$GOALL), ]$ENTREZID ) )
[1] 45
>
Note that you may need to adapt the keytype argument when using FlyBase ids (for example, keytype="FLYBASE").
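As a rough sketch of the same idea for FlyBase ids, the split into annotated and unannotated genes can be wrapped in a small helper. The name split_by_go and its arguments are hypothetical (not part of any package), and the usage part assumes org.Dm.eg.db is installed and that your ids match its "FLYBASE" keytype; it uses the first 50 FlyBase keys only as stand-ins for your own gene list.

```r
## Hypothetical helper: split the keys of a select() result into those
## with and without a GO annotation. 'res' is a data.frame shaped like the
## output of AnnotationDbi::select(); column names are assumptions.
split_by_go <- function(res, key_col = "FLYBASE", go_col = "GOALL") {
  has_go   <- !is.na(res[[go_col]])
  with_go  <- unique(res[[key_col]][has_go])
  all_keys <- unique(res[[key_col]])
  list(with_go = with_go, without_go = setdiff(all_keys, with_go))
}

## Usage sketch (only runs if org.Dm.eg.db is available).
if (requireNamespace("org.Dm.eg.db", quietly = TRUE)) {
  db <- org.Dm.eg.db::org.Dm.eg.db
  ## stand-in for your own vector of FBgn ids
  fb <- head(AnnotationDbi::keys(db, keytype = "FLYBASE"), 50)
  res <- AnnotationDbi::select(db, keys = fb,
                               columns = "GOALL", keytype = "FLYBASE")
  str(split_by_go(res))
}
```

If the enrichment was run on a single ontology (e.g. BP only), the numbers may not match exactly, since GOALL covers all three ontologies; the ONTOLOGYALL column that select() returns alongside GOALL could be used to filter res first.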
Hello,
I am using the function compareCluster on three different lists of genes (Drosophila melanogaster; FlyBase FBgn ids). In the results I see that not all genes are used for the enrichment: according to the ratio reported in the results, a set of 182 genes is reduced to 142. I understand this is because 40 genes have no associated GO term. Is there any way to obtain the identity of the 142 genes that do have an associated GO term?
Thank you very much in advance.