RockefellerUniversity / RU_RNAseq

GO enrichment (retrieve category names of GO IDs) #7

Open zapaterc opened 3 years ago

zapaterc commented 3 years ago

Hi Tom and Matt,

I'm trying to create a list in order to retrieve the category names of the GO IDs. After running the goseq() function I'm able to get the enriched GO BP categories.

I'm trying to follow exercise 1 of RNAseq session 3, where instead of doing GO enrichment we do KEGG enrichment. There, we create a list where we have the path ID and the actual names for the pathways using: xx <- as.list(KEGGPATHID2NAME)

I wonder if there's an equivalent annotation data object (equivalent to KEGGPATHID2NAME) in the GO.db library that maps GO:BP IDs to GO:BP category names.

matthew-paul-2006 commented 3 years ago

You have that right exactly. When we covered the org.*.eg.db objects we talked about the select() function from the AnnotationDbi package. This allows you to grab information from the database. You need to give it three arguments:
keys = what specific things you want to look up, i.e. several GOIDs
keytype = what kind of key you are supplying, i.e. GOID
columns = what you want back, i.e. TERM (which is the name)

You can use keytypes(GO.db) and columns(GO.db) on the database to see which key types and columns are available before you run select() and decide what you want to get back. Then you can run select():

select(GO.db, keys = "GO:0001851", keytype = "GOID", columns = "TERM")

This looks inside GO.db and finds the term associated with the GOID "GO:0001851". The result is a table with one row, holding the GOID and its TERM. [Screenshot: Screen Shot 2021-08-06 at 11 40 20 AM]
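As a minimal sketch of the inspect-then-select steps above, assuming GO.db is installed from Bioconductor: the second key, "GO:0008150" (the root biological_process term), is not from this thread and is added only to show that keys accepts more than one ID.

```r
library(GO.db)  # also attaches AnnotationDbi, which provides select()

# See which key types can be used for lookup and which columns can be returned
keytypes(GO.db)
columns(GO.db)

# Look up the term names for two GO IDs.
# "GO:0001851" is the ID from this thread; "GO:0008150" is an added example.
res <- AnnotationDbi::select(GO.db,
                             keys = c("GO:0001851", "GO:0008150"),
                             keytype = "GOID",
                             columns = "TERM")
res
```

Calling AnnotationDbi::select() with the package prefix avoids clashes if another loaded package (dplyr, for example) also defines select().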

This does not fully answer your question, as you wanted the names of all the GO terms, not just one. You can provide multiple keys to select(), so we can provide every key, i.e. every GOID. To get every GOID we can use the keys() function on GO.db, then use its output to look up every TERM.

input <- keys(GO.db, keytype = "GOID")
output <- select(GO.db, keys = input, keytype = "GOID", columns = "TERM")

The output is a table with one row per GOID and its TERM. [Screenshot: Screen Shot 2021-08-06 at 11 45 10 AM]
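Since the question was about GO:BP specifically, here is a rough sketch of how the full lookup could be narrowed to biological process terms and turned into a named list, similar in shape to as.list(KEGGPATHID2NAME). It assumes GO.db is installed; the names go_table, bp_table, bp_names and enriched_ids are made up for illustration, and enriched_ids stands in for the GO IDs you get back from goseq().

```r
library(GO.db)

# Every GO ID with its term name and ontology (BP, MF or CC)
go_ids <- keys(GO.db, keytype = "GOID")
go_table <- AnnotationDbi::select(GO.db,
                                  keys = go_ids,
                                  keytype = "GOID",
                                  columns = c("TERM", "ONTOLOGY"))

# Keep only biological process terms (%in% treats any NA as FALSE)
bp_table <- go_table[go_table$ONTOLOGY %in% "BP", ]

# Named list: names are GO IDs, values are the category names
bp_names <- as.list(setNames(bp_table$TERM, bp_table$GOID))

# Hypothetical stand-in for the enriched GO IDs returned by goseq()
enriched_ids <- head(bp_table$GOID, 3)
bp_names[enriched_ids]
```

GO.db also exports a GOTERM map object, so as.list(GOTERM) is the closest direct analogue of as.list(KEGGPATHID2NAME), although its elements are GOTerms objects rather than plain strings.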

matthew-paul-2006 commented 3 years ago

Another wrinkle is that a lot of this information is also contained in the org.*.eg.db packages. So it is worth checking where the package you are using draws its GO information from, as there might not be a perfect one-to-one match between GO.db and the specific org.*.eg.db.
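One way to check for such a mismatch could look like the sketch below. It assumes human data and therefore org.Hs.eg.db (swap in the package for your organism), and that both packages are installed; the variable names are illustrative.

```r
library(GO.db)
library(org.Hs.eg.db)  # assumption: human; use the org package for your organism

# GO IDs that the organism package annotates genes with
org_go_ids <- keys(org.Hs.eg.db, keytype = "GO")

# GO IDs known to GO.db
godb_ids <- keys(GO.db, keytype = "GOID")

# IDs used by the organism package that GO.db cannot name
missing_in_godb <- setdiff(org_go_ids, godb_ids)
length(missing_in_godb)
```

If the length is zero, every GO ID used by the organism package has a term name in your installed GO.db; a non-zero count usually suggests the two packages come from different annotation releases.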