geneontology / helpdesk

The Gene Ontology Helpdesk
http://help.geneontology.org

GO parent and child terms #172

Closed fcamus closed 6 years ago

fcamus commented 6 years ago

Hi all,

I am performing an analysis where I download all the genes in biological process (GO:0008150) and "metabolic function" (GO:0008152) for Drosophila melanogaster. I have noticed that there are genes that I find in "metabolic function" that are NOT in "biological process". This makes me a bit worried...

Is it right for me to think that genes within the "metabolic function" category should also be in "biological process"? Or do I have my GO terms wrong?

thanks in advance.

suzialeksander commented 6 years ago

Hi @fcamus,

Thank you for writing in to GO Helpdesk.

I'm pretty sure you mean 'GO:0008152 metabolic process'. Since 'GO:0008152 metabolic process' is_a (is a child term of) 'GO:0008150 biological process', then yes, all gene products that are annotated to 'metabolic process' should be in a list of all gene products that are annotated to Process terms.

However, if your list is only of gene products annotated directly to 'GO:0008150 biological process', there should be no overlap; annotations are only made directly to 'GO:0008150 biological process' when there is no literature available to support a more informative GO annotation in the Process branch. Annotations to this term have a special meaning.
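The difference between a direct list and an inclusive list can be illustrated with a toy example. This is a minimal Python sketch, not how GO or AmiGO implement it: the two-term `IS_A` map mirrors the relationship described above, while the gene names and their annotations are invented purely for illustration.

```python
# Toy ontology: child term -> parent terms (is_a relationships only).
IS_A = {
    "GO:0008152": ["GO:0008150"],  # metabolic process is_a biological_process
    "GO:0008150": [],              # root of the Process branch
}

# Hypothetical direct annotations (gene names are made up).
DIRECT = {
    "geneA": {"GO:0008152"},  # annotated to metabolic process
    "geneB": {"GO:0008150"},  # annotated directly to the root term
}


def ancestors(term):
    """Return the term itself plus every term reachable through is_a."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def genes_direct(term):
    """Genes annotated to exactly this term."""
    return {gene for gene, terms in DIRECT.items() if term in terms}


def genes_inclusive(term):
    """Genes annotated to this term or to any of its descendants."""
    return {
        gene
        for gene, terms in DIRECT.items()
        if any(term in ancestors(t) for t in terms)
    }


print(sorted(genes_direct("GO:0008150")))     # ['geneB']
print(sorted(genes_inclusive("GO:0008150")))  # ['geneA', 'geneB']
```

In this toy data, the direct list for the root term does not contain the metabolic-process gene, while the inclusive list does.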

If you have examples of gene products that are annotated to 'GO:0008152 metabolic process' but are not in a list of all Process annotations for fly, please let us know the specific gene(s) in question and we can take a closer look at this.
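One way to find such examples is a set difference between the two lists. The following is a minimal Python sketch, assuming each list has been reduced to one gene identifier per line. The identifiers are placeholders, not real FlyBase IDs, and `read_ids` is a hypothetical helper.

```python
def read_ids(text):
    """Parse one identifier per line, skipping blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


# Placeholder contents standing in for the two downloaded lists.
metabolic_process = read_ids("gene1\ngene2\ngene3\n")
all_process = read_ids("gene1\ngene3\ngene4\n")

# Genes in the metabolic process list that are absent from the Process list.
missing = sorted(metabolic_process - all_process)
print(missing)  # ['gene2']
```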

It may also help to know how you're making these lists. If you're using AmiGO, please be aware of the difference between

fcamus commented 6 years ago

Thanks for the help. Before I start sending you lists of genes that don't overlap, I just want to double-check that I am retrieving genes from biomaRt properly. Here I am creating a list of all genes that have "biological process" as a GO term. Depending on which GO term I want to download, I just change the GO term in the script. Does this look about correct?

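library(biomaRt)  # provides useMart(), useDataset() and getBM()

# Connect to the Ensembl mart and select the D. melanogaster gene dataset
mart <- useMart("ensembl")
mart <- useDataset("dmelanogaster_gene_ensembl", mart)

# Retrieve genes matching the 'go' filter for GO:0008150
biolPR <- getBM(attributes = c("flybase_gene_id", "external_gene_name", "entrezgene"),
                filters = "go", values = "GO:0008150", mart = mart)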

thanks again!

suzialeksander commented 6 years ago

@fcamus, I'm afraid you'll have to ask biomaRt about searches using their tool; biomaRt is developed/maintained independently from GO and just uses our data.

If you want to compare your biomaRt results to ones using the AmiGO tool, here are some links to get you started:

http://amigo.geneontology.org/amigo/search/annotation?q=*:*&fq=taxon_subset_closure_label:%22Drosophila%20melanogaster%22&fq=regulates_closure_label:%22metabolic%20process%22&sfq=document_category:%22annotation%22

http://amigo.geneontology.org/amigo/search/annotation?q=*:*&fq=taxon_subset_closure_label:%22Drosophila%20melanogaster%22&fq=regulates_closure_label:%22biological_process%22&sfq=document_category:%22annotation%22

http://amigo.geneontology.org/amigo/search/annotation?q=*:*&fq=taxon_subset_closure_label:%22Drosophila%20melanogaster%22&fq=annotation_class_label:%22biological_process%22&sfq=document_category:%22annotation%22

You can download any of these lists with the blue buttons near the top right of the page.
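The three links above differ only in the term label and in the filter field used for it: `regulates_closure_label` in the first two and `annotation_class_label` in the third. Going by the field names, the former appears to include annotations to more specific terms and the latter only annotations made directly to the named term. That reading is an assumption, not something confirmed in this thread. The Python sketch below rebuilds the links from their parts. The helper name `amigo_annotation_url` is hypothetical, and the query layout is copied from the links rather than from AmiGO documentation.

```python
from urllib.parse import quote

BASE = "http://amigo.geneontology.org/amigo/search/annotation"


def amigo_annotation_url(taxon_label, field, term_label):
    """Rebuild an AmiGO annotation-search link in the same form as the links above."""

    def fq(name, value):
        # Values are wrapped in double quotes and percent-encoded (%22, %20).
        return "fq=" + name + ":" + quote('"' + value + '"')

    parts = [
        "q=*:*",
        fq("taxon_subset_closure_label", taxon_label),
        fq(field, term_label),
        "sfq=document_category:" + quote('"annotation"'),
    ]
    return BASE + "?" + "&".join(parts)


# Same term, two different filter fields.
inclusive_url = amigo_annotation_url(
    "Drosophila melanogaster", "regulates_closure_label", "biological_process"
)
direct_url = amigo_annotation_url(
    "Drosophila melanogaster", "annotation_class_label", "biological_process"
)
print(inclusive_url)
print(direct_url)
```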

hattrill commented 6 years ago

Hi @fcamus, the file that FlyBase deposits with the GO should have no overlap between the direct annotation GO:0008150 biological_process and other annotations in the biological process branch, as we run a script to remove any clashes in our database.

GO_Central now release their own set of annotations based on phylogenetic analysis, which is added to the Dmel set provided by us. I am not sure if GO runs a separate pipeline to remove root terms after this addition, so there may well be some overlap.

If you have any questions about Fly annotations, please ask. "GO:0008152 metabolic process" is a very high-level term and includes all sorts of cellular metabolic processes, from DNA and protein synthesis to small molecule processes.

hattrill commented 6 years ago

@fcamus, I should clarify that we do not automatically remove root terms where there is an electronic annotation from the InterPro2GO pipeline, as these are non-reviewed, computationally derived annotations that are 'refreshed' every release. A direct annotation to a root term, e.g. GO:0008150 biological_process, indicates that someone has looked and found no data to support an annotation.
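The rule described here can be sketched as a small filter. This is only one reading of the comments above, not FlyBase's actual script: the function name, the tuple layout, and the gene names are hypothetical. The sketch assumes electronic annotations carry the GO evidence code IEA and root-term annotations carry ND.

```python
ROOT = "GO:0008150"  # biological_process
ELECTRONIC = "IEA"   # GO evidence code for electronic annotations


def drop_redundant_root(annotations):
    """Drop root-term annotations that clash with a curated, more specific one.

    `annotations` is a list of (gene, term, evidence) tuples from one branch.
    A root annotation is kept when the gene's only other annotations are
    electronic, matching the exception described above.
    """
    has_curated_specific = {
        gene
        for gene, term, evidence in annotations
        if term != ROOT and evidence != ELECTRONIC
    }
    return [
        (gene, term, evidence)
        for gene, term, evidence in annotations
        if not (term == ROOT and gene in has_curated_specific)
    ]


example = [
    ("geneA", "GO:0008150", "ND"),
    ("geneA", "GO:0008152", "IDA"),  # curated annotation: root term is removed
    ("geneB", "GO:0008150", "ND"),
    ("geneB", "GO:0008152", "IEA"),  # electronic only: root term is kept
]
cleaned = drop_redundant_root(example)
for row in cleaned:
    print(row)
```

With this input, geneA loses its root annotation while geneB keeps both, which is the kind of overlap that can remain in a combined list.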

fcamus commented 6 years ago

Thanks @hattrill! I figured out that the problem was with biomaRt and not with Gene Ontology. I downloaded genes for the 2 GO categories from the Ensembl website and had no problems thereafter. :)

hattrill commented 6 years ago

That's good news.