geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

More Annotation Camp Topics from WormBase #16

Closed gocentral closed 8 years ago

gocentral commented 20 years ago

Here are some more annotation camp topics from WormBase:

1) Phenotype Penetrance and Process Terms

Most of our Biological Process annotations come from analysis of mutant phenotypes. For pleiotropic mutations, there are often many defects of varying penetrance. What is appropriate curation for defects that are weakly penetrant, but observed nonetheless?
For example, in a paper on dauer formation at high temperatures (PMID: 14504222), a number of singly and doubly mutant animals display a dauer phenotype with penetrance ranging from 0% to 34% (Table 7). Should all mutants with dauer formation above 0% be annotated to a dauer formation process term, even if the penetrance is quite low?
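To make the question concrete, here is a minimal sketch of how a penetrance cut-off would change which mutants get annotated. The genotype names, penetrance values, and the 10% cut-off are hypothetical placeholders, not the figures from Table 7, and the helper name is made up for illustration.

```python
# Hypothetical penetrance values (percent of animals forming dauers).
# These are placeholders for illustration, not data from the cited paper.
penetrance_by_genotype = {
    "single_mutant_a": 0.0,
    "single_mutant_b": 4.0,
    "double_mutant_ab": 34.0,
}


def genotypes_to_annotate(penetrance, cutoff_percent):
    """Return, in sorted order, genotypes whose penetrance exceeds the cutoff."""
    return sorted(g for g, p in penetrance.items() if p > cutoff_percent)


# Policy 1: annotate anything with penetrance above 0%.
any_penetrance = genotypes_to_annotate(penetrance_by_genotype, 0.0)

# Policy 2: annotate only above an (assumed) 10% cut-off.
stricter = genotypes_to_annotate(penetrance_by_genotype, 10.0)

print(any_penetrance)
print(stricter)
```

Under the first policy the weakly penetrant mutant is annotated; under the second it is dropped, which is exactly the choice we are asking about.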

2) Process Curation for Gene Products like RNA Pol II

This next issue is somewhat related, as it also concerns pleiotropies. For gene products whose primary function is well established, yet whose mutant phenotype is pleiotropic, how far from the primary process should GO curation extend? For example, AMA-1, the C. elegans large subunit of RNA polymerase II, is annotated to the process term transcription from Pol II promoter. Loss of AMA-1 function in the early embryo results in embryonic lethality with defects in tissue differentiation, cell division cycles, and gastrulation movements (PMID: 8812143). We have annotated AMA-1 to the process terms related to these embryonic defects, but how many other transcription-dependent processes should we add? Worms require transcription for vulval development, post-embryonic cell migrations, male tail formation, etc., so at what point do you stop adding process terms for gene products with generalized functions like RNA polymerase II?

3) Using the NOT qualifier

What is appropriate use of the NOT qualifier? Is it intended to capture only those experimental results that are really unexpected, or is it intended to also capture negative results? Two examples: UNC-129 is a TGF-beta-like signalling molecule required for cell migration, but it does not appear to interact with the only known Type II TGF-beta receptor in worms, DAF-4 (PMID: 11018016). This suggests that UNC-129 functions in some way other than through the known TGF-beta signalling pathway in worms. So, would this be an appropriate use of the NOT qualifier:

NOT MF:transforming growth factor receptor binding IGI with DAF-4

RNAi of one of the three class I histone deacetylases in C. elegans results in embryonic lethality, while RNAi of the other two has no discernible effect (PMID: 9875852). HDA-1 has been annotated to embryonic development, but should the other two be annotated to NOT embryonic development?
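For the UNC-129 example, this is roughly what the proposed annotation would look like as a row in a tab-delimited gene association file. It is only a sketch: the column order is my understanding of the 15-column layout, the object IDs and GO ID are placeholders rather than real identifiers, and `make_gaf_row` is a hypothetical helper.

```python
# Assumed 15-column gene association file layout.
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence_Code", "With_From", "Aspect",
    "DB_Object_Name", "DB_Object_Synonym", "DB_Object_Type",
    "Taxon", "Date", "Assigned_By",
]


def make_gaf_row(**fields):
    """Build one tab-delimited row; columns not supplied are left empty."""
    return "\t".join(fields.get(name, "") for name in GAF_COLUMNS)


row = make_gaf_row(
    DB="WB",
    DB_Object_ID="PLACEHOLDER_UNC129_ID",   # placeholder, not a real ID
    DB_Object_Symbol="unc-129",
    Qualifier="NOT",                         # the qualifier in question
    GO_ID="GO:PLACEHOLDER",                  # TGF-beta receptor binding term
    DB_Reference="PMID:11018016",
    Evidence_Code="IGI",
    With_From="PLACEHOLDER_DAF4_ID",         # placeholder for DAF-4
    Aspect="F",                              # molecular function
    Taxon="taxon:6239",                      # C. elegans
    Assigned_By="WB",
)

print(row.split("\t")[3])
```

DB_Object_Type and Date are deliberately left empty here; the former is the subject of topic 9.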

4) How should we deal with "data not shown"?

In general, what is the appropriate evidence code to use when an experimental result is described in a paper as "data not shown"? If it is clear from the paper that the authors used the same assay as one they describe, but did not actually present the data for the gene product you are annotating, is IDA okay? If the statement is made in the discussion section of a paper and you're not absolutely certain how the information was obtained, would you then use NAS?

5) Expression patterns and IDA vs. IEP

To be certain, is cellular component information generated from antibodies or reporters always annotated using the IDA evidence code? Have other databases used the IEP evidence code and if so, for what types of experiments? We have been thinking of using microarray data to annotate to a few select processes (aging, dauer formation, spermatogenesis), but have not yet done so. Is this the intended use of IEP?

6) IEA Annotation Maintenance

How much effort do other databases put into keeping their IEA annotations up-to-date? WormBase is released fortnightly, so in theory (since gene models change with every release) we could update our IEA annotations every two weeks. How often do other databases update their IEAs? To what extent are IEA GO annotations used by people who are doing informatics or other types of analyses, and therefore, what is the appropriate level of upkeep for IEAs?

7) Species- and Taxon-specific terms

One issue we've been grappling with at WormBase is how to provide annotations of deep granularity without having to introduce an abundance of species- or nematode-specific terms. This point becomes especially relevant when we think about using the very well-defined anatomy of C. elegans. For example, vulval development in C. elegans involves specification of what is known as primary and secondary vulval cell lineages. Different gene products are involved in each of these separate specification processes, both of which could be children of the existing GO term, vulval development. But what is the cut-off for introducing species- or taxon-specific anatomy terms? While a term like "specification of primary vulval cell lineages" does not strike me as too species-specific, a term like "specification of the embryonic blastomere EMS" does. Can GO provide any more guidance to annotators about species-specific terms, especially with respect to anatomy? I know that cross-product ontologies have been suggested as a solution to the potentially infinite expansion of the gene ontology if too many species-specific anatomy terms are introduced. Are any of the MODs currently using a cross-product ontology? WormBase has an existing anatomy ontology, so would it be appropriate for us to start developing and implementing a cross-product ontology for our users? How would this ontology fit in with the rest of GO?

8) Appropriate use of sensu terms

Related to this, I need to annotate a C. elegans gene product to the process term oogenesis. Two oogenesis terms currently exist in GO: oogenesis (sensu Insecta) and oogenesis (sensu Mammalia). Since I don't have nematode-specific child terms for oogenesis yet, should I propose the general term, oogenesis, the more specific term oogenesis (sensu Nematoda), or both?

9) Annotate to gene, protein, or transcript?

We would also like more guidance on the appropriate selection for Column 12 of the gene association file, DB_Object_Type. Some gene association files have "gene" in this column, even when the annotation is clearly to a protein (e.g., GO: CC: some intracellular complex, Evidence code: IPI). We decided to place protein or transcript in Column 12, but are now wondering if this is correct. If a gene product is determined to have kinase activity based upon sequence similarity, not upon a direct biochemical assay, what is the correct object type for Column 12? Gene or protein?
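For anyone checking their own files, a minimal sketch of pulling Column 2 and Column 12 out of one line, so the two can be compared side by side. The example line is made up (placeholder IDs throughout), and `object_id_and_type` is a hypothetical helper, not part of any GO tool.

```python
def object_id_and_type(gaf_line):
    """Return (Column 2, Column 12) from one tab-delimited annotation line."""
    fields = gaf_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 tab-separated columns")
    # Columns are 1-based in the documentation, 0-based in Python.
    return fields[1], fields[11]


# Made-up line: an IPI annotation to a cellular component term.
example_fields = [
    "WB", "EXAMPLE_ID", "example-1", "", "GO:PLACEHOLDER",
    "PMID:PLACEHOLDER", "IPI", "", "C", "", "",
    "protein", "taxon:6239", "", "WB",
]
example_line = "\t".join(example_fields) + "\n"

print(object_id_and_type(example_line))
```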

Reported by: vanaukenk

Original Ticket: "geneontology/annotation-issues/16":https://sourceforge.net/p/geneontology/annotation-issues/16

gocentral commented 20 years ago

Answer to question 9: Column 12 (DB_Object_Type) refers to what is represented by the database entry whose ID appears in Column 2. That is, does the ID refer to an entry for a gene? If so, put 'gene' in Column 12.

I've added this bit to the annotation documentation at http://www.geneontology.org/GO.annotation.html.

Midori

Original comment by: mah11

gocentral commented 20 years ago

Original comment by: mah11

gocentral commented 19 years ago

Is there anything here that wasn't covered at the camp, or captured in the camp minutes?

Original comment by: mah11

gocentral commented 19 years ago

No, I think we've covered everything.

--Kimberly

Original comment by: vanaukenk

gocentral commented 19 years ago

OK, thanks; I'll close this item.

Original comment by: mah11

gocentral commented 19 years ago

Original comment by: mah11