Closed gocentral closed 8 years ago
Logged In: YES user_id=436423
Answer to question 9: Column 12 (DB_Object_Type) refers to what is represented by the database entry whose ID appears in Column 2. That is, does the ID refer to an entry for a gene? If so, put 'gene' in column 12.
I've added this bit to the annotation documentation at http://www.geneontology.org/GO.annotation.html.
Midori
Original comment by: mah11
Logged In: YES user_id=436423
Is there anything here that wasn't covered at the camp, or captured in the camp minutes?
Original comment by: mah11
Logged In: YES user_id=883960
No, I think we've covered everything.
--Kimberly
Original comment by: vanaukenk
Logged In: YES user_id=436423
OK, thanks; I'll close this item.
Original comment by: mah11
Here are some more annotation camp topics from WormBase:
1) Phenotype Penetrance and Process Terms
Most of our Biological Process annotations come from analysis of mutant phenotypes. For pleiotropic mutations, there are often many defects of varying penetrance. What is appropriate curation for defects that are weakly penetrant, but observed nonetheless?
For example, in a paper on dauer formation at high temperatures (PMID: 14504222), a number of singly and doubly mutant animals display a dauer phenotype with penetrance ranging from 0% to 34% (Table 7). Should all mutants with dauer formation above 0% be annotated to a dauer formation process term, even if the penetrance is quite low?
2) Process Curation for Gene Products like RNA Pol II
This next issue is somewhat related as it also concerns pleiotropies. For gene products whose primary function is well established, yet whose mutant phenotype is pleiotropic, how far away from the primary process is good GO curation? For example, AMA-1, the C. elegans large subunit of RNA polymerase II, is annotated to the process term, transcription from Pol II promoter. Loss of AMA-1 function in the early embryo results in embryonic lethality with defects in tissue differentiation, cell division cycles, and gastrulation movements (PMID: 8812143). We have annotated AMA-1 to the process terms related to these embryonic defects, but how many other transcription-dependent processes should we add? Worms require transcription for vulval development, post-embryonic cell migrations, male tail formation, etc., but at what point do you stop adding process terms for gene products with generalized functions like RNA polymerase II?
3) Using the NOT qualifier
What is appropriate use of the NOT qualifier? Is it intended to capture only those experimental results that are really unexpected, or is it intended to also capture negative results? Two examples: UNC-129 is a TGF-beta-like signalling molecule required for cell migration, but it does not appear to interact with the only known Type II TGF-beta receptor in worms, DAF-4 (PMID: 11018016). This suggests that UNC-129 functions in some way other than through the known TGF-beta signalling pathway in worms. So, would this be an appropriate use of the NOT qualifier:
NOT MF:transforming growth factor receptor binding IGI with DAF-4
RNAi of one of the three class I histone deacetylases in C. elegans results in embryonic lethality, while RNAi of the other two has no discernible effect (PMID: 9875852). HDA-1 has been annotated to embryonic development, but should the other two be annotated to NOT embryonic development?
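To make the UNC-129 question concrete, here is a rough sketch of how such an annotation might be laid out as a gene association file row, with NOT in the Qualifier column (column 4). It assumes the 15-column tab-delimited layout; `make_row` is a hypothetical helper, and the gene and GO identifiers are placeholders rather than real accessions. It only illustrates where NOT would sit, not whether this use of NOT is appropriate.

```python
# Column names for the 15-column gene association file layout (assumed here).
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence", "With", "Aspect", "DB_Object_Name",
    "Synonym", "DB_Object_Type", "Taxon", "Date", "Assigned_By",
]


def make_row(**fields):
    """Hypothetical helper: build one tab-delimited row; unset columns stay empty."""
    unknown = set(fields) - set(GAF_COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    return "\t".join(fields.get(name, "") for name in GAF_COLUMNS)


# Identifiers containing PLACEHOLDER are illustrative, not real accessions.
row = make_row(
    DB="WB",
    DB_Object_ID="WBGene_PLACEHOLDER_UNC129",
    DB_Object_Symbol="unc-129",
    Qualifier="NOT",                    # column 4 carries the NOT qualifier
    GO_ID="GO:PLACEHOLDER",             # TGF-beta receptor binding term
    DB_Reference="PMID:11018016",
    Evidence="IGI",
    With="WB:WBGene_PLACEHOLDER_DAF4",  # the interacting gene, daf-4
    Aspect="F",                         # F = molecular function
    Taxon="taxon:6239",                 # C. elegans
)

columns = row.split("\t")
print(columns[3], columns[6], columns[7])
```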
4) How should we deal with "data not shown"?
In general, what is the appropriate evidence code to use when an experimental result is described in a paper as "data not shown"? If it is clear from the paper that the authors used the same assay as one they describe in the paper, but did not actually present the data for the gene product you are annotating, is IDA okay? If the statement is made in the discussion section of a paper and you're not absolutely certain how the information was obtained, would you then use NAS?
5) Expression patterns and IDA vs. IEP
To be certain, is cellular component information generated from antibodies or reporters always annotated using the IDA evidence code? Have other databases used the IEP evidence code and if so, for what types of experiments? We have been thinking of using microarray data to annotate to a few select processes (aging, dauer formation, spermatogenesis), but have not yet done so. Is this the intended use of IEP?
6) IEA Annotation Maintenance
How much effort do other databases put into keeping their IEA annotations up-to-date? WormBase is released fortnightly and so theoretically (since gene models do change with every release), we could update our IEA annotations every two weeks. How often do other databases update their IEAs? To what extent are IEA GO annotations used by people who are doing informatics or other types of analyses, and therefore, what is the appropriate level of upkeep for IEAs?
7) Species- and Taxon-specific terms
One issue we've been grappling with at WormBase is how to provide annotations of deep granularity without having to introduce an abundance of species- or nematode-specific terms. This point becomes especially relevant when we think about using the very well-defined anatomy of C. elegans. For example, vulval development in C. elegans involves specification of what is known as primary and secondary vulval cell lineages. Different gene products are involved in each of these separate specification processes, both of which could be children of the existing GO term, vulval development. But what is the cut-off for introducing species- or taxon-specific anatomy terms? While a term like "specification of primary vulval cell lineages" does not strike me as too species-specific, a term like "specification of the embryonic blastomere EMS" does. Can GO provide any more guidance to annotators about species-specific terms, especially with respect to anatomy? I know that cross-product ontologies have been suggested as a solution to the potentially infinite expansion of the gene ontology if too many species-specific anatomy terms are introduced. Are any of the MODs currently using a cross-product ontology? WormBase has an existing anatomy ontology, so would it be appropriate for us to start developing and implementing a cross-product ontology for our users? How would this ontology fit in with the rest of GO?
8) Appropriate use of sensu terms
Related to this, I need to annotate a C. elegans gene product to the process term oogenesis. Two oogenesis terms currently exist in GO: oogenesis (sensu Insecta) and oogenesis (sensu Mammalia). Since I don't have nematode-specific child terms for oogenesis yet, should I propose the general term, oogenesis, the more specific term oogenesis (sensu Nematoda), or both?
9) Annotate to gene, protein, or transcript?
We would also like more guidance on the appropriate selection for Column 12 of the gene association file, DB_Object_Type. Some gene association files have "gene" in this column, even when the annotation is clearly to a protein (e.g., GO: CC: some intracellular complex, Evidence code: IPI). We decided to place protein or transcript in Column 12, but are now wondering if this is correct. If a gene product is determined to have kinase activity based upon sequence similarity, not upon a direct biochemical assay, what is the correct object type for column 12? Gene or protein?
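As a minimal sketch of the two columns in question, the snippet below pulls Column 2 (DB_Object_ID) and Column 12 (DB_Object_Type) out of one tab-delimited gene association line. The function name and the sample line are hypothetical, and the identifiers are placeholders. The point is only that Column 12 describes the database entry named in Column 2, independent of the evidence code.

```python
def object_id_and_type(line):
    """Return (Column 2, Column 12) from one tab-delimited gene association line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(cols)}")
    # Columns are 1-indexed in the file documentation, 0-indexed here.
    return cols[1], cols[11]


# Hypothetical 15-column line; the gene and GO IDs are placeholders.
sample = "\t".join([
    "WB", "WBGene_PLACEHOLDER", "abc-1", "", "GO:PLACEHOLDER",
    "PMID:PLACEHOLDER", "ISS", "", "F", "", "",
    "gene",            # Column 12: the entry in Column 2 is a gene record
    "taxon:6239", "20050101", "WB",
])

object_id, object_type = object_id_and_type(sample)
print(object_id, object_type)
```

Here the evidence code is ISS, yet Column 12 still reads "gene" because the ID in Column 2 points at a gene entry.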
Reported by: vanaukenk
Original Ticket: "geneontology/annotation-issues/16":https://sourceforge.net/p/geneontology/annotation-issues/16