geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.
BSD 3-Clause "New" or "Revised" License

DB Object Type values for GPI 2.0 #2275

Closed: vanaukenk closed this issue 3 months ago.

vanaukenk commented 5 years ago

For the GPI 2.0 specifications, we are proposing to use terms from the Molecular Sequence Ontology (MSO) to identify the entity type represented in each line of the file.

We need to decide exactly which terms (and/or their children) can be used. Top-level MSO terms that we should consider are:

[Term]
id: MSO:0000704
name: gene
def: "A region (or regions) that includes all of the elements necessary to encode a functional transcript. A gene may include regulatory regions, transcribed regions and/or other functional regions." [SO:immuno_workshop]
comment: This term is mapped to MGED. Do not obsolete without consulting MGED ontology. A gene may be considered as a unit of inheritance.
subset: SOFA
synonym: "INSDC_feature:gene" EXACT []
xref: http://en.wikipedia.org/wiki/Gene
is_a: MSO:3100210 ! nucleotide region
property_value: IAO:0000118 "cistron" xsd:string

[Term]
id: MSO:0000673
name: transcript
def: "An RNA synthesized on a DNA or RNA template by an RNA polymerase." [SO:ma]
subset: SOFA
synonym: "INSDC_feature:misc_RNA" BROAD []
is_a: MSO:3000205 ! gene product
is_a: MSO:3100174 ! nucleotide strand

[Term]
id: MSO:0000104
name: peptide
alt_id: SO:0000358
def: "An extent of amino acid residues that is not a proper part of a longer extent of amino acid residues. It may lack appreciable tertiary structure and may not be liable to irreversible denaturation." [SO:ma]
comment: This term is mapped to MGED. Do not obsolete without consulting MGED ontology. The term 'protein' was merged with 'polypeptide'. Although 'protein' was a sequence_attribute and therefore meant to describe the quality rather than an actual feature, it was being used erroneously. It is replaced by 'peptidyl' as the polymer attribute.
subset: SOFA
xref: http://en.wikipedia.org/wiki/Polypeptide
is_a: MSO:3100185 ! amino acid extent
is_a: MSO:3100245 ! sequence molecular entity chain
intersection_of: MSO:3100185 ! amino acid extent
intersection_of: MSO:3100245 ! sequence molecular entity chain
property_value: IAO:0000118 "protein" xsd:string {scope="EXACT"}
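Each stanza above is a set of OBO tag-value pairs, one per line. As a rough illustration of how a tool might read them, here is a minimal Python sketch. The function name `parse_obo_stanza` is hypothetical, and it assumes simple one-pair-per-line stanzas; it is not a full OBO parser and does not handle escapes, qualifiers, or multi-stanza files.

```python
from collections import defaultdict


def parse_obo_stanza(text: str) -> dict:
    """Parse one OBO [Term] stanza into a tag -> list-of-values mapping.

    Simplified sketch (hypothetical helper): assumes one tag-value pair per
    line and strips trailing '! label' comments only from is_a and
    intersection_of lines.
    """
    tags = defaultdict(list)
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line == "[Term]":
            continue
        # Split on the first colon only, so IDs like MSO:0000673 stay intact.
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag in ("is_a", "intersection_of"):
            # Drop the human-readable label after '!'.
            value = value.split("!", 1)[0].strip()
        tags[tag].append(value)
    return dict(tags)


example = """[Term]
id: MSO:0000673
name: transcript
is_a: MSO:3000205 ! gene product
is_a: MSO:3100174 ! nucleotide strand"""
print(parse_obo_stanza(example)["is_a"])  # ['MSO:3000205', 'MSO:3100174']
```

Tags such as `is_a` can repeat within a stanza, so values are kept as lists rather than single strings.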

Would it be okay to allow use of any of these terms and/or one of their children? Do we need to be more specific about what values are allowed in the GPI file?
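If the rule were "any of these terms or one of their children", a validator would need to walk `is_a` links upward from the value in the file. The Python sketch below shows that check under stated assumptions. The allow-list, the name `is_allowed_type`, and the edge table are hypothetical; the table holds only the `is_a` edges quoted in this thread rather than the full MSO, and later comments note that MSO was ultimately not adopted.

```python
# Hypothetical allow-list: the top-level terms proposed in this issue.
ALLOWED_ROOTS = {"MSO:0000704", "MSO:0000673", "MSO:0000104"}

# is_a edges (child -> parents) copied from the stanzas quoted in this thread;
# a real check would load the full ontology instead.
IS_A = {
    "MSO:0000704": ["MSO:3100210"],
    "MSO:0000673": ["MSO:3000205", "MSO:3100174"],
    "MSO:0000104": ["MSO:3100185", "MSO:3100245"],
    "MSO:3000263": ["MSO:0000104", "MSO:3000205"],
}


def is_allowed_type(term_id, allowed=ALLOWED_ROOTS, is_a=IS_A):
    """Return True if term_id is an allowed root or a descendant of one."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        if current in allowed:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Follow is_a links upward toward more general terms.
        stack.extend(is_a.get(current, []))
    return False


print(is_allowed_type("MSO:3000263"))  # True (translational product is_a peptide)
print(is_allowed_type("MSO:3000205"))  # False (gene product is a parent, not a child)
print(is_allowed_type("GO:0032991"))   # False (not in this MSO-only table)
```

Note that a parent such as "gene product" is rejected: allowing "children" of a root only admits more specific terms, never more general ones.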

For macromolecular complexes, do we want to use the GO CC term GO:0032991 protein-containing complex?

What would we use for a nucleic-acid only complex?

ukemi commented 5 years ago

Will we ever be talking about a peptide that is not this?

[Term]
id: MSO:3000263
name: translational product
def: "A peptide or amino acid produced via ribosomal translation (as opposed to, e.g., synthetic peptides)." []
is_a: MSO:0000104 ! peptide
is_a: MSO:3000205 ! gene product

Re: "What would we use for a nucleic-acid only complex?" I hope the decision to simplify or change our original term doesn't come back to haunt us. Do we have any examples of nucleic-acid complexes that don't contain proteins?

deustp01 commented 5 years ago

Glutathione; gramicidin and other bacterial peptide antibiotics.

ukemi commented 5 years ago

But would those be in a GPAD/GPI file? Are we sticking to gene products?

deustp01 commented 5 years ago

Got it. No, they should not be: they are the products of biosynthetic reactions that have free amino acids as inputs and conventional enzymes as enablers, not charged tRNAs and mRNA inputs plus ribosomes as enablers.

srengel commented 5 years ago

Why MSO instead of SO? SO is used by many MODs and other groups, including the Alliance. MSO, not so much. Why purposefully select a different ontology? This is disturbing.

vanaukenk commented 5 years ago

In case anyone is interested, there is more information about the MSO, and its relationship to SO, here: https://github.com/The-Sequence-Ontology/MSO

The "Guidelines for users" section sums up the difference nicely:

If you are annotating sequences in a database, abstracted from a molecular context and likely represented as a string of characters, use the SO.

If you are describing DNA, RNA, proteins etc. as molecules engaged in chemical events, use the MSO.

pgaudet commented 1 year ago

Out of date; not using MSO anymore.

kltm commented 1 year ago

Moved to a better home.

pgaudet commented 3 months ago

We don't plan to document ontologies we don't use.