geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Investigate equivalence between GO:0004818 and GO:0050561_Rhea generics compounds #15930

Open ukemi opened 6 years ago

ukemi commented 6 years ago

@goodb has created logical defs for GO terms based on Rhea xrefs and reactants. The reasoner is inferring these two terms are equivalent. We need to investigate this.

deustp01 commented 6 years ago

The very first step is to ask @hdrabkin about the biochemistry here - is there a true difference between the two functions or has the reasoner identified two terms that should be merged?

goodb commented 6 years ago

I may be able to shed a little light on how this came to be. The reasoning here is based on exact matching of the chemical reactions associated with the terms. These two terms refer to two different RHEA reactions: 23543 and 18400. The reason there are two RHEA reactions is that the reactions differ in some of the 'generic' participants. The reason the reasoner thinks they are the same is that the different generics map to the same CHEBI ids, e.g., (tRNAGlu GENERIC:9663) => CHEBI:78442 in RHEA reaction 23543 and (tRNAGlx GENERIC:9713) => CHEBI:78442 in RHEA reaction 18400. The question for GO is whether the reaction participant representations should be limited to CHEBI terms as they are now or allowed to branch off into different, potentially more specific, identifier spaces like these terms from Rhea.

From their doc:

A Rhea generic is a molecule that is not represented as is in ChEBI; it has an accession of the form GENERIC:xxx (with xxx a numeric identifier). Rhea generics are proteins, nucleic acids, polysaccharides or some small molecules acting as reactants or products in a Rhea reaction. Such molecules are modeled with the residues and/or functional groups that are directly involved in the chemical transformations. The residues and/or functional groups are linked to ChEBI compounds. Examples: GENERIC:9685 (holo-[ACP]) or GENERIC:9621 (acetyl-[ACP]) in RHEA:41788. It is possible to find several generic compounds with the same functional group (e.g. GENERIC:9669 tRNA(Ser) and GENERIC:9657 tRNA(Ala)).
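A minimal sketch of the collapse described above, assuming heavily simplified participant lists (left-hand side only, plain names instead of ChEBI IDs for ATP and L-glutamate). Only the two generic-to-ChEBI mappings come from this thread; the dictionary and function names are hypothetical and are not part of any Rhea or GO tooling.

```python
# Generic -> ChEBI reactive part, as quoted in the comment above.
GENERIC_TO_CHEBI = {
    "GENERIC:9663": "CHEBI:78442",  # tRNA(Glu)
    "GENERIC:9713": "CHEBI:78442",  # tRNA(Glx)
}

# Simplified, illustrative participant lists (assumption: left-hand side only).
REACTIONS = {
    "RHEA:23543": ["ATP", "L-glutamate", "GENERIC:9663"],
    "RHEA:18400": ["ATP", "L-glutamate", "GENERIC:9713"],
}


def chebi_signature(participants):
    """Replace each generic with its ChEBI reactive part; keep other names."""
    return frozenset(GENERIC_TO_CHEBI.get(p, p) for p in participants)


def generic_signature(participants):
    """Keep generics as their own identifiers."""
    return frozenset(participants)


chebi_equal = (
    chebi_signature(REACTIONS["RHEA:23543"])
    == chebi_signature(REACTIONS["RHEA:18400"])
)
generic_equal = (
    generic_signature(REACTIONS["RHEA:23543"])
    == generic_signature(REACTIONS["RHEA:18400"])
)

print(chebi_equal)    # True: both generics collapse to CHEBI:78442
print(generic_equal)  # False: the generic identifiers differ
```

Matching on ChEBI IDs alone makes the two reactions indistinguishable, while keeping the generic identifiers preserves the difference in tRNA substrate.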

hdrabkin commented 6 years ago

GO:0004818 glutamate tRNA ligase EC 6.1.1.7 ATP + L-glutamate + tRNA(Glu) = AMP + diphosphate + L-glutamyl-tRNA(Glu).

GO:0050561 glutamate-tRNA(Gln) ligase activity EC 6.1.1.24 Catalysis of the reaction: tRNA(Glx) + L-glutamate + ATP = glutamyl-tRNA(Glx) + diphosphate + AMP

Most organisms have one synthetase (ligase) to make Gln-tRNA(Gln) (using Gln) and one synthetase to make Glu-tRNA(Glu) (using Glu).

However, in other organisms (and maybe organelles), there is a ligase that adds Glu to either tRNA(Glu) or tRNA(Gln) (hence the tRNA(Glx)). A second enzyme activity then takes Glu-tRNA(Gln) and aminates the Glu to make Gln-tRNA(Gln).

The term name for EC 6.1.1.24 should be changed to glutamate-tRNA(Glx) ligase activity.

goodb commented 6 years ago

FYI. Out of 11,115 reactions (from a recent RDF export of RHEA), 252 use a polymer, 1,722 use a generic, with 1958 using one or the other. (So 18% of RHEA reactions are not totally covered by CHEBI at the moment.)
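As a quick check of the figures above, and assuming "one or the other" means the union of the two sets, inclusion-exclusion gives the number of reactions that use both a polymer and a generic. The variable names are only illustrative.

```python
total = 11_115    # reactions in the RDF export
polymer = 252     # reactions using a polymer
generic = 1_722   # reactions using a generic
either = 1_958    # reactions using a polymer or a generic

# |A and B| = |A| + |B| - |A or B|
both = polymer + generic - either
share = either / total

print(both)            # 16
print(f"{share:.1%}")  # 17.6%
```

So the quoted 18% is the rounded share of reactions with at least one participant outside plain ChEBI.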

alanbridge commented 5 years ago

The question for GO is whether the reaction participant representations should be limited to CHEBI terms as they are now or allowed to branch off into different, potentially more specific, identifier spaces like these terms from Rhea.

We have been having similar discussions internally about this. We originally introduced GENERICs in Rhea to describe reactions for those IUBMB enzyme classes that involve specific protein/nucleic acid substrates. We don't want to create new identifiers in Rhea for every protein or RNA though as this would become unmanageable. Furthermore now that we will use Rhea in UniProt, that will give us the possibility to record the specific target there, while keeping Rhea more lightweight - focusing on the actual transformation.

We are therefore currently recommending that UniProt curators focus on the chemistry, so they would annotate something like this:

Rhea: protein + ATP = protein-P + ADP

rather than

Rhea: Wnt + ATP = Wnt-P + ADP
Rhea: histone + ATP = histone-P + ADP
etc.

It's currently a mixed bag though as we have a lot of legacy data.

As I understand it, the current thinking in GO is to remove as much information about the target from GO terms as possible. So if you based your reasoning on the ChEBI groups, you would in this specific case infer that GO:0004818 and GO:0050561 are the same, which chemically they are, as they both convert an AMP 3'-end residue to a 3'-(L-glutamate)adenylyl group using ATP and glutamate, although the substrate differs. In that case you might create a single GO term and merge these two, as the molecular activity is really the same (I think).

As an aside, the implementation for UniProt shows the GENERICs and their groups like in Rhea. Ideally the search should then work at 2 levels, so people could search for

or

etc.

I also forgot to mention, we are currently mapping PTM annotations (for which we have a CV at https://www.uniprot.org/docs/ptmlist) to ChEBI groups. Rhea reactions that modify proteins will therefore feature ChEBI groups that are also linked to proteins via feature annotations in UniProt.

I will have a look at the NTR peptidyl-lysine 3-dioxygenase activity ticket.

deustp01 commented 5 years ago

The discussion at NTR peptidyl-lysine 3-dioxygenase activity #16632 could provide a use case for this, along with the newly created Reactome human reactions visible in draft form here.

ukemi commented 5 years ago

Would love to see us get moving with this project again. @goodb, do you think we could use a similar strategy here that you developed for the Reactome imports? https://github.com/geneontology/pathways2GO/issues/70 Would it make sense to create a mini REO ontology that would fit into ChEBI the way that REO will fit into UniProtKB?

goodb commented 5 years ago

@ukemi Reading back through this issue (it's been quite a while...), it seems like this is more a question of whether, and how, to modify the GO MF branch, no? While I think the REO approach is necessary for the Reactome conversion project, I would definitely try to handle things in the main ontologies rather than making new ones if at all possible.

Has there been any progress on #14984 ? It seems like that would be the first one to solve. Maybe something to revisit in person in Berkeley?

ukemi commented 5 years ago

I think we should revisit it in Berkeley. This issue was one of the things hanging up #14984. We were getting equivalent classes for different reactions (truly different molecular functions) when one of the participants in the reactions (MFs) is a ChEBI class that isn't specific enough. If we want to keep this level of granularity (maybe unsustainable) then we need to better specify the participants. Alternatively we could take @alanbridge 's approach above and merge functions that carry out essentially the same reaction.

Again, probably something better discussed face-2-face so we can look at it together. Let's make a plan. This follows nicely and melds well with the Reactome work.

pgaudet commented 3 years ago

Should we merge those two terms ?

hdrabkin commented 3 years ago

NO; the substrate tRNA is different. Annotating the genes responsible for the reactions to a single GO term would be incorrect.


GO:0004818 glutamate tRNA ligase EC 6.1.1.7 ATP + L-glutamate + tRNA(Glu) = AMP + diphosphate + L-glutamyl-tRNA(Glu).

GO:0050561 glutamate-tRNA(Gln) ligase activity EC 6.1.1.24 Catalysis of the reaction: tRNA(Glx) + L-glutamate + ATP = glutamyl-tRNA(Glx) + diphosphate + AMP.

pgaudet commented 3 years ago

@amorgat I don't understand: RHEA:18400 uses Glx, which means Glu or Gln. tRNA(Glx) does not have a ChEBI link on the Rhea page.

This was not acceptable for protein serine/threonine kinase activity, see #20114

Looks like Rhea can handle 'or' statements ?

Thanks, Pascale

amorgat commented 3 years ago

Hi @pgaudet!
It's a slightly different case: RHEA:18397 (the master reaction) is linked to EC 6.1.1.24 and describes a glutamylation reaction. As you mentioned, it can act on both tRNA Glu and tRNA Gln, but it's always the same chemical transformation, i.e. an AMP 3'-end residue (CHEBI:78442) is transformed into a 3'-(L-glutamate)adenylyl residue. In Rhea, we handle macromolecules with RHEA-COMP: an identifier (e.g. RHEA-COMP:9713), a label, and a ReactivePart, which represents the functional groups involved in the reaction. The ReactivePart is composed of one (or several) ChEBI residue(s). Remember that ChEBI is now focusing on small molecules (even if they still have old entries to clean up), so there is no way to link the tRNA to a ChEBI entity. So yes, in RHEA:18397, we handle an 'or' statement, but just for the label of the macromolecule.

For protein serine/threonine kinase activity, the ReactivePart can be a Ser (Ser = Ser-P) or a Thr (Thr = Thr-P); that's why we have 2 reactions.
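The RHEA-COMP structure described above (identifier, label, ReactivePart) could be sketched roughly as follows. This is not Rhea's actual schema: the class and field names are hypothetical, and the sketch assumes the RHEA-COMP numbers match the older GENERIC numbers quoted earlier in this thread (9663 for tRNA(Glu), 9713 for tRNA(Glx)).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReactivePart:
    # One or several ChEBI residues that take part in the transformation.
    chebi_ids: Tuple[str, ...]


@dataclass(frozen=True)
class RheaComp:
    accession: str               # e.g. "RHEA-COMP:9713"
    label: str                   # free-text label; any "or" lives only here
    reactive_part: ReactivePart


# Assumed accessions, see the lead-in.
trna_glu = RheaComp("RHEA-COMP:9663", "tRNA(Glu)", ReactivePart(("CHEBI:78442",)))
trna_glx = RheaComp("RHEA-COMP:9713", "tRNA(Glx)", ReactivePart(("CHEBI:78442",)))


def same_chemistry(a: RheaComp, b: RheaComp) -> bool:
    """Compare only the reactive parts, ignoring accession and label."""
    return a.reactive_part == b.reactive_part


def same_compound(a: RheaComp, b: RheaComp) -> bool:
    """Compare the macromolecule identity via its accession."""
    return a.accession == b.accession


print(same_chemistry(trna_glu, trna_glx))  # True
print(same_compound(trna_glu, trna_glx))   # False
```

The two comparisons correspond to the two positions in this thread: merging on the reactive part versus keeping the tRNA substrate distinct.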

pgaudet commented 3 years ago

@goodb

Can you list the logical definitions you made for this branch ?

Thanks, Pascale

hdrabkin commented 3 years ago

But in this case I don't think the terms should be merged. (There are other cases in GO where the reaction from RHEA would be the same for several GO terms; in those cases we don't make a RHEA link.) It is important to distinguish these by the tRNA substrate because, if merged, it implies that every enzyme reacting with tRNA(Glu) would also react with tRNA(Gln), but that is not true.

hdrabkin commented 3 years ago

Thus in this case we might want to remove the Rhea for these two terms.

pgaudet commented 3 years ago

We could, but right now there are no LD in that branch, so there is no problem (yet?)

goodb commented 3 years ago

@goodb

Can you list the logical definitions you made for this branch ?

Thanks, Pascale

It's been too long. Maybe @balhoff has something to share? My understanding was that he was taking this forward.

balhoff commented 3 years ago

Here's how the logical definition looks for RHEA:18397 in my current transformation (I'm only showing one direction):

catalytic activity and
((has output some (L-glutamate(1−) and (has_stoichiometry value "1"^^xsd:string)))
 and (has output some (ATP(4−) and (has_stoichiometry value "1"^^xsd:string)))
 and (has output some (AMP 3'-end(1−) residue and (part of some tRNA(Glx)) and (has_stoichiometry value "1"^^xsd:string))))
and (has input some (diphosphate(3−) and (has_stoichiometry value "1"^^xsd:string)))
and (has input some (adenosine 5'-monophosphate(2−) and (has_stoichiometry value "1"^^xsd:string)))
and (has input some (3'-(L-glutamate)adenylyl(1−) group and (part of some L-glutamyl-tRNA(Glx)) and (has_stoichiometry value "1"^^xsd:string)))
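To illustrate what the `part of some ...` qualifier adds to a participant, here is a small sketch that renders one participant clause in the same textual shape as the definition above. The helper `participant_expr` is hypothetical and is not taken from the actual transformation code. Whether a reasoner treats the tRNA(Glu) and tRNA(Glx) variants as distinct classes also depends on how those two classes are axiomatized, which this sketch does not model.

```python
def participant_expr(relation, chebi_label, part_of=None, stoichiometry="1"):
    """Render one participant clause as text (illustrative only)."""
    parts = [chebi_label]
    if part_of is not None:
        # Tie the ChEBI residue to the macromolecule it belongs to.
        parts.append(f"(part of some {part_of})")
    parts.append(f'(has_stoichiometry value "{stoichiometry}"^^xsd:string)')
    return f"({relation} some ({' and '.join(parts)}))"


residue = "AMP 3'-end(1−) residue"
with_glx = participant_expr("has output", residue, part_of="tRNA(Glx)")
with_glu = participant_expr("has output", residue, part_of="tRNA(Glu)")
bare = participant_expr("has output", residue)

print(with_glx)
print(with_glu == with_glx)  # False: the qualifier keeps the clauses apart
print(bare)                  # without the qualifier, only the ChEBI residue remains
```

Without the qualifier, both GO terms would carry the identical clause for the tRNA participant, which is the situation that led to the inferred equivalence.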

pgaudet commented 3 years ago

Thanks @balhoff ! (and @goodb for pointing me in the right direction)

Not sure what the solution is for the ontology. We don't need to remove the xrefs until we try to load the LD, and at this point maybe we should consider using the Rhea generic compounds?