geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Using Rhea to populate logical definitions of reactions #14984

Closed pgaudet closed 1 year ago

pgaudet commented 6 years ago

Hello,

Discussing with @amorgat about how Rhea and GO represent biochemical reactions, here are a few points to consider:

I am putting this here for discussion - @ukemi I let you decide how soon we need to discuss this.

Thanks, Pascale

ukemi commented 6 years ago

If you don't mind, I'll break them out and create a project, probably not tonight.

deustp01 commented 6 years ago

@alanbridge As above, Ben, David, and I were discussing the RHEA - GO - Reactome reaction-mapping project earlier today and got to the issue of reaction direction. We think we see how to handle that for the purpose of making the mappings we want, but that led us to wonder how RHEA determines which entities get listed on the left and which ones on the right in the master form of a RHEA reaction. Looking in the IUBMB EC materials turned up some partial rules that EC uses - within an EC category (oxidoreductase, transferase, etc.) entities that serve as substrates as the enzyme is named should be on the left regardless of the physiological direction of the reaction. But it seems like there should be more. Can you add anything from the RHEA perspective?

alanbridge commented 6 years ago

Hi Peter, Ben, Jim and all,

I will try to give the Rhea perspective in a nutshell: Anne Morgat and Elisabeth Coudert are leading the Rhea curation and mapping to UniProt enzyme comments at the SIB and might chime in too.

Hope that helps,

All the best, Alan

amorgat commented 6 years ago

@deustp01. To complete Alan's answer. From https://www.enzyme-database.org/faq.php

  1. Why is a reaction written in one direction even if the opposite direction is the one observed in vivo?

     All reactions within a given sub-subclass are written in the same direction irrespective of the direction in which the reaction normally occurs in vitro/vivo. Many factors can influence the direction in which a reaction occurs so, by using the same direction for all reactions within a sub-subclass, we are not making any assumptions about the equilibrium of the reaction or its direction in vivo.

deustp01 commented 6 years ago

Thanks @alanbridge @amorgat. What prompted the questions to you is that Ben, David, and I, in our own discussion, could not state the rules RHEA uses to set the left and right sides of a master reaction. We were also hoping that, hidden from us, there was a general algorithm used to do this, because that might allow some consistency checking at our end. From Alan's response and Anne's addition, it is clear what the rules are. It is also clear that, while left and right are authoritatively defined for every reaction and the definitions are systematic and consistent wherever possible, there is no general algorithm.

That works for us - the RHEA master reaction can map to the GO molecular function term, and when someone wants to annotate a reaction and deal with physiological direction, the RHEA directional children can be used.

goodb commented 6 years ago

Okay, it seems then that we are still left with the conundrum for the logical definitions: we cannot definitively and consistently predict the leftness or rightness of the two substance bags in any given reaction. So we need to come up with an OWL representation that does not depend on direction yet gives the right inferences.

deustp01 commented 6 years ago

In words, we can only go as far as saying that a molecular function enables the interconversion of A and B. We can't give a direction. But that probably is right: a priori, not enough is known to specify reaction directions systematically. There are the famous mistakes where the people who discovered the function got its physiological direction backwards (for a recent, less famous example, remember the discussion here about 6 months ago of a reaction of glutathione metabolism). There are also a lot of cases where a single enzyme reliably moves a reaction from left to right under some normal physiological conditions and from right to left under others, e.g., all the reversible steps of glycolysis.

So maybe the question now is whether "interconversion" can be used. Do I remember right that logical ORs are poisonous? If not, that would do it - conversion of A to B OR of B to A.

goodb commented 6 years ago

Yes, the logical OR would solve this but, as mentioned above, it kills the ELK reasoner favored by the GO. I wonder if it's worth reaching out to the ELK developers to see if they could help. Sometimes a use case can be helpful motivation...

goodb commented 6 years ago

As a test, I implemented the pattern @ykazakov suggested in https://github.com/liveontologies/elk-reasoner/issues/54#issuecomment-398921969 . I added the new axioms into a merged GOPlus and also merged in a subset of CHEBI that contains all the referenced CHEBI terms (created using Robot). The result is attached here for inspection and feedback. See also spreadsheet with new direct subclass relations inferred as a result of the new axioms. In total I am currently counting 126 new direct subclass relations inferred as a result of adding the new axioms, 35 of which require CHEBI axioms (see second tab in spreadsheet for these). (This goes up to 406 / 446 respectively if including indirect subclass relationships.)

Looking through these, I see a number of cases like "zinc ion transmembrane transporter activity" subclass of "catalytic activity". I'm guessing that is not desired, but await feedback from the actual biochemists here. We can probably detect and cure this on the way in based on the 'Transport' qualifier in RHEA. It's happening because the algorithm assumes everything coming from RHEA is a subclass of catalytic activity...

GO_Ultra_Intersection_With_GOPlus_with_Chebi_extract.ttl.zip

kaxelsen commented 6 years ago

"zinc ion transmembrane transporter activity" can very well be a subclass of "catalytic activity". It depends whether the transporter in charge is a primary transporter (e.g. a P-type ATPase fueled by ATP) or a secondary transporter (e.g. a permease or a channel fueled by differences in charges or concentrations of substrate over the membrane). In the first case the transporter is regarded as an enzyme (and is included in the EC list) and its activity thus a catalytic activity, in the second case not.

cmungall commented 6 years ago

On 27 Jun 2018, at 23:57, goodb wrote:

Looking through these, I see a number of cases like "zinc ion transmembrane transporter activity" subclass of "catalytic activity". I'm guessing that is not desired, but await feedback from the actual biochemists here. We can probably detect and cure this on the way in based on the 'Transport' qualifier in RHEA. It's happening because the algorithm assumes everything coming from RHEA is a subclass of catalytic activity...

I don't think this is desired.

How are the (In) and (Out) in https://www.rhea-db.org/reaction?id=29354 translated to OWL?

The simplest thing is to mask all OWL axioms generated for anything that is not already under catalytic activity. We have separate DPs for transporters.

ukemi commented 6 years ago

This masking is probably a good first step, but when I was looking through some of the original inferences from this work, I was seeing good inferences under the transporter hierarchy. Maybe these would be also correctly classified in the DP work. I was also wondering about the (in) versus (out) differentia. We have an existing imports relation, but it might not suffice. Perhaps open another ticket to specifically look at transporters.

goodb commented 6 years ago

Right now, there is no OWL representation in the axioms I have generated for the "in" and "out" you see in the RHEA transport reactions. The definition for A (in) = A (out) is the same as it is for A = A. I suspect this is something we could improve on with some more thought.

A related issue came up before for the translocation reactions coming into Noctua from Reactome, many of which are currently not classified in GO. The idea there was to use the occurs_in information in Reactome to generate 'has target start/end location' statements on the reaction entities that would then work with existing axioms to get classifications into biological process terms. E.g.,

R instance_of ‘establishment of protein localization’
R has_target_end_location location2
R has_target_start_location location1
<=
    R has output P2
    P2 occurs_in location2
    R has input P1
    P1 occurs_in location1
    P1 = P2
    location1 != location2

The OWL reasoner should then infer a more specific biological process, e.g. R instance_of ‘establishment of protein localization to mitochondrial membrane’, based on its definition: 'establishment of protein localization' and ('has target end location' some 'mitochondrial membrane').
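
The rule above can be sketched in Python as follows. This is only an illustration: it assumes a reaction is a plain dict whose inputs and outputs are lists of (entity, location) pairs, and the function name and data layout are hypothetical, not anything taken from Noctua or Reactome.

```python
def target_locations(reaction):
    """Return (start, end) location pairs for entities that change location.

    Mirrors the rule: R has input P1, P1 occurs_in location1,
    R has output P2, P2 occurs_in location2, P1 = P2,
    location1 != location2.
    """
    moves = []
    for entity_in, loc_in in reaction["inputs"]:
        for entity_out, loc_out in reaction["outputs"]:
            # Same entity on both sides, but in a different place.
            if entity_in == entity_out and loc_in != loc_out:
                moves.append((loc_in, loc_out))
    return moves


# Hypothetical translocation reaction, for illustration only.
reaction = {
    "inputs": [("protein P", "cytosol")],
    "outputs": [("protein P", "mitochondrial membrane")],
}
for start, end in target_locations(reaction):
    print(f"R has_target_start_location {start}")
    print(f"R has_target_end_location {end}")
```

The reasoner would still do the actual classification; the sketch only shows where the start and end location statements would come from.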

I think the transporter / translocation area could be a really nice result from the rhea/reactome/go integration. We get the more specific information about locations from Reactome, the class structure with extant OWL definitions from GO, and the expanded set of transport reactions from rhea. I think worthy of another ticket @ukemi .

goodb commented 6 years ago

If anyone is looking at this, it's worth noting that the expanded CHEBI import did not include the general class axioms for the additional CHEBI terms. Hence, e.g., diphosphoric acid (CHEBI_29888) is not equated with diphosphate(3-) (CHEBI_33019) in the merged ontology.

ukemi commented 6 years ago

Isn't that part of the make_file? We can talk about it on Monday, but it makes sense to me to go ahead and run the ChEBI import with the additional terms we will need for the Rhea defs and if it all looks ok go ahead and merge that into master. It won't hurt anything to have the additional ChEBI classes I don't think. It's one more step we can do to get concrete progress along the way.

goodb commented 6 years ago

Yes, I assume it is part of the make file and ought to work fine when built that way. I haven't got set up to do the complete build locally (todo list..) and thus merged the Robot-generated chebi extract manually into Protege - thus missing the generation of those axioms. It probably won't make a lot of difference to our discussion, just may end up missing a few more inferences.


goodb commented 6 years ago

Here is one more incarnation to consider for the discussion on Monday. If we walk all the way back to definitions that look like this (for adenylate cyclase activity):

equivalent to: 'catalytic activity' and (('has input' some 'ATP(4-)')) and (('has output' some 'diphosphate(3-)') and ('has output' some '3',5'-cyclic AMP(1-)'))

Then things start looking interesting.

  1. the current Reactome->go-cam reactions get classified correctly. (At least they do by ELK, Arachne in Protege does not seem to work on this for some reason).
  2. we get 534 new direct subclass relations (an increase from 126 in the previous structure). Many of these are non-obvious (to me) because things start intersecting with existing definitions that make use of has input/output constraints.

This does not take the concept of bidirectionality into account; it is just one of the possible directions. But it seems to behave mostly the way we want it to, is much easier to look at, and fits in better with the rest of the ontology. It may be worth taking these directional structures (as they are laid out by default in Rhea) as a starting point (that, though potentially incomplete, is not wrong) and then filling in additional classes for the other directions as they are needed.

GO_Simple.zip

goodb commented 6 years ago

Summarizing status here (and adding some documentation of things happening off github).

We are still faced with the challenge of how best to add axioms to define the classes under Catalytic Activity. The principal challenge remains the bidirectional nature of most chemical reactions. This leads naturally to logical constructs that use an OR statement to join the left-to-right direction with the right-to-left direction. Unfortunately, though sound, this family of definitions does not work with any reasoner that can classify the whole GO. See the thread on the ELK reasoner repo about this.

For the problem of inferring the class hierarchy, we have two representations on the table, both of which generate some subclass inferences people aren't sure about. See the spreadsheet where these are laid out for the two formulations. The two formulations are

For the problem of inferring the classifications for instances @cmungall suggested a pattern that works nicely using General Concept Inclusion (GCI) axioms. (This does not however influence the problem of class hierarchy inference.) As an example, consider the class ‘phosphoglycerate mutase activity’, which has the textual definition: “Catalysis of the reaction: 2-phospho-D-glycerate = 3-phospho-D-glycerate”. We add the following GCI axiom (and its reverse direction by switching inputs and outputs):

'catalytic activity' and (('has input' some '2-phosphonato-D-glycerate(3-)')) and (('has output' some '3-phosphonato-D-glycerate(3-)')) =>SubClassOf 'phosphoglycerate mutase activity'

Now, when ingesting, for example, the Gluconeogenesis pathway from Reactome, its component reaction ‘2-Phospho-D-glycerate <=> 3-Phospho-D-glycerate’ is correctly and automatically identified as an instance of the GO class ‘phosphoglycerate mutase activity’ based on its inputs and outputs. This example recapitulates a manually assigned GO term from Reactome.
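
As a rough sketch of how the pair of GCI axioms (forward and reverse) could be generated for one reaction, the Python below builds them as plain strings in the notation used above. The function name and string layout are assumptions for illustration; this is not the code actually used to produce the axioms, and labels that contain apostrophes would need escaping.

```python
def gci_axioms(go_label, left, right, parent="'catalytic activity'"):
    """Build forward and reverse GCI axioms for one reaction, as strings."""

    def one_direction(inputs, outputs):
        parts = [parent]
        parts += [f"('has input' some '{chem}')" for chem in inputs]
        parts += [f"('has output' some '{chem}')" for chem in outputs]
        return " and ".join(parts) + f" SubClassOf '{go_label}'"

    # The reverse axiom is obtained by switching inputs and outputs.
    return [one_direction(left, right), one_direction(right, left)]


axioms = gci_axioms(
    "phosphoglycerate mutase activity",
    ["2-phosphonato-D-glycerate(3-)"],
    ["3-phosphonato-D-glycerate(3-)"],
)
for axiom in axioms:
    print(axiom)
```

Because each axiom is a one-way SubClassOf, a reaction instance in either direction is classified under the GO term without requiring a union in the class definition itself.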

Testing with all 11542 reactions imported from Reactome into GO-CAMs (May 2018), these axioms allow for the automatic classification of 2339 (20%). This is an increase from 794 when using the previous GOPlus without the new GCI axioms. 287 of the classifications are exact recapitulations of manual annotations, the remainder are potential new annotations that should be verified. Note that they may be sub or superclasses of existing annotations - only exact matches are tested for currently.

Examples of terms used in exact recapitulations:

Examples of new terms used for potential new annotations:

Note that the GCI definitions require the presence of an assertion to type Catalytic Activity. These are not present in the Reactome data. To produce the above statistics, I used the rule ‘if the reaction has inputs {A, B..} and outputs {C, D..}, and at least one A is a CHEBI term and one C is a CHEBI term and {A, B..} is not equal to {C, D..}, then add Catalytic Activity’.
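
The typing rule can be written down in a few lines of Python. This is a sketch under assumptions: identifiers are taken to be strings that start with "CHEBI:" or "CHEBI_" for chemicals, the function names are hypothetical, and the example identifiers are placeholders rather than real terms.

```python
def is_chebi(term_id):
    """True if the identifier looks like a CHEBI term (assumed prefixes)."""
    return term_id.startswith(("CHEBI:", "CHEBI_"))


def should_add_catalytic_activity(inputs, outputs):
    """Apply the rule: some input and some output are CHEBI terms,
    and the input set differs from the output set."""
    inputs, outputs = set(inputs), set(outputs)
    return (
        any(is_chebi(t) for t in inputs)
        and any(is_chebi(t) for t in outputs)
        and inputs != outputs
    )


# Placeholder identifiers, for illustration only.
print(should_add_catalytic_activity(["CHEBI_1", "PROTEIN_X"], ["CHEBI_3"]))
print(should_add_catalytic_activity(["CHEBI_1"], ["CHEBI_1"]))
```

The second call returns False because identical input and output sets describe no chemical change, which is the case the rule is meant to exclude.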

For more information about the impact of the GCI axioms on instance classifications from the Reactome import see: Catalytic_GCI_reactome_term_counts.txt ELK_reactome_new_mfdef_types.txt

To see all the GCI axioms brought in for terms xrefed to RHEA, see GO_Just_GCI_test.ttl.zip (Also has new complete version of chebi_import merged)

goodb commented 6 years ago

@ukemi Just to keep things in one place, here are the links to the merged ontologies containing the different logical definitions:

  1. Just GCIs for instance classification
  2. Simple inputs and outputs (unidirectional)
  3. Sophisticated intersection pattern that respects directionality

and all files needed for review, including the GCI inferences, have been added into worksheets in this google spreadsheet (same referenced above)

ukemi commented 6 years ago

Thanks @goodb. To recap yesterday's discussion with our plan of action. @deustp01 , @hdrabkin and @ukemi will begin with a sanity check of the spreadsheets as a first-pass of the new inferences. We will indicate on the spreadsheet which inferences are correct and which are questionable. Once we have done a pass through of the spreadsheet, we will look at the reasoning behind the questionable inferences and try to determine what is causing the questionable inferences. By the time you return we should be able to provide you with a report of which methods we think are best and whether or not any tweaking is needed. As we meet, we will add our findings to this issue.

deustp01 commented 5 years ago

Next steps for Rhea - GO - Reactome roundtrip

Each Reactome reaction instance that has a catalyst (or transporter) activity attribute maps to a single GO molecular function term, and from that term to a set of the four Rhea reactions that represent the four possible directions of the molecular transformation enabled by the GO activity. Those mappings can be used to find discrepancies in stoichiometry, participation of water and protons, and ionization states and stereochemistries, between the Rhea and Reactome versions of reactions, in a form that should allow the Reactome reactions to be edited to conform to their Rhea counterparts with minimal manual intervention.

To do this, can we build a table that, for each Reactome reaction (strictly, reactionlike event) that has a catalyst activity attribute, lists its identifier, the GO molecular function term extracted from the catalyst activity attribute, and the Rhea master reaction cross-referenced to that GO molecular function term? Are there legitimate reasons for any Reactome-to-GO or GO-to-Rhea mappings to be other than one to one?

With that table, we will be able to retrieve the lists of molecules associated with the Rhea and Reactome versions of the event and their stoichiometries, align them, and identify discrepancies. Will the tables already constructed by GO (GOCHE?) be useful here for making alignments when Rhea and Reactome disagree on charge state or stereochemistry?
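
A minimal sketch of the alignment step, assuming each side of a reaction has already been reduced to a mapping from a shared molecule identifier to its stoichiometric coefficient. The function name and the example values are hypothetical; reconciling charge states or stereoisomers to a shared identifier is exactly the part this sketch does not attempt.

```python
def stoichiometry_discrepancies(rhea_side, reactome_side):
    """Compare two {molecule: coefficient} maps for one side of a reaction.

    Returns {molecule: (rhea_coefficient, reactome_coefficient)} for every
    molecule whose coefficient differs; 0 means the molecule is absent.
    """
    report = {}
    for molecule in sorted(set(rhea_side) | set(reactome_side)):
        rhea_n = rhea_side.get(molecule, 0)
        reactome_n = reactome_side.get(molecule, 0)
        if rhea_n != reactome_n:
            report[molecule] = (rhea_n, reactome_n)
    return report


# Illustrative values only: the Reactome version omits water.
rhea_left = {"ATP": 1, "H2O": 1}
reactome_left = {"ATP": 1}
print(stoichiometry_discrepancies(rhea_left, reactome_left))
```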

Can the pathway context of each Reactome reaction be used to identify the direction of that reaction when the reaction is part of the corresponding biological process?

cmungall commented 5 years ago

Reactome-to-GO can be many to one, since each reactome ID represents an instance. However, it should not be one to many.

GO-to-RHEA should be one-to-one. We should implement a check for this.

There may be some leaf GO MFs that lack a RHEA. We can request these. There will be some RHEAs that have no equivalent in GO. We can add these manually for now on an as-needed basis, but we will later implement a system where this is semi-automated.
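
One way such a check might look, sketched in Python over a list of (source, target) mapping pairs. The function names and the placeholder identifiers are assumptions for illustration, not an existing GO pipeline check.

```python
from collections import defaultdict


def one_to_many(pairs):
    """Return {source: sorted targets} for sources with more than one target."""
    targets = defaultdict(set)
    for source, target in pairs:
        targets[source].add(target)
    return {s: sorted(t) for s, t in targets.items() if len(t) > 1}


def one_to_one_violations(pairs):
    """Report violations in both directions of a mapping meant to be 1:1."""
    pairs = list(pairs)
    forward = one_to_many(pairs)
    backward = one_to_many((t, s) for s, t in pairs)
    return forward, backward


# Placeholder identifiers, for illustration only.
go_to_rhea = [("GO:A", "RHEA:1"), ("GO:B", "RHEA:2"), ("GO:B", "RHEA:3")]
forward, backward = one_to_one_violations(go_to_rhea)
print(forward)
print(backward)
```

For Reactome-to-GO, where many to one is allowed, only `one_to_many` would be applied, since a Reactome reaction should not map to more than one GO term.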


goodb commented 5 years ago

Resolution is to use the simpler construct that uses ObjectUnionOf (forward, backward) that inspired this request to the ELK team https://github.com/liveontologies/elk-reasoner/issues/54 . @balhoff has a solution.

goodb commented 5 years ago

Axioms to be added to a distinct file go-mf-defs.owl that will be imported into go-plus.
Will also generate a rhea.owl file which will be used for example to propagate xrefs.

balhoff commented 4 years ago

@cmungall if we go with the union-based approach, is there any reason we need the intermediate "substance sets/bags" for inputs and outputs? Am I forgetting something? I think this would work:

(
(catalytic_activity
    and (has_input some (CHEBI_1 and has_stoich value "2"))
    and (has_input some (CHEBI_2 and has_stoich value "1"))
    and (has_output some (CHEBI_3 and has_stoich value "2"))
    and (has_output some (CHEBI_4 and has_stoich value "1")))
or
(catalytic_activity
    and (has_output some (CHEBI_1 and has_stoich value "2"))
    and (has_output some (CHEBI_2 and has_stoich value "1"))
    and (has_input some (CHEBI_3 and has_stoich value "2"))
    and (has_input some (CHEBI_4 and has_stoich value "1")))
)

cmungall commented 4 years ago

I think this works
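
For illustration, the union-based expression could be assembled from the two sides of a reaction with a small Python helper. This is a sketch that emits a string in the same informal notation as the expression above; the helper names are hypothetical, and `has_stoich` is taken from that expression rather than from any released ontology.

```python
def side(relation, participants):
    """Render one restriction per (chebi_id, stoichiometry) pair."""
    return [
        f'({relation} some ({chebi} and has_stoich value "{n}"))'
        for chebi, n in participants
    ]


def directed(inputs, outputs):
    """One direction of the reaction as a conjunction."""
    parts = ["catalytic_activity"]
    parts += side("has_input", inputs)
    parts += side("has_output", outputs)
    return "(" + " and ".join(parts) + ")"


def undirected(left, right):
    """Union of the left-to-right and right-to-left directions."""
    return "(" + directed(left, right) + " or " + directed(right, left) + ")"


left = [("CHEBI_1", 2), ("CHEBI_2", 1)]
right = [("CHEBI_3", 2), ("CHEBI_4", 1)]
print(undirected(left, right))
```

The second disjunct lists its inputs before its outputs, so the conjunct order differs from the hand-written expression, but the conjuncts themselves are the same.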


amorgat commented 4 years ago

If I understand correctly, input/output is equivalent to substrate/product, right? That is, a directed reaction. As the go2rhea mapping is done on undirected reactions, do you envisage providing links to Rhea directed reactions too? See examples in #19371

goodb commented 4 years ago

@amorgat the goal of the definition above is to capture the meaning of the undirected reaction - the union groups both directions into one class. I believe the intent is to limit the mapping to the parent undirected reaction from rhea.

cmungall commented 1 year ago

Current status:

I think this is sufficient. Adding logical definitions for grouping reactions is outside what can be done in OWL.