Closed pgaudet closed 1 year ago
I looked into this some time ago and creating the logical defs is not as simple as it looks on the surface. However, I do think that we could assert participant relations.
Along with the directional issues, there is also stoichiometry to consider to make necessary and sufficient definitions. We don't capture stoichiometry in GO.
Most MF terms are defined as being bidirectional. I think our best bet would be to get the directionality from GO-CAM models. There is also an issue with how we have defined processes such as catabolism and biosynthesis. Both have inputs and outputs as differentia. This causes problems because the things being catabolized aren't the only inputs to the process and the things being made aren't the only outputs. See #11779
Re: stoichiometry. We actually do stoichiometry in that it figures into the (currently textual) definition.
I proposed a way of handling the stoichiometry in the OWL and axiomatizing RHEA at the last Bar Harbor meeting: https://docs.google.com/presentation/d/1QZ96mL1PRE0cLw0pPT5K-R9wdfd07HM4b2OFpCSSELU/edit#slide=id.p17 https://drive.google.com/drive/u/0/folders/0B8kRPmmvPJU3ZFVCb1RCUVFjYTQ
The assumption then was that we would make the GO classes equivalent to the bidi form, but we can revisit that
Dealing with charge states - RHEA uses the ChEBI instance that is predominant at pH 7.2, GO is indifferent - will also require some sort of mapping, but this is essentially the mapping already done to align ChEBI with GO, not a new one.
And consistent with MetaCyc.
pH 7.3 ;-) we call them 'normalized compounds' and can give you the mapping between any chebi and its normalized counterpart
@amorgat, could I get a copy of that mapping? It's time to align Reactome better with RHEA and this would let us do the work efficiently and increase the odds that we get the chemicals right. Thanks.
Just a few clarifications: we have 3 categories of reaction participants in Rhea:
- small molecules: ChEBI entries
- polymers: linked to an underlying ChEBI polymer, but possibly with a different polymerization index (n+1, n-1, etc.)
- generics: macromolecules are simplified to the functional groups involved in the reactions. Generics may have one or several residues. These residues are ChEBI entries.
See publication https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384025
@amorgat Talking about mapping to RHEA, we also need to map to it from the Complex Portal, on the to-do list for later this year (?). We have EC numbers where available, and of course UniProt ACs. Would that help with mapping? If you want to discuss off-ticket, email me on bmeldal @ ebi. ac. uk :) Birgit
Hi Birgit, To the extent that you’re importing information about complexes from us, we should co-ordinate this – it would be easy on our side to make clean-up of small molecules relevant to you a priority. Peter
@deustp01 We'd have to map to RHEA when a complex is catalyst for a certain reaction. It's less about the small molecules than the proteins. But happy to co-ordinate if the mapping suits both!
Should we try to make this a discussion at the GOC meeting in May?
Sandra will be there for the CP (as well as UniProt, of course!), I have training responsibilities here that week.
I’ll be at the GO meeting in NYC and would be happy to discuss on behalf of the Rhea team (and UniProt).
(As an aside, we have used directional reactions for the construction and annotation of www.swisslipids.org).
I've added it to the GOC meeting agenda on the wiki.
Hi, yes happy to represent the Complex Portal (and also UniProt) in these discussions. IntAct is also looking at directionality and working with SIGNOR (https://signor.uniroma2.it/) on an export for this.
We're going to discuss this on the @geneontology/ontology call on Monday.
Editor's discussion:
PLAN
I’d like to propose that this issue be broadened to something along the lines of “define an OWL-DL pattern for representing molecular function terms based on the inputs and the outputs of their associated reactions”. If this structure can be agreed upon and implemented (at least for terms under catalysis), it would greatly help with the goal of expanding the MF ontology based on reactions from e.g. Rhea, but also Reactome, the EC, etc. It would also facilitate automated import of annotations from such databases into GO-CAMs because not all reactions in Reactome or other pathway collections are manually mapped to GO terms. With logical definitions based on shared components (chebi terms, uniprot), many could be inferred automatically. As mentioned above, @cmungall has a start on a logical structure. How does that look?
I built a strawman ontology based on @cmungall 's proposed OWL structure for commentary.
Updated the strawman to include another potential pattern that uses an anonymous class description (suggested by @balhoff ). Two versions of Individuals instantiating the new class are provided - one with all anonymous type description and one based on making multiple linked individuals. Note that all of these versions do implement stoichiometry but do not implement directionality. I think some form of directionality should be captured to support pathfinding queries and match biologist (maybe not chemist) expectations. This could be done in a strong way (that would be useful in reasoning) with has_input and has_output properties. If the reasoning consequences are not desired, one way to do this without committing formally to one substance set being the input and one the output might be to simply use the BioPAX convention of a 'left' property and a 'right' property. Hints that could be accessed programmatically but would not impact reasoning could be added with another data property indicating 'typical_direction' = {left to right, right to left, bidirectional}.
Here is what the anonymous version looks like in Manchester syntax when the existing URIs are mapped over to CHEBI and GO. (New ones are left as Protege default placeholders).
Class: <http://purl.obolibrary.org/obo/go/extensions/go-plus.owl#RHEA_13032>
    EquivalentTo:
        obo:GO_0050122,
        obo:GO_0003824
         and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_substance_bag> some
            (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#SubstanceSet>
             and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some
                (obo:CHEBI_15377
                 and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))
             and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some
                (obo:CHEBI_15379
                 and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))
             and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some
                (obo:CHEBI_506227
                 and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))))
         and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_substance_bag> some
            (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#SubstanceSet>
             and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some
                (obo:CHEBI_15378
                 and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))
             and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some
                (obo:CHEBI_16240
                 and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))
             and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some
                (obo:CHEBI_38439
                 and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))))
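For anyone who wants to generate expressions of this shape in bulk, here is a minimal Python sketch that assembles the anonymous class expression from two substance bags. The IRIs are copied from the example above (the strawman namespace is still a Protege placeholder); the function names `member_expr`, `bag_expr` and `reaction_expr` are made up for illustration and are not part of any existing tool.

```python
from typing import Dict, List

# Placeholder IRIs taken from the Manchester example above.
NS = "http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#"
HAS_BAG = f"<{NS}has_substance_bag>"
SUBSTANCE_SET = f"<{NS}SubstanceSet>"
HAS_MEMBER = f"<{NS}has_member_part>"
HAS_STOICH = "<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry>"


def member_expr(chebi_id: str, coefficient: int) -> str:
    """One participant with its stoichiometric coefficient."""
    return f"({HAS_MEMBER} some ({chebi_id} and ({HAS_STOICH} value {coefficient})))"


def bag_expr(bag: Dict[str, int]) -> str:
    """One substance bag: a SubstanceSet with all of its members."""
    members = " and ".join(member_expr(c, n) for c, n in sorted(bag.items()))
    return f"({HAS_BAG} some ({SUBSTANCE_SET} and {members}))"


def reaction_expr(genus: str, bags: List[Dict[str, int]]) -> str:
    """Genus plus one existential restriction per substance bag (no direction)."""
    return f"{genus} and " + " and ".join(bag_expr(b) for b in bags)


# The two bags of the RHEA_13032 example, all coefficients equal to 1.
bag_a = {"obo:CHEBI_15377": 1, "obo:CHEBI_15379": 1, "obo:CHEBI_506227": 1}
bag_b = {"obo:CHEBI_15378": 1, "obo:CHEBI_16240": 1, "obo:CHEBI_38439": 1}
expression = reaction_expr("obo:GO_0003824", [bag_a, bag_b])
print(expression)
```

The output is a single-line string; it would still need to be wrapped in a Class/EquivalentTo frame and parsed by an OWL tool before any reasoning.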
looks good
What is our reasoning strategy? Are values in the ELK profile?
Directionality - I like the exploratory strategy. Ultimately the editors (@ukemi @hdrabkin) should make the call, I thought originally on a per-branch basis (e.g. default direction-neutral, with some branches like kinase being directional. kinase is interesting as we might actually want 2 owldefs)
In Protégé ELK says: "ELK supports DataHasValue only partially. Reasoning might be incomplete!"
I'm not sure exactly what it supports, but it appears to successfully classify has_member_part some (N-acetyl-D-glucosamine and (has_stoichiometry value 1)) as a subclass of has_member_part some N-acetyl-D-glucosamine.
On the other hand, I looked at using cardinalities, which could provide more error checking with a reasoner that supports them. But they are limited to integers (maybe okay here) and tend to be bad for reasoner performance. Most importantly for us, ELK does NOT classify has_member_part exactly 1 N-acetyl-D-glucosamine as a subclass of has_member_part some N-acetyl-D-glucosamine, which seems like a bad limitation.
I also tested an instance classification and it does work as expected (ELK 0.4.3 in Protege). Tested with the go-plus merged ontology loaded and the class definition added as above (with the (has_stoichiometry value 1) pattern). Classified almost instantly. The reported DL expressivity for the combined ontology is listed as SRIQ and, when the new class definition is added, becomes SRIQ(D) because of the use of the data property. It seems not to be a problem, but if it were, there are ways we could hack it - e.g. with an object property and a list of Individuals representing stoichiometric coefficients...
Is there an opportunity to do something similar to the above for the 'Binding' subtree? If we could add logical definitions for terms like 'CD70 receptor binding' - just as there are for siblings like 'aryl hydrocarbon receptor binding' - it would once again make it much easier and more consistent to automatically merge in content from other data sources.
edit: see previous work on the binding issue by @cmungall https://drive.google.com/drive/u/0/folders/0B8kRPmmvPJU3U2YyMG0zYWVqTVk and http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/experimental/domo/
newMFsFromRhea.ttl.zip Here is an ontology containing 2578 logical definitions built for GO terms with existing xrefs to RHEA reactions. It can be merged into GoPlus without any problem that I can see (via ELK). It uses the 2 bag, no directionality approach discussed above. Comments would be grand.
can we do a test to see
For 1, we'd combine go-plus plus the rhea axioms, there is a new command in owltools to do this https://github.com/owlcollab/owltools/pull/259 (but it may be faster for you to do the logic yourself than get entwined in owltools).
For 2 we could have some of the editors check them
Yep working on it. It looks like we will also need to expand the chebi import that is included in GOPlus as many of the terms used in rhea are not there now. Not sure if those definitions will impact classification.
Assigning myself to this because it makes a great deal of sense for @deustp01 and me to work on this while we are sitting relatively close to each other. He arrives here in a couple of weeks. I think by the end of the summer we can make some real progress in the alignment.
Hi @ukemi . Great. Since I am working on this right now, how can I best support your work? What kind of reports/data/examples etc. do you need generated to help you make decisions on this? (I ought to have what @cmungall requested ready by next week).
Hi @goodb. I haven't looked at what you have generated closely enough to comment. I may get a chance this weekend. Perhaps you and @deustp01 and I can touch base on a conference call next week. I'd like to put an action plan in place with concrete steps toward the completion of the project- clearly define the goals etc. I think @cmungall 's requests above are necessary and I think beyond the classification that is available, we might want to think out of the box about other ways to classify MFs. So far GO has relied heavily on EC, but this opens up a lot of new doors.
I haven't looked at your logical def file, but is it in obo format? That would be the easiest way for me to quickly eyeball it.
It's not translatable to obo due to the nesting. But it's a direct translation of the RHEA, so it won't impart any additional information you won't get from looking at RHEA. The informative part will be looking at the inferences that come from this. We can come up with a report and/or you can explore this in Protege.
The zip file above is in .ttl format which will open up in Protege without an issue. I tried to export it as an obo file from there but Protege choked on it - presumably for the issue Chris raises there.
I found an error in the axiom pattern. For example, ‘dimethylallylcistransferase activity’ is inferred to be a subclass of ‘isopentenyl-diphosphate delta-isomerase activity’ as follows:
‘dimethylallylcistransferase activity’ is equivalent to anything with one substance bag containing: {isopentenyl diphosphate(3-) & prenyl diphosphate(3-)} and another substance bag containing: {diphosphate(3-) & neryl diphosphate(3-)}
And ‘isopentenyl-diphosphate delta-isomerase activity’ is equivalent to anything with one substance bag containing: ‘isopentenyl diphosphate(3-)’ and another bag containing ‘prenyl diphosphate(3-)’.
Contents present in both substance bags on the superclass are matching the contents of one bag on the subclass. I don't think that is right.
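To make the problem concrete, here is a small Python sketch of the structural check a reasoner effectively performs on this pattern. It is a simplification under stated assumptions: each definition is reduced to a list of bags, each bag to a set of ChEBI labels, and stoichiometry and the ChEBI hierarchy are ignored. The helper name `subsumed_by` is hypothetical.

```python
from typing import FrozenSet, List

Bag = FrozenSet[str]


def subsumed_by(sub: List[Bag], sup: List[Bag]) -> bool:
    """True if every bag restriction on `sup` is satisfied by SOME bag on `sub`.

    Nothing forces two restrictions on `sup` to be matched by two different
    bags on `sub`, which is exactly the loophole described above.
    """
    return all(any(sup_bag <= sub_bag for sub_bag in sub) for sup_bag in sup)


transferase = [
    frozenset({"isopentenyl diphosphate(3-)", "prenyl diphosphate(3-)"}),
    frozenset({"diphosphate(3-)", "neryl diphosphate(3-)"}),
]
isomerase = [
    frozenset({"isopentenyl diphosphate(3-)"}),
    frozenset({"prenyl diphosphate(3-)"}),
]

print(subsumed_by(transferase, isomerase))  # True: the unwanted inference
print(subsumed_by(isomerase, transferase))  # False
```

Both isomerase bags are matched by the first transferase bag alone, so the transferase ends up classified under the isomerase.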
Need to add something to the definition to indicate that the two bags are different. @balhoff any ideas ? Could solve this with a left/right or input/output side indicator as suggested above.
One option here is to replace the 'has_substance_bag' property with 'has_left_substance_bag' and 'has_right_substance_bag'. If it's possible to know which side is left and which is right, this works just fine. If it is not possible, we could assert the definition to be the ObjectUnionOf the left2right definition and the right2left definition. Unfortunately the ELK reasoner does not support classification using the ObjectUnionOf construct.
I suppose it's rather ugly, but we could introduce 2 separate classes for each reaction, one for each direction. That would again solve the problem.
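A rough Python sketch of how the left/right variant would behave, under the same simplifying assumptions as before (bags reduced to sets of labels, no stoichiometry, no ChEBI hierarchy). The names `subsumed_by_sided`, `reverse` and `subsumed_either_direction` are invented for illustration; the last one only emulates the idea of pairing each reaction with its reverse and is not how a reasoner would implement ObjectUnionOf.

```python
from typing import Dict, FrozenSet

Sided = Dict[str, FrozenSet[str]]  # keys: "left" and "right"


def subsumed_by_sided(sub: Sided, sup: Sided) -> bool:
    """Left restrictions must be met by the left bag, right by the right bag."""
    return sup["left"] <= sub["left"] and sup["right"] <= sub["right"]


def reverse(defn: Sided) -> Sided:
    """The same reaction written in the opposite direction."""
    return {"left": defn["right"], "right": defn["left"]}


def subsumed_either_direction(sub: Sided, sup: Sided) -> bool:
    """Emulates defining `sup` as its left2right form OR its right2left form."""
    return subsumed_by_sided(sub, sup) or subsumed_by_sided(sub, reverse(sup))


transferase = {
    "left": frozenset({"isopentenyl diphosphate(3-)", "prenyl diphosphate(3-)"}),
    "right": frozenset({"diphosphate(3-)", "neryl diphosphate(3-)"}),
}
isomerase = {
    "left": frozenset({"isopentenyl diphosphate(3-)"}),
    "right": frozenset({"prenyl diphosphate(3-)"}),
}

print(subsumed_by_sided(transferase, isomerase))                    # False
print(subsumed_either_direction(transferase, isomerase))            # False
print(subsumed_either_direction(reverse(transferase), transferase)) # True
```

With sided bags the spurious transferase/isomerase inference disappears, while a reaction written in the opposite orientation still matches its own definition.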
Ah. I suspected we might run into this. I'm not sure that left and right are always identifiable in the 'reversible' reactions. We have thought about the idea of introducing the forward and reverse directions for the reactions as subclasses. As we move more towards instance representation in GOCAMs, this might be the best solution. Rhea has the directions as children of the reversible reactions and I think this would make the right sense for Reactome.
Any way we could have a conference call soon just to touch base on everything?
PS. I'm also a bit concerned about creating definitions from these. Will they be necessary and sufficient for complete classification? Do we have differentia axes in GO that are by reaction mechanism rather than participants? I think so. In that case it may be better to make asserted relationships instead of logical defs.... or we need to pick the genus for sets of terms quite carefully.
@ukemi happy to talk about this. Let's go to email to set up a time.
For consideration, there are some discrepancies in mappings between RHEA, GO, and Reactome in terms of both which RHEA version is xreffed and which CHEBIs are used. Ping @deustp01 BTW, there are 4 reactions in the RHEA model - a "master" reaction which does not specify direction, an intentionally bidirectional, left to right, and right to left. GO always seems to link to the intentionally bidirectional reaction - though it seems like it might be more appropriate to link to the Master reaction.
Edit: Noting that the GO xref mappings to Reactome identifiers like REACT_1292 are obsolete (by at least 10 Reactome releases).
See editable, commentable images https://docs.google.com/presentation/d/1m_ZHO9DOm5ITQ9MZ6FzaReI8VPjmz8mAWtXfvRLDDkA/edit#slide=id.g3b84a54a95_0_14
@goodb I'm not coming up with any good ideas that work with ELK.
@balhoff do you think SWRL is an option here?
DIRECTIONALITY - I think it's an enzymologist / biochemist convention to treat all reactions as intentionally bidirectional, allowing for reactions to be driven one way or the other by big enough differences in amounts of the various participating molecules. I think that EC follows this convention and GO, when it picked up the beginnings of a catalysis branch of molecular_function from EC, picked up the convention as part of the package. (@mah11 do you remember this history?) If that's true, it might well do no damage to GO and existing GO - EC mappings to switch from intentionally bidirectional to master.
RHEA - Reactome mapping - the one to R-HSA-442715 looks weird because the Reactome event lumps together an enzyme activation step and catalysis by the activated enzyme, but the catalyzed reaction is correctly mapped, so this is either OK or a use case for applying the RHEA - Reactome mapping as a QA tool to identify irregularly annotated reactions. The REACT_1292, etc. identifiers are obsolete (I wonder how they turned up in your data set for mapping), but all correspond to Reactome reactions that describe simple conversion of ATP to cAMP + PPi, so you're right - mapping to RHEA 15389 would be correct:
REACT_1292 = R-HSA-164377
REACT_15399 = R-HSA-170676
REACT_19249 = R-HSA-392129
Also, searching in Reactome for reactions whose catalystActivity attribute uses GO:0004016 (i.e., instances where we are asserting that an enzyme has this molecular function) turned up three more Reactome reactions that convert ATP to cAMP + PPi: R-HSA-381607, R-HSA-5211224, R-HSA-5610727. Would you have expected your process to turn these up?
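If it helps with cleaning up the xrefs, the three correspondences listed in this comment can be applied with a simple lookup. This is only a sketch: the table holds just the pairs given here, and the function name `to_stable_id` is hypothetical. Any other obsolete REACT_ identifier would need to be resolved against Reactome itself.

```python
from typing import Optional

# Only the three correspondences stated in this comment.
OBSOLETE_TO_STABLE = {
    "REACT_1292": "R-HSA-164377",
    "REACT_15399": "R-HSA-170676",
    "REACT_19249": "R-HSA-392129",
}


def to_stable_id(identifier: str) -> Optional[str]:
    """Return the current stable ID, or None if the old ID is not in the table."""
    if identifier.startswith("R-HSA-"):
        return identifier  # already a stable identifier
    return OBSOLETE_TO_STABLE.get(identifier)


xrefs = ["REACT_1292", "R-HSA-381607", "REACT_99999"]
for xref in xrefs:
    print(xref, "->", to_stable_id(xref))
```

Unknown identifiers come back as None so they can be reported for manual review rather than silently dropped.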
@goodb I don't think SWRL would really fit into the ontology classification reasoning.
@balhoff what about instance classification? Looking at this from the perspective of getting individuals from Noctua models (e.g. reactions from Reactome) classified into GO I imagine it wouldn't be very hard to achieve this with a set of SWRL rules which could be generated automatically and executed with Jena or Arachne etc. Like you say, you wouldn't get subclass inferences out, but I'm not sure how much that matters? I guess that shifts the burden of organizing the MF classes back towards manually constructed definitions as they are now.
More generally, formalizing processing stages that act at the stage of instance classification (like this) and at the stage of data export like GORules and GPAD generation might really benefit from the use of a rule framework. (Meaning capturing the rules in a language like SWRL, deciding on an execution framework like Jena/Arachne/DROOLS/JESS, and adding these to build/export/etc. processes.)
Hmmm. I think the greatest utility for us would be to be able to reason on these at the class level. If we performed these operations at the level of a Noctua model, how would we ever make the correct inference to a class if the axioms were not in place for the classes?
The axioms would live in a rule base maintained outside of the ontology. Arachne or another engine would be executed and generate instanceOf relationships between nodes in Noctua and classes. It's mainly a question of where the logic is held (in OWL class definitions or a set of SWRL rules) and which reasoning engines get executed at what phase in the cycle.
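As a rough illustration of the "rule base outside the ontology" idea, here is a plain-Python stand-in. It does not use SWRL, Jena or Arachne; it only shows the shape of the logic under stated assumptions: each rule pairs a class label with two participant sets, an instance is a pair of participant sets, and a rule fires only when the sets match exactly in either orientation (a closed-world shortcut that OWL would not make). The names `RULES` and `classify` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Sides = Tuple[FrozenSet[str], FrozenSet[str]]

IPP = "isopentenyl diphosphate(3-)"
DMAPP = "prenyl diphosphate(3-)"
PPI = "diphosphate(3-)"
NPP = "neryl diphosphate(3-)"

# Hypothetical rule base: class label -> the two sides of its reaction.
RULES: Dict[str, Sides] = {
    "isopentenyl-diphosphate delta-isomerase activity": (
        frozenset({IPP}),
        frozenset({DMAPP}),
    ),
    "dimethylallylcistransferase activity": (
        frozenset({IPP, DMAPP}),
        frozenset({PPI, NPP}),
    ),
}


def classify(instance: Sides) -> List[str]:
    """Return the labels of all rules whose two sides equal the instance's sides."""
    left, right = instance
    return sorted(
        label
        for label, (a, b) in RULES.items()
        if (left, right) in ((a, b), (b, a))
    )


# A reaction node as it might arrive from a model, written right to left.
node = (frozenset({PPI, NPP}), frozenset({IPP, DMAPP}))
print(classify(node))
```

This yields instanceOf-style assignments only; as noted in the comments above, no subclass inferences between the classes themselves come out of it.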
@ukemi @deustp01 @vanaukenk from the discussion today, here is a file with the inferred subclass relationships we get from current logical definitions. For example,
phosphoribosylformylglycinamidine synthase activity subclass of ATPase activity
methanol dehydrogenase activity subclass of alcohol dehydrogenase [NAD(P)+] activity
Summarizing action items from voice discussion today for posterity and thread watchers' benefit. (May want to break some of these out into their own issues.)
Regarding the CHEBI term alignment challenge, the intention is to build the GO definitions with the terms as RHEA does (which include charge state). The existing GO-CHEBI general class axioms (as described in https://www.ncbi.nlm.nih.gov/pubmed/23895341 ) should be enough to handle merging of the 'same' molecule with different charge states for the purposes of reasoning. We can test this with examples from Reactome. (and this may eventually be used as part of an update of Reactome to use the more specific chebi terms as they are in RHEA).
Hello,
Discussing with @amorgat about how Rhea and GO represent biochemical reactions, here are a few points to consider:
I am putting this here for discussion - @ukemi I let you decide how soon we need to discuss this.
Thanks, Pascale