geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Using Rhea to populate logical definitions of reactions #14984

Closed pgaudet closed 1 year ago

pgaudet commented 6 years ago

Hello,

Discussing with @amorgat about how Rhea and GO represent biochemical reactions, here are a few points to consider:

I am putting this here for discussion - @ukemi I let you decide how soon we need to discuss this.

Thanks, Pascale

ukemi commented 6 years ago

I looked into this some time ago and creating the logical defs is not as simple as it looks on the surface. However, I do think that we could assert participant relations.

Along with the directional issues, there is also stoichiometry to consider to make necessary and sufficient definitions. We don't capture stoichiometry in GO.

Most MF terms are defined as being bidirectional. I think our best bet would be to get the directionality from GO-CAM models. There is also an issue with how we have defined processes such as catabolism and biosynthesis. Both have inputs and outputs as differentia. This causes problems because the things being catabolized aren't the only inputs to the process and the things being made aren't the only outputs. See #11779

cmungall commented 6 years ago

Re: stoichiometry. We actually do stoichiometry in that it figures into the (currently textual) definition.

I proposed a way of handling the stoichiometry in the OWL and axiomatizing RHEA at the last Bar Harbor meeting: https://docs.google.com/presentation/d/1QZ96mL1PRE0cLw0pPT5K-R9wdfd07HM4b2OFpCSSELU/edit#slide=id.p17 https://drive.google.com/drive/u/0/folders/0B8kRPmmvPJU3ZFVCb1RCUVFjYTQ

The assumption then was that we would make the GO classes equivalent to the bidi form, but we can revisit that

deustp01 commented 6 years ago

Dealing with charge states - RHEA uses the ChEBI instance that is predominant at pH 7.2, GO is indifferent - will also require some sort of mapping, but this is essentially the mapping already done to align ChEBI with GO, not a new one.

pgaudet commented 6 years ago

And consistent with MetaCyc.

amorgat commented 6 years ago

pH 7.3 ;-) we call them 'normalized compounds' and can give you the mapping between any chebi and its normalized counterpart

deustp01 commented 6 years ago

@amorgat, could I get a copy of that mapping? It's time to align Reactome better with RHEA and this would let us do the work efficiently and increase the odds that we get the chemicals right. Thanks.

amorgat commented 6 years ago

Just a few clarifications: we have 3 categories of reaction participants in Rhea:

- small molecules: ChEBI entries
- polymers: linked to an underlying ChEBI polymer, but possibly with a different polymerization index (n+1, n-1, etc.)
- generics: macromolecules are simplified to the functional groups involved in the reactions. Generics may have one or several residues. These residues are ChEBI entries.

See publication https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4384025

bmeldal commented 6 years ago

@amorgat Talking about mapping to RHEA, we also need to map to it from the Complex Portal, on the to-do list for later this year (?). We have EC numbers where available, and of course UniProt ACs. Would that help with mapping? If you want to discuss off-ticket, email me on bmeldal @ ebi. ac. uk :) Birgit

deustp01 commented 6 years ago

Hi Birgit, To the extent that you’re importing information about complexes from us, we should co-ordinate this – it would be easy on our side to make clean-up of small molecules relevant to you a priority. Peter

bmeldal commented 6 years ago

@deustp01 We'd have to map to RHEA when a complex is catalyst for a certain reaction. It's less about the small molecules than the proteins. But happy to co-ordinate if the mapping suits both!

ukemi commented 6 years ago

Should we try to make this a discussion at the GOC meeting in May?

bmeldal commented 6 years ago

Sandra will be there for the CP (as well as UniProt, of course!), I have training responsibilities here that week.

alanbridge commented 6 years ago

I’ll be at the GO meeting in NYC and would be happy to discuss on behalf of the Rhea team (and UniProt).

(As an aside, we have used directional reactions for the construction and annotation of www.swisslipids.org).

ukemi commented 6 years ago

I've added it to the GOC meeting agenda on the wiki.

sandraorchard commented 6 years ago

Hi, yes happy to represent the Complex Portal (and also UniProt) in these discussions. IntAct is also looking at directionality and working with SIGNOR (https://signor.uniroma2.it/) on an export for this.

cmungall commented 6 years ago

We're going to discuss this on the @geneontology/ontology call on monday

pgaudet commented 6 years ago

Editor's discussion:

PLAN

goodb commented 6 years ago

I’d like to propose that this issue be broadened to something along the lines of “define an OWL-DL pattern for representing molecular function terms based on the inputs and the outputs of their associated reactions”. If this structure can be agreed upon and implemented (at least for terms under catalysis), it would greatly help with the goal of expanding the MF ontology based on reactions from e.g. Rhea, but also Reactome, the EC, etc. It would also facilitate automated import of annotations from such databases into GO-CAMs because not all reactions in Reactome or other pathway collections are manually mapped to GO terms. With logical definitions based on shared components (chebi terms, uniprot), many could be inferred automatically. As mentioned above, @cmungall has a start on a logical structure. How does that look?

goodb commented 6 years ago

I built a strawman ontology based on @cmungall 's proposed OWL structure for commentary.

goodb commented 6 years ago

Updated the strawman to include another potential pattern that uses an anonymous class description (suggested by @balhoff ). Two versions of Individuals instantiating the new class are provided - one with all anonymous type description and one based on making multiple linked individuals. Note that all of these versions do implement stoichiometry but do not implement directionality. I think some form of directionality should be captured to support pathfinding queries and match biologist (maybe not chemist) expectations. This could be done in a strong way (that would be useful in reasoning) with has_input and has_output properties. If the reasoning consequences are not desired, one way to do this without committing formally to one substance set being the input and one the output might be to simply use the BioPAX convention of a 'left' property and a 'right' property. Hints that could be accessed programmatically but would not impact reasoning could be added with another data property indicating 'typical_direction' = {left to right, right to left, bidirectional}.

goodb commented 6 years ago

Here is what the anonymous version looks like in Manchester syntax when the existing URIs are mapped over to CHEBI and GO. (New ones are left as Protege default placeholders).

Class: <http://purl.obolibrary.org/obo/go/extensions/go-plus.owl#RHEA_13032>

EquivalentTo: 
    obo:GO_0050122,
    obo:GO_0003824
     and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_substance_bag> some 
        (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#SubstanceSet>
         and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some 
            (obo:CHEBI_15377
             and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))
         and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some 
            (obo:CHEBI_15379
             and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))
         and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some 
            (obo:CHEBI_506227
             and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))))
     and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_substance_bag> some 
        (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#SubstanceSet>
         and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some 
            (obo:CHEBI_15378
             and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))
         and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some 
            (obo:CHEBI_16240
             and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))
         and (<http://www.semanticweb.org/bgood/ontologies/2018/4/untitled-ontology-147#has_member_part> some 
            (obo:CHEBI_38439
             and (<http://www.biopax.org/release/biopax-level3.owl#has_stoichiometry> value 1)))))
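
The nested expression above is regular enough to be generated from a reaction's two participant lists. A minimal Python sketch, assuming shortened forms of the strawman's placeholder names (`has_substance_bag`, `SubstanceSet`, `has_member_part`, `has_stoichiometry`); the short names and both function names are illustrative only and are not part of any released ontology or tool:

```python
from typing import Dict, List


def bag_expression(members: Dict[str, int]) -> str:
    """Render one substance bag as a Manchester-syntax class expression."""
    parts = ["SubstanceSet"]
    for chebi_id, coefficient in members.items():
        parts.append(
            f"(has_member_part some ({chebi_id} "
            f"and (has_stoichiometry value {coefficient})))"
        )
    return "(" + " and ".join(parts) + ")"


def reaction_expression(genus: str, bags: List[Dict[str, int]]) -> str:
    """Combine a genus class with one 'has_substance_bag some ...' per bag."""
    restrictions = [f"(has_substance_bag some {bag_expression(b)})" for b in bags]
    return " and ".join([genus] + restrictions)


# Participants of the example above, one dict per side of the reaction.
side_a = {"obo:CHEBI_15377": 1, "obo:CHEBI_15379": 1, "obo:CHEBI_506227": 1}
side_b = {"obo:CHEBI_15378": 1, "obo:CHEBI_16240": 1, "obo:CHEBI_38439": 1}
print(reaction_expression("obo:GO_0003824", [side_a, side_b]))
```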

cmungall commented 6 years ago

looks good

What is our reasoning strategy? are values in the Elk profile?

Directionality - I like the exploratory strategy. Ultimately the editors (@ukemi @hdrabkin) should make the call, I thought originally on a per-branch basis (e.g. default direction-neutral, with some branches like kinase being directional. kinase is interesting as we might actually want 2 owldefs)

balhoff commented 6 years ago

In Protégé ELK says: "ELK supports DataHasValue only partially. Reasoning might be incomplete!"

I'm not sure exactly what it supports, but it appears to successfully classify has_member_part some (N-acetyl-D-glucosamine and (has_stoichiometry value 1)) as a subclass of has_member_part some N-acetyl-D-glucosamine

balhoff commented 6 years ago

On the other hand, I looked at using cardinalities, which could provide more error checking with a reasoner that supports them. But they are limited to integers (maybe okay here) and tend to be bad for reasoner performance. Most importantly for us, ELK does NOT classify has_member_part exactly 1 N-acetyl-D-glucosamine as a subclass of has_member_part some N-acetyl-D-glucosamine, which seems like a bad limitation.

goodb commented 6 years ago

I also tested an instance classification and it works as expected (ELK 0.4.3 in Protege). Tested with the go-plus merged ontology loaded and the class definition added as above (with the (has_stoichiometry value 1) pattern). Classified almost instantly. The reported DL expressivity for the combined ontology is listed as SRIQ and, when the new class definition is added, becomes SRIQ(D) because of the use of the data property. It seems not to be a problem, but if it were, there are ways we could hack it - e.g. with an object property and a list of Individuals representing stoichiometric coefficients...

goodb commented 6 years ago

Is there an opportunity to do something similar to the above for the 'Binding' subtree? If we could add logical definitions for terms like 'CD70 receptor binding' - just as there are for siblings like 'aryl hydrocarbon receptor binding' - it would once again make it much easier and more consistent to automatically merge in content from other data sources.

edit: see previous work on the binding issue by @cmungall https://drive.google.com/drive/u/0/folders/0B8kRPmmvPJU3U2YyMG0zYWVqTVk and http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/experimental/domo/

goodb commented 6 years ago

newMFsFromRhea.ttl.zip Here is an ontology containing 2578 logical definitions built for GO terms with existing xrefs to RHEA reactions. It can be merged into GoPlus without any problem that I can see (via ELK). It uses the 2 bag, no directionality approach discussed above. Comments would be grand.

cmungall commented 6 years ago

can we do a test to see

  1. how many existing asserted classifications are recapitulated using these axioms
  2. what are the new inferences we get

For 1, we'd combine go-plus plus the rhea axioms, there is a new command in owltools to do this https://github.com/owlcollab/owltools/pull/259 (but it may be faster for you to do the logic yourself than get entwined in owltools).

For 2 we could have some of the editors check them
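
The two checks above amount to set comparisons over subclass pairs. A minimal sketch in Python, assuming the asserted and the inferred relationships have already been exported as (subclass, superclass) pairs. The function names and the toy class names are hypothetical, and this is not the owltools command mentioned above:

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (subclass, superclass)


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """All (descendant, ancestor) pairs implied by direct subclass pairs."""
    parents: Dict[str, Set[str]] = {}
    for child, parent in pairs:
        parents.setdefault(child, set()).add(parent)
    closure: Set[Pair] = set()
    for start in parents:
        stack = list(parents[start])
        seen: Set[str] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure


def compare(asserted: Set[Pair], inferred: Set[Pair]) -> Dict[str, Set[Pair]]:
    """Split inferred pairs into already-known and new; list asserted pairs not recovered."""
    known = transitive_closure(asserted)
    return {
        "recapitulated": inferred & known,
        "new": inferred - known,
        "not_recovered": asserted - transitive_closure(inferred),
    }


# Toy input: A < B < C is asserted; the new axioms infer A < C and A < D.
report = compare({("A", "B"), ("B", "C")}, {("A", "C"), ("A", "D")})
for name, pairs in report.items():
    print(name, sorted(pairs))
```

Comparing against the transitive closure of the asserted hierarchy keeps an inferred link to a grandparent from being reported as new.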

goodb commented 6 years ago

Yep working on it. It looks like we will also need to expand the chebi import that is included in GOPlus as many of the terms used in rhea are not there now. Not sure if those definitions will impact classification.

ukemi commented 6 years ago

Assigning myself to this because it makes a great deal of sense for @deustp01 and me to work on this while we are sitting relatively close to each other. He arrives here in a couple of weeks. I think by the end of the summer we can make some real progress in the alignment.

goodb commented 6 years ago

Hi @ukemi . Great. Since I am working on this right now, how can I best support your work? What kind of reports/data/examples etc. do you need generated to help you make decisions on this? (I ought to have what @cmungall requested ready by next week).

ukemi commented 6 years ago

Hi @goodb. I haven't looked at what you have generated closely enough to comment. I may get a chance this weekend. Perhaps you and @deustp01 and I can touch base on a conference call next week. I'd like to put an action plan in place with concrete steps toward the completion of the project- clearly define the goals etc. I think @cmungall 's requests above are necessary and I think beyond the classification that is available, we might want to think out of the box about other ways to classify MFs. So far GO has relied heavily on EC, but this opens up a lot of new doors.

I haven't looked at your logical def file, but is it in obo format? That would be the easiest way for me to quickly eyeball it.

cmungall commented 6 years ago

It's not translatable to obo due to the nesting. But it's a direct translation of the RHEA, so it won't impart any additional information you won't get from looking at RHEA. The informative part will be looking at the inferences that come from this. We can come up with a report and/or you can explore this in Protege.

goodb commented 6 years ago

The zip file above is in .ttl format which will open up in Protege without an issue. I tried to export it as an obo file from there but Protege choked on it - presumably for the issue Chris raises there.

goodb commented 6 years ago

I found an error in the axiom pattern. For example, ‘dimethylallylcistransferase activity’ is inferred to be a subclass of ‘isopentenyl-diphosphate delta-isomerase activity’ as follows:

‘dimethylallylcistransferase activity’ is equivalent to anything with one substance bag containing: {isopentenyl diphosphate(3-) & prenyl diphosphate(3-)} and another substance bag containing: {diphosphate(3-) & neryl diphosphate(3-)}

And ‘isopentenyl-diphosphate delta-isomerase activity’ is equivalent to anything with one substance bag containing: ‘isopentenyl diphosphate(3-)’ and another bag containing ‘prenyl diphosphate(3-)’.

Contents present in both substance bags on the superclass are matching the contents of one bag on the subclass. I don't think that is right.

Need to add something to the definition to indicate that the two bags are different. @balhoff any ideas ? Could solve this with a left/right or input/output side indicator as suggested above.
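
The unwanted inference can be reproduced outside a reasoner. A rough sketch in Python, assuming each class is reduced to a list of substance bags and that a required bag is satisfied by any bag containing the required members. ChEBI subclass relations and stoichiometry are ignored here, and `entails_unordered` is a hypothetical helper, not reasoner code:

```python
from typing import FrozenSet, List

Bag = FrozenSet[str]


def entails_unordered(sub: List[Bag], sup: List[Bag]) -> bool:
    """True if every bag required by `sup` is covered by SOME bag of `sub`.

    This mirrors the existential pattern: nothing forces two required bags
    to be matched by two different bags.
    """
    return all(any(required <= bag for bag in sub) for required in sup)


transferase = [
    frozenset({"isopentenyl diphosphate(3-)", "prenyl diphosphate(3-)"}),
    frozenset({"diphosphate(3-)", "neryl diphosphate(3-)"}),
]
isomerase = [
    frozenset({"isopentenyl diphosphate(3-)"}),
    frozenset({"prenyl diphosphate(3-)"}),
]

print(entails_unordered(transferase, isomerase))  # True: the unwanted inference
print(entails_unordered(isomerase, transferase))  # False
```

Both bags required by the isomerase are covered by the transferase's first bag alone, which is why the subclass relationship is inferred.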

goodb commented 6 years ago

One option here is to replace the 'has_substance_bag' property with 'has_left_substance_bag' and 'has_right_substance_bag'. If it's possible to know which side is left and which is right, this works just fine. If it is not possible, we could assert the definition to be the ObjectUnionOf the left2right definition and the right2left definition. Unfortunately the ELK reasoner does not support classification using the ObjectUnionOf construct.

I suppose it's rather ugly, but we could introduce 2 separate classes for each reaction, one for each direction. That would again solve the problem.
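
A rough Python sketch of the left/right variant, assuming each class is reduced to a left and a right set of participant names (ChEBI subclass relations and stoichiometry are ignored). `Sided`, `entails_sided` and `entails_either_direction` are hypothetical names; the second function stands in for the union of the two directed definitions:

```python
from typing import FrozenSet, NamedTuple


class Sided(NamedTuple):
    left: FrozenSet[str]
    right: FrozenSet[str]


def entails_sided(sub: Sided, sup: Sided) -> bool:
    """Left is compared only with left, right only with right."""
    return sup.left <= sub.left and sup.right <= sub.right


def entails_either_direction(sub: Sided, sup: Sided) -> bool:
    """Direction-neutral check: also try `sup` with its sides swapped."""
    flipped = Sided(left=sup.right, right=sup.left)
    return entails_sided(sub, sup) or entails_sided(sub, flipped)


transferase = Sided(
    left=frozenset({"isopentenyl diphosphate(3-)", "prenyl diphosphate(3-)"}),
    right=frozenset({"diphosphate(3-)", "neryl diphosphate(3-)"}),
)
isomerase = Sided(
    left=frozenset({"isopentenyl diphosphate(3-)"}),
    right=frozenset({"prenyl diphosphate(3-)"}),
)

print(entails_either_direction(transferase, isomerase))  # False
```

Because each required side must now be matched by a distinct side, the transferase is no longer classified under the isomerase in either orientation.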

ukemi commented 6 years ago

Ah. I suspected we might run into this. I'm not sure that left and right are always identifiable in the 'reversible' reactions. We have thought about the idea of introducing the forward and reverse directions for the reactions as subclasses. As we move more towards instance representation in GOCAMs, this might be the best solution. Rhea has the directions as children of the reversible reactions and I think this would make the right sense for Reactome.

Any way we could have a conference call soon just to touch base on everything?

ukemi commented 6 years ago

PS. I'm also a bit concerned about creating definitions from these. Will they be necessary and sufficient for complete classification? Do we have differentia axes in GO that are by reaction mechanism rather than participants? I think so. In that case it may be better to make asserted relationships instead of logical defs.... or we need to pick the genus for sets of terms quite carefully.

goodb commented 6 years ago

@ukemi happy to talk about this. Let's go to email to set up a time.

goodb commented 6 years ago

For consideration, there are some discrepancies in mappings between RHEA, GO, and Reactome, in terms of both which RHEA version is xreffed and which CHEBIs are used. Ping @deustp01. BTW, there are 4 reactions in the RHEA model: a "master" reaction which does not specify direction, an intentionally bidirectional one, a left-to-right one, and a right-to-left one. GO always seems to link to the intentionally bidirectional reaction, though it seems like it might be more appropriate to link to the master reaction.

[Two screenshots, dated 2018-06-13, were attached here.]

Edit: Noting that the GO xref mappings to Reactome identifiers like REACT_1292 are obsolete (by at least 10 reactome releases).
See editable, commentable images https://docs.google.com/presentation/d/1m_ZHO9DOm5ITQ9MZ6FzaReI8VPjmz8mAWtXfvRLDDkA/edit#slide=id.g3b84a54a95_0_14

balhoff commented 6 years ago

@goodb I'm not coming up with any good ideas that work with ELK.

goodb commented 6 years ago

@balhoff do you think SWRL is an option here?

deustp01 commented 6 years ago

DIRECTIONALITY - I think it's an enzymologist / biochemist convention to treat all reactions as intentionally bidirectional, allowing for reactions to be driven one way or the other by big enough differences in amounts of the various participating molecules. I think that EC follows this convention and GO, when it picked up the beginnings of a catalysis branch of molecular_function from EC, picked up the convention as part of the package. (@mah11 do you remember this history?) If that's true, it might well do no damage to GO and existing GO - EC mappings to switch from intentionally bidirectional to master.

RHEA - Reactome mapping - the one to R-HSA-442715 looks weird because the Reactome event lumps together an enzyme activation step and catalysis by the activated enzyme, but the catalyzed reaction is correctly mapped, so this is either OK or a use case for applying RHEA - Reactome mapping as a QA tool to identify irregularly annotated reactions. The REACT_1292, etc. identifiers are obsolete (I wonder how they turned up in your data set for mapping), but all correspond to Reactome reactions that describe simple conversion of ATP to cAMP + PPi, so you're right - mapping to RHEA 15389 would be correct:

- REACT_1292 = R-HSA-164377
- REACT_15399 = R-HSA-170676
- REACT_19249 = R-HSA-392129

Also, searching in Reactome for reactions whose catalystActivity attribute uses GO:0004016 (i.e., instances where we are asserting that an enzyme has this molecular function) turned up three more Reactome reactions that convert ATP to cAMP + PPi:

- R-HSA-381607
- R-HSA-5211224
- R-HSA-5610727

Would you have expected your process to turn these up?

balhoff commented 6 years ago

@goodb I don't think SWRL would really fit into the ontology classification reasoning.

goodb commented 6 years ago

@balhoff what about instance classification? Looking at this from the perspective of getting individuals from Noctua models (e.g. reactions from Reactome) classified into GO I imagine it wouldn't be very hard to achieve this with a set of SWRL rules which could be generated automatically and executed with Jena or Arachne etc. Like you say, you wouldn't get subclass inferences out, but I'm not sure how much that matters? I guess that shifts the burden of organizing the MF classes back towards manually constructed definitions as they are now.

More generally, formalizing processing stages that act at the stage of instance classification (like this) and at the stage of data export like GORules and GPAD generation might really benefit from the use of a rule framework. (Meaning capturing the rules in a language like SWRL, deciding on an execution framework like Jena/Arachne/DROOLS/JESS, and adding these to build/export/etc. processes.)

ukemi commented 6 years ago

Hmmm. I think the greatest utility for us would be to be able to reason on these at the class level. If we performed these operations at the level of a Noctua model, how would we ever make the correct inference to a class if the axioms were not in place for the classes?

goodb commented 6 years ago

The axioms would live in a rule base maintained outside of the ontology. Arachne or another engine would be executed and generate instanceOf relationships between nodes in Noctua and classes. It's mainly a question of where the logic is held (in OWL class definitions or a set of SWRL rules) and which reasoning engines get executed at what phase in the cycle.
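
As a rough illustration of keeping the matching logic outside the ontology, the sketch below uses plain Python in place of SWRL and a rule engine. The rule table, the `EX:` class identifier and the `classify` function are hypothetical; a real rule base would be generated from the Rhea-derived definitions.

```python
from typing import Dict, FrozenSet, List, Tuple

# One rule per class: the participant sets on the two sides of its reaction.
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def classify(instance: Rule, rules: Dict[str, Rule]) -> List[str]:
    """Return the classes whose participant pattern matches the instance.

    A rule matches when its two sides equal the instance's two sides in
    either orientation, so no direction is assumed.
    """
    a, b = instance
    matches = []
    for class_id, (x, y) in rules.items():
        if (a, b) == (x, y) or (a, b) == (y, x):
            matches.append(class_id)
    return sorted(matches)


# Hypothetical rule base with a single entry (ATP to cAMP + diphosphate).
rules = {
    "EX:cyclase_activity": (frozenset({"ATP"}), frozenset({"cAMP", "diphosphate"})),
}
# A reaction instance as it might arrive from a pathway model, written in reverse.
instance = (frozenset({"cAMP", "diphosphate"}), frozenset({"ATP"}))
print(classify(instance, rules))  # ['EX:cyclase_activity']
```

The output of such a step would be instanceOf assertions only; as noted above, no subclass inferences between the classes themselves come out of it.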

ukemi commented 6 years ago

15897 would be a good test.

goodb commented 6 years ago

@ukemi @deustp01 @vanaukenk from the discussion today, here is a file with the inferred subclass relationships we get from current logical definitions. For example,

- phosphoribosylformylglycinamidine synthase activity subclass of ATPase activity
- methanol dehydrogenase activity subclass of alcohol dehydrogenase [NAD(P)+] activity

inferred_sublasses.txt

goodb commented 6 years ago

Summarizing action items from voice discussion today for posterity and thread watchers' benefit. (May want to break some of these out into their own issues).

Regarding the CHEBI term alignment challenge, the intention is to build the GO definitions with the terms as RHEA does (which include charge state). The existing GO-CHEBI general class axioms (as described in https://www.ncbi.nlm.nih.gov/pubmed/23895341 ) should be enough to handle merging of the 'same' molecule with different charge states for the purposes of reasoning. We can test this with examples from Reactome. (and this may eventually be used as part of an update of Reactome to use the more specific chebi terms as they are in RHEA).
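
The intended effect of merging charge states can be pictured with a small Python sketch. It assumes a lookup table from charge-specific terms to a shared representative; the table contents and the `normalize` helper are hypothetical, and the GO-CHEBI axioms achieve the merge through reasoning rather than through a lookup like this one.

```python
from typing import Dict, FrozenSet, Iterable


def normalize(participants: Iterable[str], same_molecule: Dict[str, str]) -> FrozenSet[str]:
    """Replace each term by its charge-independent representative, if mapped."""
    return frozenset(same_molecule.get(p, p) for p in participants)


# Hypothetical mapping from charge-specific names to a shared representative.
same_molecule = {"ATP(4-)": "ATP", "diphosphate(3-)": "diphosphate"}

rhea_style_side = ["ATP(4-)"]   # charge state included, as RHEA does
other_style_side = ["ATP"]      # charge-agnostic term

print(
    normalize(rhea_style_side, same_molecule)
    == normalize(other_style_side, same_molecule)
)  # True
```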