geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)
8 stars 0 forks source link

Reconsider Binding related conversion rules #98

Closed goodb closed 4 years ago

goodb commented 4 years ago

Conversations in the manuscript document indicate a need to reconsider some rules. https://docs.google.com/document/d/1QKVWxWkcbTh-A6VMzidusOS-CIP_SD1yEfAzWR5GiGs/edit?ts=5ef52611#

goodb commented 4 years ago

We have several comments from @thomaspd like this: translation rules: 4a "Reactome binding reactions aren't binding activities" 4b transport activities aren't activities unless enabled by a gene product.
7c "binding event that has no enabler and therefore isn't an activity"

As it stands, our conversion will sometimes produce OWL instances with RDF:types in the Molecular Function branch of the GO, even when the instance does not have an enabling entity linked to it. We have two choices:

1) we let things stand as they are for the manuscript. We report in the results section how many times we generate complete activities units and how many times it fails. Then we add text to the discussion about why this happens why it matters to the GO and how we might improve the models in the future.

2) We pause work on the manuscript. We make changes to the conversion code now to try to bring things closer to the ideal form. This could include filtering out reactions that don't have defined enablers or work on inferring enablers based on other information sources.

We need to decide before we continue work on the manuscript.

goodb commented 4 years ago

Note if we were to decide to pause and rework. We have discussed using or creating another ontology to provide classes for nodes that don't fit the Activity definition. There is some precedent for e.g. "biological spontaneous process" distinct from a "biological mediated process" (where the latter seems to map to GO Activity) in http://ceur-ws.org/Vol-2137/paper_31.pdf .

goodb commented 4 years ago

The essence of the problem here is the use of terms from the molecular function ontology without connecting them to gene products. In @thomaspd 's vision, it is semantically incorrect to use a term from this ontology to classify an instance unless there is an enabling gene product linked to that instance. In fact, he would support adding a necessary condition to the root of the MF ontology that would make this explicit and/or alter the cardinality in the existing shex specification from {0 or 1} as it is now to {1}.

For the Reactome conversion, he wants this semantic requirement presented clearly in the manuscript - noting that binding nodes (e.g. complex formation reactions) without enablers should not be typed as Molecular Function or any of its children. It would be possible to do this in the text, but perhaps the simplest way to make this explicit both in the manuscript and in the generated go-cams it to introduce a new ontology term to cover them. This would be something in between BFO:Occurrent and GO:MF - something along the lines of Cellular Event. If we did this, these nodes would be easy to detect visually in the diagrams and could be referred to more easily in the manuscript text. We would either add any needed terms to REACTO or identify these terms in some other ontology.

Feedback from the ontology team (e.g. @ukemi @vanaukenk @cmungall ) about how to proceed here would be useful. With a decision about what we want, implementation would be pretty straightforward.

ukemi commented 4 years ago

@thomaspd?

deustp01 commented 4 years ago

For the Reactome conversion, he wants this semantic requirement presented clearly in the manuscript - noting that binding nodes (e.g. complex formation reactions) without enablers should not be typed as Molecular Function or any of its children

Naive question - can we get some breathing space here by distinguishing cases where the enabler information does not exist (bad) from cases where it can't easily be recovered from the current data structure (maybe fix-able with a placeholder while we work in the next stage of the project on figuring out how to recover it reliably). My subjective hopeful guess is that most of the current problem cases fall in the second group: an enabler exists and can often be identified algorithmically, as suggested by @thomaspd observations in the manuscript of many places where the name assigned to an event by a curator has exactly this information (which, however, is not captured in the current formal annotation).

ukemi commented 4 years ago

Good point. If we are going to go down the route of ignoring binding reactions, how are we going to identify the ones that represent 'real' MFs like GO:0048018 (receptor ligand activity). I would propose that for now, any binding reaction that has at least one participant that is a biomacromolecule or complex is retained (are there any that dont?). I think we can even come up with simple rules to pick one as an enabler. These would in turn be used as indicators of future curation work, where we either make modification of how we represent them or convert them to more specific functions. Is this ok with you @thomaspd?

cmungall commented 4 years ago

Just a quick comment on the proposal to change the cardinality to {1}. This would make template BPs where we leave the GP open invalid. It would also prohibit models where we don't know the GP but know there is an MF.

I think it's a mistake to try and force this kind of thing into a closed world modeling framework / schema like ShEx.

I think that a proposal to make a biological constraints that every MF has an enabler is valid. This would be encoded in OWL (open world), as MF subClassOf enabledBy exactly 1 InformationMacromolecule. This would explicitly allow open/unknown/templates, and in these cases it would unambiguously state: there is an enabler here, we either don't know it or choose not to state it.

(we can still have metrics and scores where we can score a model for completeness, where each MF should have its enabler stated, completeness would be undesirable for templates but desirable for typical models, I just don't think the schema is the best place to do this)

cmungall commented 4 years ago

A minor variant of Paul's proposal. Instead of bringing in external ontologies, we just structure MF something like:

We could do the same thing for BP. I think this makes sense as we are currently using the term BP in a more restricted way than many people might think

thomaspd commented 4 years ago

Just so it's in this ticket, here's what I'd put in my email: Thanks, if everyone’s on board, creating new REACTO classes sounds like a good plan. I think it’s important to have Peter @deustp01 help us define what the ontology classes and structure should be. Based on what I recall from our discussions with Peter and earlier drafts of the paper, maybe something like:

Event (physico-chemical process)

Chemical reaction and translocation in Reactome are usually but not always catalyzed/controlled (i.e. they are enabled by a gene product), so we would only convert them to activities at this point when they have an enabler, and can use the REACTO term otherwise.

deustp01 commented 4 years ago

I think that a proposal to make a biological constraints that every MF has an enabler is valid. This would be encoded in OWL (open world), as MF subClassOf enabledBy exactly 1 InformationMacromolecule. This would explicitly allow open/unknown/templates, and in these cases it would unambiguously state: there is an enabler here, we either don't know it or choose not to state it.

This is a key point in @cmungall 's proposal from July 28, above. Almost always (I can think of exactly one reaction in all of Reactome that's an exception) an enabler exists; failure to annotate it as an attribute of a reaction reflects, as Chris says, lack of knowledge or lack of interest.

Even for binding, I can't think of a case where, say, subunit monomers and the heterodimer complex they can form co-exist in a steady state in a cell. Rather, one of the subunits is newly available (either newly made or newly released from an inactive or sequestered state) whereupon it binds the other to form a complex with emergent properties. That means that an enabler of the binding event and of the subsequent activity of the newly formed complex exists whether or not it is annotated. A technical annotation issue is that these events can involve conformational changes without any covalent modifications to the participating molecules (e.g., gated channels, or the change in a tyrosine kinase receptor enabled by ligand binding, that activates its kinase domain) and we don't have good tools for distinguishing and annotating alternative conformational states of otherwise identical proteins in a resource like Reactome.

Bottom line - the plan summarized by @thomaspd earlier today will create a lot of placeholders, but these essentially always represent lack of information, not lack of enablers.

Another bottom line - the list of four kinds of events - chemical reaction, binding, dissociation, translocation - looks complete. Okay @goodb ?

ukemi commented 4 years ago

Just to stir the pot a little more:

https://github.com/geneontology/go-ontology/issues/19838

ukemi commented 4 years ago

I think we could make rules to 'invoke' the enablers of binding reactions, based on what @deustp01 tells us above. For transport we have already agreed that they are all catalyzed by something and do not represent diffusion. Dissociation is still a question. Are they also placeholders for deeper functions as we expect binding reactions to be or is the above ticket with a little more thought the right track?

deustp01 commented 4 years ago

Dissociation is still a question.

My hunch is that the binding scenario (it may look spontaneous but isn't really) applies as well to dissociation - something happens to the complex that causes the affinities of the subunits for one another to fall, so it is enabled to come apart.

ukemi commented 4 years ago

Was there consensus on this at the end of the call yesterday?

deustp01 commented 4 years ago

close. We were not certain that Chris's and Paul's last proposals are in fact requesting the same thing. We were ready to go ahead with Paul's proposal for the four new terms to reside outside of GO MF, pending resolution of any Chris - Paul difference. @goodb will need to explain the remaining uncertainty here.

goodb commented 4 years ago

For the purposes of finishing this phase of the conversion, one new class "Molecular Event" will be introduced and used as a placeholder for reactions that are not provided with a GO MF annotation by Reactome.

We have one rule that would be impacted by this. It states "if the input of a binding reaction is the output of the previous event, then change the input to an enabler". I think this accounts for most of the enablers assigned to inferred binding reactions right now. If we like the intent of the rule, we could alter it to say "if an input of a molecular event is the output of the previous event, and the molecular event has more inputs then output, then change the input to an enabler and assign it the type Binding. If not, we can eliminate the inference.

An example reaction impacted by that rule is Interaction of SHP-1 or SHP-2 with phospho PD-1 (R-HSA-389759.1) which has one of its inputs assigned as its enabler because it is the output of the upstream reaction 'Phosphorylation of PD-1'. In this case the entity is a complex 'p-Y223,248-PDCD1:B7-DC,B7-H1' with no active unit.

goodb commented 4 years ago

I've made the change to 'molecular event' and for the moment kept the rule for inferring enablers in place. But, I didn't change the type of what is now a raw molecular event. If we change the type, we should apply the rule more broadly rather than only in that specific context. And, we should just do it via an OWL class definition.

goodb commented 4 years ago

Did this: https://github.com/geneontology/pathways2GO/pull/100 includes the elimination of the binding rules, the replacement of M.F. with M.E. , and the addition of a global rule for switching ME to MF with an enabler exists. Adds ME to REACTO build, currently as a subclass of BFO:Process.

goodb commented 4 years ago

@ukemi @vanaukenk @cmungall @deustp01 @thomaspd @kltm the noctua-dev models now reflect the results of the above discussions. See e.g. http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-983695 for examples of the new Molecular Event class and examples where this has been turned into Molecular Function because an enabler is present. Note we have no inference of binding or dissociation events included at this time.

ukemi commented 4 years ago

I may be a bit confused, but I see binding activities that only have inputs in the above model. gomodel:R-HSA-983695/e3a3c031-ff6d-49a8-b28d-c5743bd89e08

deustp01 commented 4 years ago

Is the "binding" unit in the lower left corner of this screenshot an example? (I'm confused too but mostly because I haven't paid attention to these extra units that appear in addition to the ones that correspond to Reactome reactions - which, here, do ll appear to have inputs and also outputs)?

Screen Shot 2020-09-01 at 12 09 23 PM
ukemi commented 4 years ago

Yes, this is what I was seeing. I don't remember seeing these in the other models I looked at.

goodb commented 4 years ago

@ukemi sorry for confusion. The binding nodes are all the result of the rule for dealing with entity-based regulation. (if you click on the green evidence square and look at the comment you will see they are all the same ("Entity Regulator Rule. The relation was added to account for an assertion about an entity regulating the target reaction.")

These are not direct translations of reactions like the other nodes in the model. My reference to the removal of the binding inference was to the other nodes, like 'p-PIK3AP1 (p-BCAP) binds PIK3CD:PIK3R1 (PI3K delta)' that were previously inferred to be binding and are now left to be Molecular Events as they have no enablers.

goodb commented 4 years ago

@ukemi links to the rules for these guys in here https://github.com/geneontology/pathways2GO/issues/99

ukemi commented 4 years ago

Ah, OK. Why is the binding in the above example not enabled by p=6Y-SYK? I thought that we had decided that in cases like this the enabler of the regulated function is the enabler of the binding. Maybe that decision was reversed at some point. So for example, PFK enzyme enabling the binding of ATP negatively regulates the phosphofructokinase activity enabled by the PFK.

goodb commented 4 years ago

@ukemi we definitely went back and forth on that one several times. This is where we have ended up. See rule 8 in https://docs.google.com/document/d/16-pFFG5MuYY3vDPXeacSBcAv5KB8WxQCE2S_9Ha4K-E/edit#

goodb commented 4 years ago

@ukemi is this the correct representation for entity regulation or do we need to go back and change the rule to add the enabler back in? (We certainly had it that way at one point.) Let me know as I'm in the middle of refreshing the numbers and would rather not repeat that process again if I don't need to.

ukemi commented 4 years ago

@thomaspd? @cmungall?

goodb commented 4 years ago

@ukemi @vanaukenk what is the resolution for the last question here. When a Binding activity node is introduced to account for regulation by a molecule, as in this picture https://github.com/geneontology/pathways2GO/issues/98#issuecomment-684968955 , and the activity node that is being regulated by the binding node has one of the inputs to the binding reaction as an enabler, should the binding node be enabled by that same enabling molecule or should it (as it is now) just have both molecules as inputs?

Lets decide and freeze.. Almost there.

vanaukenk commented 4 years ago

@thomaspd What are your thoughts on this one? This is another GO-CAM annotation question that we need to have a general SOP for, including our MOD imports.

goodb commented 4 years ago

resolved in meeting today.

Nodes introduced to account for regulation of a reaction (often by a small molecule but also by complexes) will be typed as Binding if the reaction they regulate has an enabler assigned and this enabler will also be added as an enabler (not an input) of the binding reaction. If there is no enabler, the new node will be typed with Molecular Event and only have the regulating molecule as an input.