geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

define and implement model for transport and dissociation processes #75

Closed: goodb closed this issue 4 years ago

goodb commented 5 years ago

At GOC 2019b in Berkeley, we briefly discussed providing these kinds of reactions with either more specific types, e.g. something along the lines of 'protein complex disassembly' for dissociation and, as done in some prior iterations, 'transport' or 'establishment of protein localization' for transport processes. Or, defining a more appropriate generic upper level type, like 'reaction', that would be a parent of molecular function and using that for both.

Need resolution here from ontology developers in particular -> ping @ukemi

Noting previous discussions on this topic: #17 #35 #55 #73

Examples (both from Signaling by BMP, R-HSA-201451):

Translocation: [screenshot]

Dissociation: [screenshot]

ukemi commented 5 years ago

I think this is opened up for us now that we can use processes as causal intermediates in GO-CAM models. Let's discuss on the next editors' call.

transport: The directed movement of substances (such as macromolecules, small molecules, ions) or cellular components (such as complexes and organelles) into, out of or within a cell, or between cells, or within a multicellular organism by means of some agent such as a transporter, pore or motor protein.

establishment of localization: Any process that localizes a substance or cellular component. This may occur via movement, tethering or selective degradation. SEEMS LIKE THIS ONE FITS BETTER. THE ONTOLOGY IS SPARSE HERE, BUT WE SHOULD BE ABLE TO INFER THE END-POINT OF THE LOCALIZATION. IN THE FUTURE, WOULD IT MAKE SENSE TO INSTANTIATE TERMS LIKE 'ESTABLISHMENT OF LOCALIZATION TO XYZ'? WE SHOULD ALSO LOOK INTO PROPERTY CHAINS BASED ON THE MEMBERS OF THE COMPLEX.

protein-containing complex disassembly: The disaggregation of a protein-containing macromolecular complex into its constituent components. IT WILL BE INTERESTING TO SEE IF WE GET DEEPER INFERENCES FOR THESE BASED ON THE MEMBERS OF THE COMPLEX.

goodb commented 5 years ago

Note from call on Nov. 13 2019. Though we would like more ontology and curator consideration, the short term plan is to go forward with the establishment of localization and protein-containing complex disassembly proposals as above.

goodb commented 5 years ago

@ukemi for the dissociation inference, would you always expect different locations for the inputs and outputs, as we see in the example here? Would it be possible to have protein complex disassembly happening e.g. entirely in the cytoplasm?

One formulation for this rule would be: if there are more outputs than inputs, and the outputs are all parts of the inputs, and the inputs are protein complexes, then infer that the reaction is 'protein complex disassembly'.

Another would add one more constraint: and the outputs are in a different location from the inputs.
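Both formulations can be sketched as one predicate with an optional location check. This is a minimal sketch, not the pathways2GO implementation: `Entity`, `is_part_of` and `is_disassembly` are hypothetical names, each participant is reduced to a name, a location and a set of component names, and "different location" is read here as "no output shares a location with any input". The destruction-complex component list is simplified for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Simplified stand-in for a pathway participant (hypothetical model)."""
    name: str
    location: str
    components: frozenset = frozenset()  # empty for a single protein or molecule

    @property
    def is_complex(self) -> bool:
        return bool(self.components)


def is_part_of(part: Entity, whole: Entity) -> bool:
    """True if `part` is a component, or a proper sub-complex, of `whole`."""
    if part.is_complex:
        return part.components < whole.components  # proper subset
    return part.name in whole.components


def is_disassembly(inputs, outputs, require_location_change=False) -> bool:
    """First formulation, with the second one's location test as an option."""
    if len(outputs) <= len(inputs):
        return False  # needs more outputs than inputs
    if not all(i.is_complex for i in inputs):
        return False  # every input must be a complex
    if not all(any(is_part_of(o, i) for i in inputs) for o in outputs):
        return False  # every output must be a part of some input
    if require_location_change:
        input_locations = {i.location for i in inputs}
        return all(o.location not in input_locations for o in outputs)
    return True


# Simplified, hypothetical component list, for illustration only.
destruction_complex = Entity(
    "destruction complex", "cytosol",
    frozenset({"AXIN1", "GSK3B", "APC", "CTNNB1"}),
)
released = [
    Entity("CTNNB1", "cytosol"),
    Entity("APC", "cytosol"),
    Entity("GSK3B:AXIN1", "cytosol", frozenset({"GSK3B", "AXIN1"})),
]
print(is_disassembly([destruction_complex], released))  # True
print(is_disassembly([destruction_complex], released,
                     require_location_change=True))  # False
```

Counting sub-complexes as parts lets partial disassemblies (a piece such as GSK3B:AXIN1 staying together) pass the first formulation; the location variant rejects a release that happens entirely in the cytosol.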

ukemi commented 5 years ago

I think biologically it is certainly possible to have the dissociation happen entirely in one cellular location. @deustp01 can you provide an example?

goodb commented 5 years ago

@ukemi Here is an example I think, 'Beta-catenin is released from the destruction complex' (from our other active problem example!) https://reactome.org/PathwayBrowser/#/R-HSA-201685

One big complex becomes lots of smaller pieces, all in the cytosol. Fair to type this reaction as 'protein-containing complex disassembly'?

ukemi commented 5 years ago

I think certainly yes, even the Reactome description indicates this.

goodb commented 5 years ago

Running my current rule, which is simply (a) the reaction is not yet typed and (b) there are more outputs than inputs, on TCF dependent signaling in response to WNT and on Signaling by BMP would call the following reactions 'protein-containing complex disassembly':

Signaling by BMP
- Phospho-R-Smad1/5/8 dissociates from the receptor complex

TCF dependent signaling in response to WNT
- Deactivation of the beta-catenin transactivating complex
  - APC promotes disassembly of beta-catenin transactivation complex
  - XIAP dissociates from ub-TLE
- Disassembly of the destruction complex and recruitment of AXIN to the membrane
  - Beta-catenin is released from the destruction complex
- Formation of the beta-catenin:TCF transactivating complex
  - Beta-catenin displaces TLE:HDAC1 from TCF/LEF

Any concerns?

goodb commented 5 years ago

The last one there 'Beta-catenin displaces TLE:HDAC1 from TCF/LEF' seems like it might not be a great fit as it isn't so much disassembly as it is restructuring.

It would probably fit 'protein-containing complex remodeling' better, which seems a bit trickier to put into a rule.

ukemi commented 5 years ago

These all seem ok to me except the last one which is both a disassembly and an assembly.

ukemi commented 5 years ago

It is a dissociation: the TCF/LEF:TLE:HDAC1 input gets the HDAC1 split off. It is also a formation: CTNNB1 is an input and TCF/LEF:CTNNB1 is an output. I'd hesitate to call it a remodeling because the result is a separate, distinct complex with a different function.

ukemi commented 5 years ago

The TLE gets output as well.
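The simple rule (untyped, more outputs than inputs) can be paired with a check that catches the "also a formation" case described above: an output complex whose members come from two or more different inputs. This is a sketch under stated assumptions, not pathways2GO code; `Reaction`, `simple_disassembly_rule` and `also_forms_complex` are hypothetical names, and the participants follow the description in these comments in simplified form.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass
class Reaction:
    """Hypothetical, simplified reaction. Each participant name maps to the set
    of proteins it contains; a single protein maps to a one-element set."""
    name: str
    inputs: Dict[str, FrozenSet[str]]
    outputs: Dict[str, FrozenSet[str]]
    go_type: Optional[str] = None  # None = not yet typed by another rule


def simple_disassembly_rule(r: Reaction) -> bool:
    """The simple rule: untyped and more outputs than inputs."""
    return r.go_type is None and len(r.outputs) > len(r.inputs)


def also_forms_complex(r: Reaction) -> bool:
    """True if some output complex combines members drawn from two or more
    different inputs, i.e. the reaction also assembles a new complex."""
    for members in r.outputs.values():
        if len(members) < 2:
            continue  # a single protein cannot be a newly formed complex
        sources = [name for name, inp in r.inputs.items() if members & inp]
        if len(sources) >= 2:
            return True
    return False


displacement = Reaction(
    "Beta-catenin displaces TLE:HDAC1 from TCF/LEF",
    inputs={"TCF/LEF:TLE:HDAC1": frozenset({"TCF/LEF", "TLE", "HDAC1"}),
            "CTNNB1": frozenset({"CTNNB1"})},
    outputs={"TCF/LEF:CTNNB1": frozenset({"TCF/LEF", "CTNNB1"}),
             "TLE": frozenset({"TLE"}),
             "HDAC1": frozenset({"HDAC1"})},
)
release = Reaction(
    "Beta-catenin is released from the destruction complex",
    inputs={"destruction complex": frozenset({"AXIN1", "GSK3B", "APC", "CTNNB1"})},
    outputs={"CTNNB1": frozenset({"CTNNB1"}),
             "APC": frozenset({"APC"}),
             "GSK3B:AXIN1": frozenset({"GSK3B", "AXIN1"})},
)
for r in (displacement, release):
    print(r.name, simple_disassembly_rule(r), also_forms_complex(r))
# displacement: True True   release: True False
```

Both reactions pass the simple rule; only the displacement is flagged as also assembling a new complex, which is the distinction drawn above.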

ukemi commented 5 years ago

I think this definition (handwaving here), "The acquisition, loss, or modification of macromolecules within a complex, resulting in the alteration of an existing complex", means that the complex retains its identity and would be a member of the same complex class, but that's not really specified in the definition. We should look at how this term is used in annotation. In this case, I think the output complex would fall into a different class.

goodb commented 5 years ago

I think we may see lots of modifications in the midst of what might otherwise look like disassemblies; e.g. in 'APC promotes disassembly of beta-catenin transactivation complex', a BTRC gets added onto the complex while other parts are stripped away.

Also, many (most?) reactions that might be typed with disassembly are only partial disassemblies. There is usually some part of the input complex that remains attached; e.g. in 'Beta-catenin is released from the destruction complex', most of the complex is turned into its parts, but GSK3B:AXIN1 remains. Should these be typed as disassemblies? How far down into its components does the complex need to be reduced to match?

It seems that we need some more logic for the definitions of these classes.

ukemi commented 5 years ago

I think partial disassembly is ok.

ukemi commented 5 years ago

It seems that in the causal GO-CAM world, if a complex is reduced and it is no longer capable of executing a function or it will execute a different function then it has been disassembled. The converse would also be true for assembly. Does that make sense? I'm just throwing out ideas, but I think this is getting back to the point of a requirement for additional curation to try to describe better the biological context of these assembly (binding) and disassembly events.

goodb commented 4 years ago

Updated examples of dissociation ('protein-containing complex disassembly') and transport ('establishment of localization') can be found in http://noctua-dev.berkeleybop.org/editor/graph/gomodel:R-HSA-201451

[Screenshots: updated dissociation and transport examples]

deustp01 commented 4 years ago

Quoting @ukemi: "I think biologically it is certainly possible to have the dissociation happen entirely in one cellular location. @deustp01 can you provide an example?"

Textbook normal processes: assembly / disassembly of actin microfilaments and of microtubules (GO:0046785 microtubule polymerization, GO:0007019 microtubule depolymerization). More elaborate: assembly of ribosomal large and small subunits onto an mRNA to form a mature ribosome and initiate translation, and disassembly of the same after termination of translation. (Ribosomes are organelles in GO but their large and small subunits are protein complexes, so the complex : organelle boundary here seems arbitrary.)

Also GO:0031035 myosin filament disassembly

deustp01 commented 4 years ago

Disassembly versus remodeling: remodeling looks like it should almost always be shorthand notation for a series of events in which pieces come off a complex and other pieces join, with causal relationships among the events, e.g., loss of piece "A" unmasks a binding site for incoming piece "B", which in turn creates a synthetic site that enables binding of piece "C". Sometimes all we know is the overview: start_complex + B + C -> end_complex + A. Sometimes we have the data and motivation to annotate all of the intervening steps. Sometimes the individual events are truly sequential, as in this hypothetical example; sometimes an event is truly a displacement of an outgoing piece by an incoming one - such an irreducible event would be a true remodeling.

There are also cases where spelling out all the sequential microevents that form the remodeling of a complex adds no information at the level of pathway annotation. A "remodeling" of convenience?

goodb commented 4 years ago

Heads up @ukemi @vanaukenk that @thomaspd has a different idea about how to model dissociation. Something to do with negative regulation of binding. Please confer.

ukemi commented 4 years ago

I think we had considered this and it seemed reasonable at the time. @thomaspd, can you specify its implementation so @goodb can make it happen?

ukemi commented 4 years ago

The other aspect of this ticket that we talked about at breakfast was whether we should use 'transport' rather than 'establishment of localization'. I think @goodb was correct that transport might be better, since in GO localization encompasses the idea that an entity is in one location and not another. I don't think this is universally true for what's happening in the Reactome models; there are many cases where the entity is just moving but not localized. Initially we thought we couldn't use transport because there was no transporter. I think this caveat needs to be revisited. @vanaukenk @deustp01 do you have any thoughts on this?

deustp01 commented 4 years ago

Here's a use case. RNA is synthesized within the nucleus with nucleoside triphosphates (NTPs) that are synthesized in the cytosol and, as far as I know, simply diffuse down a concentration gradient into the nucleus through the nuclear pore which here "functions" simply as a hole whose diameter is large compared to the diameter of an NTP. To get complete connectivity in Reactome (not done yet, so no current GO-CAM models are affected one way or the other), we'd need to create a reaction in which input cytosolic NTP is converted into output nucleoplasmic NTP with no physical entity as enabler. Meanwhile, lots of NTP remains in the cytosol. No problem for Reactome, but if it's needed for GO-CAM and "localization" seems wrong per @ukemi then we're left allowing transport (diffusion) without a transporter physical entity.

That's the hard case - transport really does happen without a transporter. There are lots of edge cases where it's a reasonable more or less indirect inference that a transporter protein or complex mediates the transport of an entity from [here] to [there] but there's not enough evidence to associate specific proteins with the specific transport activity. Movement of many entities into and out of the peroxisome falls into this category.

ukemi commented 4 years ago

We could hand-wave here and say that even though the nuclear pore is not a catalyst, it enables the transport. This seems analogous, although on a different scale, to any kind of pore or channel. I think the above case fits our definition: The directed movement of substances (such as macromolecules, small molecules, ions) or cellular components (such as complexes and organelles) into, out of or within a cell, or between cells, or within a multicellular organism by means of some agent such as a transporter, pore or motor protein. Can you think of one in Reactome that doesn't? @thomaspd I think you were thinking along these lines at one point as well.

deustp01 commented 4 years ago

No, the GO and Reactome views of transport align in all cases I can think of.

thomaspd commented 4 years ago

Commenting first on the dissociation process. The example Ben pointed me to is https://reactome.org/content/detail/R-HSA-201453.

If we look at the upstream reaction (phosphorylation of SMAD1/5/8), it's clear that it should appear in GO-CAM as [(protein kinase activity enabledBy p-BMPR) hasInput SMAD1/5/8], because p-BMPR is annotated as the active subunit, and we can deduce that SMAD1/5/8 is the target because SMAD1/5/8 (reactant) becomes p2S-SMAD1/5/8 (product).

The following dissociation reaction (R-HSA-201453) should refer to some activity of SMAD1/5/8.

At the NYU meeting, I was proposing that a dissociation should be modeled as a binding function that is negatively regulated by the step that is upstream of it. So, in this example, the activity of BMPR (above) directlyNegativelyRegulates [(protein-containing complex binding enabledBy SMAD1/5/8) hasInput BMP:p-BMPR:Endofin]. This latter activity, in turn, directlyNegativelyRegulates the next (third) step, [(protein binding enabledBy p2S-SMAD1/5/8) hasInput SMAD4]. Note that the dead-end reaction involving I-SMAD would be represented as I-SMAD enabling a binding that directlyNegativelyRegulates that third step as well.
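The proposed chain can be written down as plain triples, which makes it easier to compare with the disassembly model. This sketch only encodes the relationships stated in the comment above; `Activity`, `regulators_of` and the label strings are illustrative rather than GO-CAM identifiers, and the I-SMAD activity is left as a generic "binding" with no input because the proposal doesn't specify one.

```python
from typing import NamedTuple, Optional


class Activity(NamedTuple):
    """One GO-CAM activity node (simplified: labels instead of IDs)."""
    function: str
    enabled_by: str
    has_input: Optional[str] = None


kinase = Activity("protein kinase activity", "p-BMPR", "SMAD1/5/8")
complex_binding = Activity("protein-containing complex binding", "SMAD1/5/8",
                           "BMP:p-BMPR:Endofin")
smad4_binding = Activity("protein binding", "p2S-SMAD1/5/8", "SMAD4")
i_smad_binding = Activity("binding", "I-SMAD")  # input left unspecified

DNR = "directly_negatively_regulates"
edges = [
    (kinase, DNR, complex_binding),        # step 1 negatively regulates step 2
    (complex_binding, DNR, smad4_binding),  # step 2 negatively regulates step 3
    (i_smad_binding, DNR, smad4_binding),   # the dead-end I-SMAD reaction
]


def regulators_of(target: Activity, edges) -> list:
    """Enablers of the activities that directly negatively regulate `target`."""
    return [s.enabled_by for s, rel, o in edges if rel == DNR and o == target]


print(regulators_of(smad4_binding, edges))  # ['SMAD1/5/8', 'I-SMAD']
```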

thomaspd commented 4 years ago

In response to David's comment about translocation, if there's a process carried out by some generic machinery like the nuclear pore, transcription, translation, etc., we can just include an intervening "activity regulating process". Like David, I think this is OK even if it's the activity of a channel that enables passive diffusion like the nuclear pore (relative to small molecules at least); TCDB still calls these transporters. I think this is covered by the process of "nuclear transport" in the GO. So we'd have ACTIVITY1 ---causallyUpstreamOf--> nuclear transport ---causallyUpstreamOf--> ACTIVITY2

goodb commented 4 years ago

Thanks @thomaspd - regarding your suggestion for dissociation, I've put together this (editable) picture of my understanding of your approach here and how it relates to what we have now. (I need pictures to think about these things.)

Could you and perhaps @ukemi and @deustp01 have a look at that to see if I have your idea right?
[Screenshot: diagram of the proposed dissociation model]

thomaspd commented 4 years ago

Thanks Ben, this looks right. Just a small edit (which I assume is just correcting a typo): in the first reaction, the GO term should be "transmembrane receptor protein serine/threonine kinase activity", rather than "phosphorylation of SMAD1/5/8".

goodb commented 4 years ago

Okay, yes "transmembrane receptor protein serine/threonine kinase activity" for the GO type in the first reaction - corrected in the editable view.

The second reaction is interesting here. I see the logic given the full context provided by the pathway, but if you tried to look at that reaction independently, it would be hard to see why it was typed as 'protein-complex binding' as the reaction is really doing the opposite.

Curious what @ukemi thinks.

goodb commented 4 years ago

Coming back to the transport question above (should have made this two issues). My reading here is that we are good with making the change from 'establishment of localization' to 'transport'. The rule would be: If the reaction has the same molecule as an input and an output but the molecule has different locations, then the GO-CAM version is an instance of Transport with has_target_start_location and has_target_end_location properties defined.

If the entity in question is a protein, then it would get Protein Transport, otherwise (for complexes) just Transport.

When we had this implemented before (we lost it around last February, I think), we gained a lot of deepening of these types from the reasoner for Protein Transport; e.g. if we called the reaction in question from BMP 'protein transport', it would be inferred to be 'protein localization to the nucleoplasm', which seems to match the intent of the reaction pretty well.
[Screenshot: reasoner inference for the reaction typed as 'protein transport']

However, that wouldn't happen because the reaction is moving a complex unless we also add another relation 'transports or maintains localization of' between the reaction instance and the entity being transported. That actually results in the same inference as above.

[Screenshot: the same inference after adding 'transports or maintains localization of']
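The transport rule is mechanical enough to sketch directly. This is a minimal sketch, assuming a simplified participant model: `Participant` and `type_as_transport` are hypothetical names, not pathways2GO code, and the returned dict keys mirror the relation names used in this thread rather than RO IRIs. Because the rule requires the same entity on both sides, ATP consumed and ADP produced in coupled transport don't trigger it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Participant:
    """Hypothetical reaction participant: an entity ID plus its location."""
    entity: str
    location: str
    is_protein: bool = False  # False for complexes and small molecules


def type_as_transport(inputs: Iterable[Participant],
                      outputs: Iterable[Participant]) -> Optional[dict]:
    """Same entity as input and output, in different locations -> transport.
    Returns a description of the GO-CAM node, or None if the rule doesn't fire."""
    outputs = list(outputs)  # iterated once per input
    for i in inputs:
        for o in outputs:
            if i.entity == o.entity and i.location != o.location:
                return {
                    "type": "protein transport" if i.is_protein else "transport",
                    "has_target_start_location": i.location,
                    "has_target_end_location": o.location,
                    # Extra relation to the transported entity, as proposed above.
                    "transports_or_maintains_localization_of": i.entity,
                }
    return None


moved = type_as_transport([Participant("complex X", "cytosol")],
                          [Participant("complex X", "nucleoplasm")])
print(moved["type"], moved["has_target_end_location"])  # transport nucleoplasm

# ATP in, ADP out in the same compartment: different entities, so no typing.
print(type_as_transport([Participant("ATP", "cytosol")],
                        [Participant("ADP", "cytosol")]))  # None
```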

ukemi commented 4 years ago

For the binding example, I think we need to see if we are happy with any annotations that fall out of it, but if this is the vision of how to model these types of events, then we should be consistent. In this case I think we would get an annotation to any of the Smads enabling protein binding. I agree, the reaction is kind of doing the opposite. It's unbinding for lack of a better word.

For the second example, this makes sense to me, but then wouldn't it be possible to digest it further to just 'transports or maintains localization of'? I think in a model that was being curated, we would simply use the more complex relation. Input and output standing alone become insufficient as soon as we start talking about coupled transport, because something like ATP is also an input to the reaction and ADP is an output of the reaction. Your rule is more sophisticated because it ensures that the same thing is the input and the output and that the start and end locations differ.

I just looked at this relation in RO and I think it could use some work. There is no connection between 'transports or maintains localization of' and has_input and has_output, but couldn't one consider the transported entity to be a type of input or output? In fact, the way Reactome captures it is as an input of something{here} and an output of the same thing{there}. At the end of the day, I think going back to transport is the right thing to do. @deustp01, @vanaukenk ?

For reasons we discussed in NYC I'm not entirely happy with the localization inference. But that is another battle for another day.

goodb commented 4 years ago

Quoting @ukemi: "For the second example, this makes sense to me, but then wouldn't it be possible to digest it further to just 'transports or maintains localization of'?"

Not sure what you mean. If we want the ability to infer transport or localization type then we also need the has_target_location relations. What else were you thinking of taking out?

goodb commented 4 years ago

Quoting @ukemi: "For the binding example, I think we need to see if we are happy with any annotations that fall out of it, but if this is the vision of how to model these types of events, then we should be consistent. In this case I think we would get an annotation to any of the Smads enabling protein binding. I agree, the reaction is kind of doing the opposite. It's unbinding for lack of a better word."

I made a temporary example (alive only till noctua-dev resets).

[Screenshot: temporary example model on noctua-dev]

SMAD1 molecular function regulator has_input(GO:0032991),directly_negatively_regulates(GO:0005515)
SMAD1 protein-containing complex binding has_input(GO:0032991),directly_negatively_regulates(GO:0005515)
SMAD1 protein-containing complex binding has_input(GO:0032991),directly_negatively_regulates(GO:0005515)
SMAD1 molecular function regulator has_input(GO:0032991),directly_negatively_regulates(GO:0005515)
     
     
BMPR1A transmembrane receptor protein serine/threonine kinase activity positively_regulates(GO:0005515),has_input(UniProtKB:Q15797),has_input(PR:000000001),directly_negatively_regulates(GO:0044877),directly_negatively_regulates(GO:0098772)
BMPR1A molecular function regulator positively_regulates(GO:0005515),has_input(UniProtKB:Q15797),has_input(PR:000000001),directly_negatively_regulates(GO:0044877),directly_negatively_regulates(GO:0098772)
BMPR1A molecular function regulator positively_regulates(GO:0005515),has_input(UniProtKB:Q15797),has_input(PR:000000001),directly_negatively_regulates(GO:0044877),directly_negatively_regulates(GO:0098772)
BMPR1A transmembrane receptor protein serine/threonine kinase activity positively_regulates(GO:0005515),has_input(UniProtKB:Q15797),has_input(PR:000000001),directly_negatively_regulates(GO:0044877),directly_negatively_regulates(GO:0098772)

ukemi commented 4 years ago

Sorry @goodb. The second example is working exactly as it should given the structure of the ontologies. My disagreement is with the structure of the ontology itself, an argument for another day. I suspect it's more a peculiarity of having spent about 5 years of my life working on localization. I don't think transport always results in localization, but as long as the ontology and the properties are the way they currently are, this inference is 'correct' and my narrow view of what localization is can be dismissed.

ukemi commented 4 years ago

Are we happy with these annotations, particularly the first one? There is a bit glossed over here in that if we went to the level of proteoforms, it would say that the phospho-SMAD1 enables a protein binding with a single protein complex input. ping @thomaspd, @deustp01, @vanaukenk

deustp01 commented 4 years ago

On the binding / dissociation sub-thread

Quoting @ukemi: "In this case I think we would get an annotation to any of the Smads enabling protein binding. I agree, the reaction is kind of doing the opposite. It's unbinding for lack of a better word."

Would the classical enzymology convention that all reactions are reversible to some (possibly very small) extent, with pyruvate kinase as a good example, help us to rationalize and explain Paul's proposed fix? I.e., dissociation isn't really a distinct kind of event, but just a binding event run in the right-to-left direction?

deustp01 commented 4 years ago

On the transport sub-thread

Quoting @ukemi: "I don't think transport always results in localization"

@ukemi, could you give an example? At least as it's handled in Reactome, transport has to start with an entity located [here] and end with that entity located [there], so a narrow focus on just the transport activity yields a localization. Of course, as soon as the focus broadens, the ultimate outcome could be different, e.g., entities tightly clustered at a precisely defined [here] end up in a [there] like extracellular space and diffuse away.

deustp01 commented 4 years ago

More binding / dissociation

Quoting @ukemi: "Are we happy with these annotations, particularly the first one? There is a bit glossed over here in that if we went to the level of proteoforms, it would say that the phospho-SMAD1 enables a protein binding with a single protein complex input."

Not sure about the issue here. phospho-SMAD, not plain SMAD, is the enabler but that information loss is due to a temporary limitation of GO-CAM expressivity to be fixed in collaboration with PRO, right, not a deeper logical limitation of GO-CAM? Or am I missing the point?

goodb commented 4 years ago

I think the main issue comes back down to the assignment of the 'protein-containing complex binding' class to the reaction instance 'Phospho-R-Smad1/5/8 dissociates from the receptor complex'.

That can't be understood without the negative regulation relationship coming from the upstream reaction. This results in what I think is an incorrect gpad annotation that anything enabling that dissociation reaction is actually enabling protein-complex binding.

I think Peter's explanation of dissociation as the reverse of binding aligns with what Paul was saying. Regardless, I don't think this works as it stands. Given our current reasoning system and approach to this conversion process, I think the conversion of a reaction into a function (or process) node in a GO-CAM should be logical even when viewed in isolation from the other nodes in the model.

It seems to me we need a class that describes a complex breaking apart. If that isn't 'protein-containing complex disassembly', what would be a better fit?

I am also really not sure how I would infer those molecules as the enablers for the second two proposed binding reactions in that chain.

ukemi commented 4 years ago

@deustp01 You gave an example of my localization issue in your comment 12 days ago. In my view localization means that a population of things is in one place and not in any other place. Things can be transported without the result of being localized and things can be localized without being transported. For example selectively stabilizing something in one location and degrading it everywhere else would result in its localization.

ukemi commented 4 years ago

Quoting @deustp01: "Not sure about the issue here. phospho-SMAD, not plain SMAD, is the enabler but that information loss is due to a temporary limitation of GO-CAM expressivity to be fixed in collaboration with PRO, right, not a deeper logical limitation of GO-CAM? Or am I missing the point?"

But is the phospho-Smad enabling the binding or the dissociation? Again, probably my old-fashioned view that has crept into GO but to me the process of binding is the association of the things coming together. Once they are together they are bound and the process of them coming apart is dissociation.

deustp01 commented 4 years ago

binding sub-thread

@ukemi we could work this into a process / pathway boundaries issue, but your view works as-is. Especially as we also believe in the power of GO-CAM for combinatorial annotation, no fix is needed in order to say what we need and make the distinctions we need.

deustp01 commented 4 years ago

dissociation sub-thread

@ukemi and I discussed this off-line. Here's a summary.

Thinking about the process as biologists, we had trouble understanding why "binds" and synonyms are acceptable ontology terms while "un-binds" and synonyms are not, and we expect that biologists trying to read annotations in this area (as opposed, perhaps, to computational reasoning tools) will find the annotations hard to understand, maybe misleading.

At the same time, we agreed that Paul's binding and negative regulation of binding formulation allows all of the information associated with a dissociation event to be captured.

goodb commented 4 years ago

@ukemi @deustp01 can you try one more summary for me here? I am not sure where we stand with the dissociation reaction model from what you say above.

deustp01 commented 4 years ago

Quoting @thomaspd: "At the NYU meeting, I was proposing that a dissociation should be modeled as a binding function that is negatively regulated by the step that is upstream of it. So, in this example, the activity of BMPR (above) directlyNegativelyRegulates [(protein-containing complex binding enabledBy SMAD1/5/8) hasInput BMP:p-BMPR:Endofin]. This latter activity, in turn, directlyNegativelyRegulates the next (third) step, [(protein binding enabledBy p2S-SMAD1/5/8) hasInput SMAD4]. Note that the dead-end reaction involving I-SMAD would be represented as I-SMAD enabling a binding that directlyNegativelyRegulates that third step as well."

I'm saying that we find this model counterintuitive (and would therefore prefer to avoid it if possible), but as far as we can tell it captures all of the aspects of the activity that we think of as "dissociation".

goodb commented 4 years ago

@deustp01 @thomaspd @ukemi (adding @cmungall ). All I want for Christmas is this issue to be resolved.

The issue became complicated when the suggested typing of dissociation reactions as 'disassembly' processes was rejected. Could someone clarify for me why that decision was made?

After studying it in some depth, I do not think Paul's proposal is workable in the context of the automated conversion process. The rules that would produce this structure aren't clear to me - especially the inference of the enabling molecules. Further, I think assigning the type 'binding' to a reaction whose result is 'unbinding' is a non-starter for reasons described above.

I think we need to fall back to a simpler view here that aligns with all of the other modeling decisions that have been made to date for the conversion. If 'protein-containing complex disassembly' is not right, I would rather just leave the reaction as a raw 'molecular function' until we can come up with a more adequate term from the ontology. Given the GO-CAM structure, we still have all of the raw material there in the model to make other kinds of inferences in downstream software processes (e.g. export or display algorithms could detect a special event based on it having one input and two outputs and do as they wanted with that information.)

?

deustp01 commented 4 years ago

@goodb @thomaspd @ukemi @cmungall As above, I don't see why "binding" is a good fit to the GO molecular function ontology and "un-binding" is not, but that is a biologist view and definitely arrived at without much knowledge of the history of work in this area or of the logical constraints imposed by the ontology structure and reasoning tools. (And as we'll see at the bottom of this post, Reactome certainly brings some major logical constraints of its own to the process.)

Also as above, I was assuming that Reactome un-binding reactions could be parsed cleanly and reliably into "negative regulation" ones, and willing to accept that as a Path2GO version 1 compromise. If clean-and-reliable is not available, that's a problem.

The two top priorities for me right now for Path2GO version 1 are to move the biggest possible range of Reactome pathways and their reaction parts into GO-CAM models and to do it without information loss. Un-binding is a very big deal biologically and not being able to capture it systematically is a problem - as far as I can tell that would regularly break processes into multiple, un-connectable reaction clusters.

If for logical / technical reasons un-binding is toxic for now, then plain "molecular function" sounds like a better temporary label to get these reactions into GO-CAM: uninformative instead of (to many biologists, at least) misleading. And looking for ways to detoxify it in the future seems to me like a high priority.

Going back to the discussions several months ago of BMP signaling, I am beginning to appreciate the differences between a Reactome activity-flow and a GO process-flow view of reaction space, and I can see, exactly as in the BMP case, that explicitly annotating the negative regulatory downstream consequences of binding and unbinding events adds substantial value / information to the GO-CAM model of a process. I also see that the Reactome data model feature that restricts the number of process annotations that can be attached to a reaction and its participating proteins to one gets in the way of this, and reasoning at the GO-CAM stage to add these additional GO terms semi-automatically will be a really useful product of the whole project.

ukemi commented 4 years ago

Is there a good argument against our original proposal to use complex disassembly?

deustp01 commented 4 years ago

This is exactly the biologist versus ontologist/developer issue. I don't see the problem with that proposal.

goodb commented 4 years ago

In order for the conversion to validate against the shex schema, which does not allow MF-regulates-BP relations, we need to adapt the representation of transport and disassembly processes (or any BP nodes).

Currently, we end up seeing e.g. MF1 regulates Transport which is not allowed because Transport is a BP. Here is an example (bio totally made up). On the top is the current structure, below a proposal that would follow the shex schema:

[Screenshot: current structure (top) and proposed ShEx-compliant structure (bottom)]
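The constraint that triggers the rework (no MF-regulates-BP edges) can be expressed as a toy pre-check for spotting offending edges before running full validation. This is not ShEx and not the GO-CAM shape schema: the set of regulation relations below is an assumption chosen for illustration, node kinds are supplied by hand rather than looked up in GO, and `mf_regulates_bp_violations` is a hypothetical name. The nodes reuse the made-up MF1 / Transport example from the comment.

```python
# Assumed set of regulation relations, for illustration only.
REGULATION_RELATIONS = {
    "regulates", "positively_regulates", "negatively_regulates",
    "directly_regulates", "directly_positively_regulates",
    "directly_negatively_regulates",
}


def mf_regulates_bp_violations(node_kind, edges):
    """Return edges where a molecular function (MF) node regulates a
    biological process (BP) node.

    node_kind: dict mapping node ID -> "MF" or "BP"
    edges: iterable of (subject, relation, object) triples
    """
    return [
        (s, rel, o) for s, rel, o in edges
        if rel in REGULATION_RELATIONS
        and node_kind.get(s) == "MF" and node_kind.get(o) == "BP"
    ]


kinds = {"MF1": "MF", "MF2": "MF", "Transport": "BP"}
edges = [("MF1", "regulates", "Transport"),
         ("Transport", "causally_upstream_of", "MF2")]
print(mf_regulates_bp_violations(kinds, edges))
# [('MF1', 'regulates', 'Transport')]
```

Only the MF-to-BP regulation edge is reported; whether other edge patterns pass the real schema is outside what this check models.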

Thoughts @ukemi ? (you can edit the demo model there if you like. http://noctua.geneontology.org/editor/graph/gomodel:59dc728000000510 )