geneontology / go-shapes

Schema for Gene Ontology Causal Activity Models defined using RDF Shapes

biological process part_of cardinality 1 #238

Closed: goodb closed this issue 3 years ago

goodb commented 3 years ago

A recent change to the pathways2go conversion ended up producing some models that have multiple BP part_of BP links. (See pic below). These links reflect the reactome pathway part_of hierarchy. The model is failing to validate because we have a cardinality restriction on the BP shape limiting it to 0 or 1 part_of BP assertions. I think this should be relaxed to 0 or more. Thoughts @ukemi @vanaukenk ?

[Screenshot (2020-09-29): converted model with multiple BP part_of BP links]

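
As a rough illustration of what the {0,1} restriction checks, here is a minimal Python sketch. It is not the go-shapes ShEx itself; it only counts, for each BP node, how many part_of links point to other BP nodes and reports nodes above the allowed maximum. The node IDs, the plain "part_of" predicate string, and the `bp_part_of_violations` helper are hypothetical placeholders; real GO-CAM models use OBO IRIs (part_of is BFO:0000050).

```python
from collections import Counter

# Hypothetical, simplified model: node ID -> node type, plus (subject, predicate, object) edges.
# Real GO-CAM models use OBO IRIs (part_of is BFO:0000050); plain strings stand in for them here.
node_types = {
    "bp:map_kinase_activation": "BP",
    "bp:myd88_cascade": "BP",
    "bp:tlr_cascade": "BP",
    "mf:tak1_activity": "MF",
}
edges = [
    ("bp:map_kinase_activation", "part_of", "bp:myd88_cascade"),
    ("bp:map_kinase_activation", "part_of", "bp:tlr_cascade"),
    ("mf:tak1_activity", "part_of", "bp:map_kinase_activation"),  # MF -> BP: not counted
]


def bp_part_of_violations(node_types, edges, max_card=1):
    """Return {BP node: count} for BP nodes with more than max_card part_of links to other BPs."""
    counts = Counter(
        s
        for s, p, o in edges
        if p == "part_of" and node_types.get(s) == "BP" and node_types.get(o) == "BP"
    )
    return {node: n for node, n in counts.items() if n > max_card}


print(bp_part_of_violations(node_types, edges))  # {'bp:map_kinase_activation': 2}
```

Under the current {0,1} bound the BP node above fails, as the Reactome-derived models did; relaxing the shape to "0 or more", as proposed, amounts to dropping the upper bound entirely.
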
vanaukenk commented 3 years ago

@thomaspd - what are your thoughts? We can see this being useful for linking modules, although we'd probably eventually want individual models for each of the different pathways of which the module is a part. Also, sometimes we have specific evidence for the 'part of' relationship, and this would be the appropriate place to capture that.

thomaspd commented 3 years ago

For GO-CAM, my original idea was to define modules that could interconnect via activity-activity causal edges between models, which I know Ben is working on. For broadly reused modules, though, I don't know if we should generally consider the module to be an actual PART of another process, as opposed to REQUIRED FOR it (e.g. the general transcription module is required for any transcription factor-based regulation in a pathway) or DOWNSTREAM OF it (e.g. in the example above, MAP kinase activation could be considered to be downstream of the toll-receptor signaling). The diagram from Reactome shows this: https://reactome.org/PathwayBrowser/#/R-HSA-975871&PATH=R-HSA-168256,R-HSA-168249,R-HSA-168898,R-HSA-168142

Maybe we should discuss on our next call?

pgaudet commented 3 years ago

I am not sure I understand the model. From my relatively external point of view, it looks like we connect unrelated processes to the same MF, which doesn't look quite right to me.

deustp01 commented 3 years ago

Here's how the Reactome curator annotated the causal link between the two processes. The last reaction in the pathway "IRAK2 mediated activation of TAK1 complex" (R-HSA-937042) is "Auto phosphorylation of TAK1 bound to p-IRAK2:pUb oligo-TRAF6: free K63 pUb:..." (R-HSA-936991). One of the products of this reaction, activated TAK1 complex, catalyzes the first reaction of the pathway "MAP kinase activation" (R-HSA-450337), "Activated TAK1 phosphorylates MKK4/MKK7" (R-HSA-450337). On the basis of this connection the Reactome curator has asserted that reaction R-HSA-936991 is a preceding event of reaction R-HSA-450337, and has shown this relationship graphically in the pathway diagram associated with the "IRAK2 mediated activation of TAK1 complex" (R-HSA-937042) pathway (the one that Paul's comment points to).

thomaspd commented 3 years ago

If I understand Peter correctly, Reactome doesn't consider the "MAP kinase activation" pathway to be part of those other processes (e.g. "IRAK2 mediated activation of TAK1 complex"), but downstream with connected reactions, which is consistent with how I'd wanted to define modules in GO-CAM. If that's right, we don't want to assert part of relations here.

deustp01 commented 3 years ago

Reactome doesn't consider the "MAP kinase activation" pathway to be part of those other processes (e.g. "IRAK2 mediated activation of TAK1 complex")

Right. In a case like this, where multiple upstream event sequences feed into a common downstream process (or, more rarely, where a single upstream process can feed into any of two or more downstream ones), we mostly treat each of those event modules as a separate pathway. Our goal is to minimize the number of cases where a given event is part_of more than one process.

This is one of many ways in which process / pathway boundaries can be adjusted ad hoc to fit the preferences of an annotation strategy, although in this case I think the way we drew the boundaries is compatible with conventional views of pathway and subpathway boundaries in this signaling process.

So, yes, we are trying to avoid exactly these broad part_of relationships here, but we do want causally_upstream_of: in this case, provides_direct_input_for or positively_regulates, depending on how one cares to classify activation of a required catalyst.

goodb commented 3 years ago

Here, we are specifically asking whether any GO-CAM model (generated from Reactome or not) should be allowed to have more than one part_of relationship linking a BP node to other BP nodes.

I think the example I gave may have been more of a Reactome-specific edge case that is distracting us. In our discussion of the issue on Tuesday, both curator leads in attendance @vanaukenk and @sabrinatoro indicated that sometimes it would be useful in their work to create more than one BP-BP assertion (per BP) in a model. Perhaps they could offer some examples from their work to illustrate.


Regarding Reactome: the "MAP kinase activation" pathway isn't declared anywhere to be a part of "IRAK2 mediated activation of TAK1 complex"; I'm not sure where that idea came from. They are sibling parts of, e.g., 'MyD88 cascade initiated on plasma membrane' and probably others. The part_of assertions that appear in that model are simple, direct translations of the Reactome pathway polyhierarchy. FWIW, there are only 34 pathways in our current conversion with multiple direct part_of parents. Also, to be clear, all of the causal reaction-reaction connections that @deustp01 describes are present as activity-activity connections in the converted models. We aren't choosing one kind of assertion or the other; we have both.

thomaspd commented 3 years ago

I understand what you mean about the general question, and I think focusing on specific examples is exactly what we need to do when discussing the rationale for extending the ShEx at this point. In this case the specific need for extending it now was to capture the "Reactome pathway polyhierarchy", but based on Peter's response, it seems that we don't actually want to represent those polyhierarchy relationships in GO-CAM as "part of" relations between pathways/BPs. If that's right, we don't have a specific need to change the spec for the polyhierarchy relationships. But if we have specific examples from other curators, then this would be a great discussion topic for our GO-CAM modeling calls run by @vanaukenk.

deustp01 commented 3 years ago

based on Peter's response, it seems that we don't actually want to represent those polyhierarchy relationships in GO-CAM as "part of" relations between pathways/BPs.

Exactly! BP part_of BP relationships mostly arise because we create superpathways to group similar processes (e.g., BMP signaling part_of signaling).

goodb commented 3 years ago

(Ticket hijack here - please ignore for shex schema discussion).

@deustp01 I'm a little confused. In previous conversations you said specifically that these relationships were by and large part_of relations. https://github.com/geneontology/pathways2GO/issues/104#issuecomment-692878464 . From the BioPAX model perspective, the relations there are pathwayComponentOf - the same relation used to link Pathways to reactions. All of these suggest to me that we are generally looking at part_of relations between pathways and subpathways. e.g. 'activated TAK1 mediates p38 MAPK activation' is a part of 'map kinase activation' and many many others like that.

I understand when you say that some of these relations are used for grouping, e.g. 'Signal Transduction' isn't really composed of its subpathways; it is a collection of them. I think the conflation of grouping for visualization with biological knowledge representation is a weakness on the Reactome side (one that might be addressed using the GO at some point). But I think the pathway-pathway relations are too valuable to eliminate completely, because they make queries like "what are the genes involved in the Notch Signaling pathway" (or any non-leaf pathway) possible. Without those pathway part_of pathway links, the knowledge graph we are building here is a lot less useful.
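
The query point can be sketched in Python: a "genes involved in pathway X" query follows part_of edges in reverse (from X down to its parts, recursively) and collects the genes attached to every event it reaches. The event names, gene sets, and the `genes_involved_in` helper are hypothetical toy data, not Reactome content.

```python
from collections import defaultdict, deque

# Hypothetical toy data: (child, parent) part_of edges and event -> genes.
# Names are illustrative placeholders, not Reactome identifiers.
part_of = [
    ("reaction:notch_ligand_binding", "pathway:notch_receptor_activation"),
    ("reaction:nicd_release", "pathway:notch_receptor_activation"),
    ("pathway:notch_receptor_activation", "pathway:notch_signaling"),
    ("reaction:nicd_csl_binding", "pathway:notch_signaling"),
]
genes_of = {
    "reaction:notch_ligand_binding": {"NOTCH1", "DLL1"},
    "reaction:nicd_release": {"NOTCH1", "PSEN1"},
    "reaction:nicd_csl_binding": {"NOTCH1", "RBPJ"},
}


def genes_involved_in(pathway, part_of, genes_of):
    """Collect genes from every event reachable from `pathway` via inverse part_of edges."""
    children = defaultdict(list)
    for child, parent in part_of:
        children[parent].append(child)
    seen, queue, genes = {pathway}, deque([pathway]), set()
    while queue:  # breadth-first walk down the part_of hierarchy
        node = queue.popleft()
        genes |= genes_of.get(node, set())
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return genes


print(sorted(genes_involved_in("pathway:notch_signaling", part_of, genes_of)))
# ['DLL1', 'NOTCH1', 'PSEN1', 'RBPJ']
```

Dropping the pathway:notch_receptor_activation part_of pathway:notch_signaling edge leaves only NOTCH1 and RBPJ in the result, which is the loss of query power described above.
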

If part_of semantics really breaks something, we could use another relation, such as SKOS skos:broader / skos:narrower.

deustp01 commented 3 years ago

Sorry, I was thinking about the relationship between a pathway and its component reactions, and conveniently forgot the legitimate use of broad pathways ("signaling", "metabolism", ...) to group more specific ones. Reactions are reliably part_of pathways. I guess we likewise say that specific pathways are part_of grouping pathways, but that seems wrong. They are more nearly subtypes, i.e. is_a: BMP signaling is_a signaling pathway, versus "BMP ligand binds BMP receptor" reaction part_of BMP signaling pathway.

Reactome currently does not make this distinction. One of the attributes of the event class (reaction or pathway) is hasEvent, and that attribute lumps together is_a and part_of relationships. It gets particularly messy because a pathway is permitted to have is_a children. A quick made-up example: biochemists sometimes divide glycolysis into a priming phase (the first few reactions of this pathway) and an energy generating phase (the rest). If we followed that usage, glycolysis pathway would have two part_of children, priming and energy generating.

A fix would be to split the attribute into two: one to capture the is_a children (grouping), and one to capture the part_of children (listing the required steps for the individual pathway). We should think some more about whether such a clarification / clean-up would be useful, @ukemi.
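
The proposed split could look roughly like this Python sketch, in which each pathway record keeps its is_a children (grouping) and its part_of children (required steps) in separate lists, so an exporter can emit the two relations distinctly. The `Event` class, its field names, and `to_triples` are hypothetical; they are not Reactome's data model, and the glycolysis phases are the made-up example from the comment above.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical pathway record with the single hasEvent list split into two."""
    name: str
    is_a_children: list = field(default_factory=list)     # grouping: child is a kind of this pathway
    part_of_children: list = field(default_factory=list)  # steps: child is a required part of this pathway


def to_triples(event):
    """Emit (child, relation, parent) triples, keeping grouping and parthood distinct."""
    triples = [(child, "is_a", event.name) for child in event.is_a_children]
    triples += [(child, "part_of", event.name) for child in event.part_of_children]
    return triples


signaling = Event("signaling pathway", is_a_children=["BMP signaling pathway"])
glycolysis = Event("glycolysis", part_of_children=["priming phase", "energy generating phase"])

for triple in to_triples(signaling) + to_triples(glycolysis):
    print(triple)
```

With the attribute split this way, a converter like pathways2GO could map only the part_of children to BP part_of BP links and handle the grouping children separately.
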

sabrinatoro commented 3 years ago

I made a drawing of what I was thinking on Tuesday when we discussed this.

[Drawing: Picture1]

Basically, some genes (like BMP4) would be in all of the “part of” processes; others (like smad5) would not.

However, reading the ticket today, and channeling my inner David H, if I were to make these annotations, I would have to separate these “part_of” relations, because they would involve different instances of BMP4 (the same BMP4 protein cannot act in the heart and in the renal system at the same time). In this case, having multiple “part_of” relations would be the most convenient for a curator but, based on Paul's and probably David’s comments, not correct from a biological point of view.

goodb commented 3 years ago

OK, consensus is to leave the schema alone for now: keep the BP-BP part_of cardinality at {0,1}.

Will adjust the Reactome conversion to match.