geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Converting binding reactions in Reactome to causal relationships #82

Closed huaiyumi closed 1 year ago

huaiyumi commented 4 years ago

Reactome pathways are reaction (or process) based. When possible, nearly all reactions are represented with binding in Reactome. For example, a catalytic reaction is represented by the binding of the reactant to the enzyme to form a complex, followed by a dissociation of the complex to generate a product and the enzyme. A catalysis edge is sometimes present, but the active unit for the catalytic physical entity may not be specified. Here is an example in the FGF pathway: R-HSA-5654575.1.

GO-CAM is a causal model. It uses causal relationships (influences) to describe relationships among functions. In other words, it represents function at a different level. Instead of representing relationships as bindings, it extracts the outcome of the binding reactions to represent the causal influences. Therefore, during the conversion of Reactome to GO-CAM, it is crucial that those causal relationships are parsed. This may not be straightforward from the Reactome BioPAX file. Some curation, and incorporation of the existing GO annotations, may be necessary. Conversion rules can be derived from this curation effort, and may lead to a more automated process in the future.

Here is an example of a conversion rule. This is part of the BMP pathway conversion.

image

We have a rule for negative regulation by sequestering. The GO-CAM looks like this:

image

In AmiGO, the genes in the ligand trap complex are annotated to BMP binding. Therefore, the model would be better as below.

image

This Google doc illustrates additional examples of conversions: https://docs.google.com/document/d/1H0vcz6QNjSCxHd3tAENiTJoXY7XZnaTHrIwKCXwkHRw/edit?usp=sharing

Some of those examples (such as the ligand-receptor binding and subsequent phosphorylation) appear in many pathways, including the FGFR example mentioned earlier. The following two bear great similarity.

image

image

It is also true that not all bindings can be converted like these examples. The MAPK pathway contains a lot of binding reactions that ultimately regulate MAPK activity. It is impossible for a computer program to track through a series of bindings to extract the eventual causal relations. We probably have no choice but to leave those binding reactions in the converted models; otherwise, human curation is inevitable.

Here is my analogy again. Let's say there is a book. Do we want to literally translate it word by word to another language, or to convert it to a comic book? My understanding is we will use the latter approach.

I know this is not an issue that we can solve right away. I have been working on this for the past few weeks, and I am struggling with it myself. I just want to put these thoughts here for further discussion.

deustp01 commented 4 years ago

> For example, a catalytic reaction is represented by the binding of the reactant to the enzyme to form a complex, followed by a dissociation of the complex to generate a product and the enzyme.

That should not be true. I will look at Huaiyu's examples to see what is going on there. Did we make a mistake in annotation? Or are we asserting that, unlike a classical biochemical substrate, the entity that gets covalently modified must first bind to a complex, and perhaps persist for a while until some other factor activates the catalytic subunit of the complex? Or are we asserting that a ligand protein binds to a receptor complex, the complex reorganizes, and an active site on the receptor part of the complex acts on some other substrate entirely? If the annotated processes correspond to my two invented examples, then I think there is a real temporal and causal distinction between a binding event and a subsequent catalytic event that is enabled by the binding.

I will look at the real examples some more and come back to this.

huaiyumi commented 4 years ago

I don't think it is a mistake. It is just the way it is represented in Reactome. Here is another example. image

Ultimately, BMPR phosphorylates Smad. Smad is a substrate, but also a member of the complex. I am not worried about this in Reactome. However, when it is converted, we do have to dissect out those bindings/complexes to infer the causal relationships in GO-CAM.

ukemi commented 4 years ago

Thanks @huaiyumi! This is exactly what we are looking for. In your first example, I think we could infer your model if we had complete axioms in the ontology. If we knew that protein binding with an input of BMP2 is a BMP binding, that MF would be inferred. The TGF beta receptor binding is a bit more tricky because of the promiscuity of receptors and what ligands they can bind. I think this would require refinement by curation, but maybe could be inferred by the assertions about the functions and the pathways if those were more completely specified in the ontology.
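The inference described above can be sketched in a few lines of Python. This is only an illustration: a real implementation would rely on axioms in the ontology and an OWL reasoner, not a lookup table. The table `SPECIFIC_BINDING_BY_INPUT` and the function `refine_binding_function` are hypothetical names, and term labels are used in place of GO identifiers.

```python
# Hypothetical lookup standing in for ontology axioms of the form
# "protein binding with an input of X is an X binding".
# Labels are used instead of GO identifiers; nothing here is a real API.
SPECIFIC_BINDING_BY_INPUT = {
    "BMP2": "BMP binding",
}


def refine_binding_function(function_label, input_entity):
    """Return a more specific binding term when an axiom covers the input."""
    if function_label != "protein binding":
        # Only the generic term is refined; other functions pass through.
        return function_label
    # Fall back to the generic term when no axiom is available.
    return SPECIFIC_BINDING_BY_INPUT.get(input_entity, function_label)


print(refine_binding_function("protein binding", "BMP2"))
print(refine_binding_function("protein binding", "SMAD1"))
```

An input with no matching axiom keeps the generic term, which corresponds to the cases that would still need refinement by curation.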

Given that making these models 'fully GO-CAM compliant' requires that we resolve the binding issues, I suggest we add these examples to the roadmap I have proposed to create at the NYC meeting.

deustp01 commented 4 years ago

Can we apply the end-point / scope-creep measure here also, to distinguish what we can do now with features that are already fully implemented (and what the associated information loss is) from what we can accomplish in version 2?

ukemi commented 4 years ago

I think yes.

goodb commented 4 years ago

FYI, the sequestration rule has been removed from the conversion as it resulted in too many false positives. See the thread from last summer.

For what it's worth, my take here is that we shouldn't be overly focused on eliminating binding-style reactions from the models. I think it is more important to ensure that we maintain the causal relationships that link everything together. These are where I think many of the interesting inferences will ultimately arise (in terms of the upstream/downstream queries). As an example, if you look at my rendition of David's latest take on non-catalytic regulation, the important thing (as I see it) is that you can make it from the phosphorylation of LRP5/6 event to the beta-catenin release event and show that the former is causally upstream of the latter. If you took the binding reaction out of the middle of that chain, it might make the model look a bit cleaner, but it would lose the active-component information and I don't see how it would make the representation more computationally useful. Note also that, as long as the connectivity is there, it ought to be possible to automatically compress relation chains like these to produce different views on the data (as I believe @cmungall has proposed before).
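As a rough illustration of the upstream/downstream query mentioned above, the following Python sketch treats a model as a directed graph of causal edges and checks reachability with a breadth-first search. The node labels and the `CAUSAL_EDGES` structure are hypothetical and are not taken from an actual converted model.

```python
from collections import deque

# Hypothetical causal edges (upstream -> downstream). The labels are
# illustrative only and do not come from a real GO-CAM.
CAUSAL_EDGES = {
    "LRP5/6 phosphorylation": ["binding reaction"],
    "binding reaction": ["beta-catenin release"],
    "beta-catenin release": [],
}


def is_causally_upstream(edges, source, target):
    """Breadth-first search: True if target is reachable from source."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(is_causally_upstream(CAUSAL_EDGES, "LRP5/6 phosphorylation", "beta-catenin release"))
```

In this toy graph the query succeeds only because the intermediate binding node keeps the chain connected, which is the connectivity point made above.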

I guess what I am missing here is a deep understanding of the specific purpose of the 'cartoon' representation. How will it be used? How is that use inhibited by the presence of these kinds of intermediates?

cmungall commented 4 years ago

To fully answer Ben's questions, we need better roadmaps of how these will be used:

  1. by curators
  2. computationally, in query answering or analyses such as enrichment
  3. visually as presentation to GO users

Just having a common understanding that these encompass separate (but sometimes related) use cases will help.

There are active conversations now about presentation (e.g. alliance term pages), and we already have a go-cam site. It's likely that we will have some kind of compaction even for native go-cams. We have proposed entity-centric views, but because this loses the GO information, I think it raises the question of why we as the GOC would do this. I think there is a nice intermediate that preserves GO information and is in the spirit of the diagrams in the paper, in which we can also do some dynamic filtering, e.g. transitive reduction of non-informative binding nodes. Crucially, this can live in the UI layer rather than be something we have to do during conversion as part of this project. Maybe in the future this will feed into ideas about how to do a more cam-ish conversion, but we can err on the side of information preservation as a first pass.
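One way such view-layer filtering could work is sketched below in Python. This is a simple node-bypass (contraction) over a directed graph, not a full transitive reduction, and it leaves the underlying model untouched. The function `bypass_nodes`, the edge map, and the rule for deciding which nodes are "non-informative" are all assumptions made for illustration.

```python
def bypass_nodes(edges, is_hidden):
    """Return a new edge map with hidden nodes removed and their
    predecessors connected directly to the next visible successors."""

    def visible_successors(node, seen):
        out = []
        for nxt in edges.get(node, []):
            if nxt in seen:
                continue  # avoid duplicates and cycles
            seen.add(nxt)
            if is_hidden(nxt):
                # Skip over the hidden node and keep walking downstream.
                out.extend(visible_successors(nxt, seen))
            else:
                out.append(nxt)
        return out

    return {
        node: visible_successors(node, {node})
        for node in edges
        if not is_hidden(node)
    }


# Hypothetical model: two binding steps sit between two activities.
model = {
    "activity A": ["binding 1"],
    "binding 1": ["binding 2"],
    "binding 2": ["activity B"],
    "activity B": [],
}
view = bypass_nodes(model, lambda n: n.startswith("binding"))
print(view)
```

Because `model` is never modified, the converted model keeps every binding node while the presentation shows only the compacted view.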

huaiyumi commented 4 years ago

@goodb Not sure if your reference to the 'cartoon' representation comes from my 'comic book' analogy. If so, sorry about the confusion. What I was trying to say is that they are two different forms that can't be translated word by word but must instead be converted. I didn't mean to imply that one is a cartoon while the other is literal.

goodb commented 4 years ago

@huaiyumi I wasn't trying to be condescending towards what I referred to as the 'cartoon' model and yes, I was talking about the same concept you referred to as the 'comic book' version. I am fine with comic books :). The only point, which we all understand, is that we are dealing with two different representations, with the details of the newer representation, GO-CAMs, still in flux.

As we work out the finer details of the GO-CAM structures, both for the purposes of the conversion and more generally, I think it is really useful to have good answers to the questions of what we plan to do with them. With those defined (e.g. competency questions or the like) perhaps there are empirical ways we could go about making decisions about representation questions.

ukemi commented 1 year ago

I'm moving this ticket to curator review and it will serve as a starting point for the October face-2-face meeting. The files and documents for the meeting can be found here: https://drive.google.com/drive/folders/1_DLeKpVfpf1j5sxoaK78kGPjFPR7sxXO

Steps so far:

ukemi commented 8 months ago

This is well under way and the results of a pathway by pathway analysis can be found here: https://drive.google.com/drive/folders/1_JWXkF9WK3vkgqHcugbhNFFpxwZ4fvRe

I think this ticket can be closed since the solution here is to add MFs to the binding reactions with appropriate enablers, all on the Reactome side.