biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

Modeling of reaction participants #478

Closed cmungall closed 3 years ago

cmungall commented 4 years ago

Create a new association class reaction to participant association

The fields would be:

TBD:

cbizon commented 3 years ago

I wonder if it would be possible to bump this issue up a bit. If we want to focus on errors of metabolism, having some shared way to handle reactions will be very helpful.

nlharris commented 3 years ago

Would what @cmungall described meet your needs, @cbizon?

cbizon commented 3 years ago

Possibly, but I would want to get opinions from @baranzini-lab, who also has data of this type to model. I think it would also be good to work through an example of reactions from the main sources, such as KEGG, Reactome, and Rhea.

baranzini-lab commented 3 years ago

Happy to share what we have done (and are doing). I can show how we have modeled reactions, and the successes and frustrations we had along the way. This may ultimately save some time for you, and your input will help us solve some ongoing challenges.

sierra-moxon commented 3 years ago

From mini-hackathon 3/18/2021 (my notes, Dr. Baranzini will post a PR as well):

Dr. Baranzini presented his model for this:
enzymatic activity (shortcut node that groups genes/proteins) ?--? reaction (MolecularActivity node) -[consumes]-> chemicalCompound (node) -[produces]-> chemicalCompound (node)

Their group is getting close to removing 'enzymatic activity' as a node and replacing it with Gene.
I did not catch which predicate you would use between enzymatic activity and reaction, @baranzini-lab (maybe 'catalyzes')?

Challenges: compounds are not well annotated (i.e., which glucose to choose: a, b, or both; other groups are working on this).

Some reactions may not use enzymes (in practice, many reactions in humans do). MolecularActivity might not be defined broadly enough to be used as "reaction" right now in the model.
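
A minimal sketch of the model in these notes, written as plain subject/predicate/object triples. It assumes that both the consumes and produces edges start at the reaction node, and it uses 'catalyzes' only as a placeholder for the undecided predicate. All identifiers are hypothetical and do not come from any real source.

```python
from collections import defaultdict

# Hypothetical identifiers, for illustration only.
# 'catalyzes' is a placeholder: the notes leave this predicate open.
edges = [
    ("GENE:example", "catalyzes", "REACTION:example"),
    ("REACTION:example", "consumes", "CHEM:substrate"),
    ("REACTION:example", "produces", "CHEM:product"),
]


def participants(edges, reaction):
    """Group the nodes attached to one reaction by the predicate linking them."""
    grouped = defaultdict(list)
    for subj, pred, obj in edges:
        if subj == reaction:
            grouped[pred].append(obj)
        elif obj == reaction:
            grouped[pred].append(subj)
    return dict(grouped)


result = participants(edges, "REACTION:example")
```

Replacing the 'enzymatic activity' node with Gene, as mentioned above, would only change the subject of the first triple in this sketch.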

colleenXu commented 3 years ago

My view: what functionality is added by having a Reaction entity class? Perhaps the way the info is structured depends on how it would be used / how to query for it.

  1. Currently, aren't people doing direct ChemicalSubstance <-> Gene queries? Is an intermediate Reaction node needed? If the reaction is more of a way of describing how these two things are linked, then isn't reaction more like an edge property (rather than an intermediate node)?

  2. Having a Reaction entity class may mean dealing with reactions vs pathways. And that sounds potentially messy?

    • is there any synonymization (so we know this reactome pathway/reaction is the same as this smpdb one)?
    • some KPs may only ingest the pathway-level membership and not the reaction level
    • KPs may not have all of the players related to a reaction (including every possible inducer / inhibitor and what they affect)
    • what does the reaction's location in a pathway mean (downstream/upstream)? Is that even a concrete thing? Since there are feedback loops and complex behavior, it's hard to say that one chemical does only one thing in one step in one pathway.

sierra-moxon commented 3 years ago

more info: Molecular Activity has a synonym "reaction" in the model.

baranzini-lab commented 3 years ago

@colleenXu, metabolic pathways are a special case of pathways, and modeling them can be tricky. Some people use a metabolite-centric graph with the metabolites as nodes and reactions as edges. Others use a reaction-centric graph where two reactions are connected by at least one arc representing a substrate or product metabolite. A third approach (https://www.pnas.org/content/105/36/13223), which we adopted, describes metabolic networks as a bipartite graph with two types of nodes representing metabolites and reactions. We liked this approach because it naturally allows for modeling metabolic flux and for reversibility, which many reactions show. That said, this is a decision each KP can make based on its own expertise and interests.
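
To make the three representations concrete, here is a rough sketch that stores a toy network in the bipartite form and derives the other two views from it. The reaction and metabolite names (R1, R2, A, B, C), the dictionary layout, and the function names are all assumptions made for illustration; this is not SPOKE code.

```python
# Bipartite form: each reaction node lists its substrate and product metabolites.
# Toy data; names are made up.
reactions = {
    "R1": {"substrates": ["A"], "products": ["B"], "reversible": True},
    "R2": {"substrates": ["B"], "products": ["C"], "reversible": False},
}


def metabolite_centric(reactions):
    """Metabolites as nodes, reactions as edge labels: (substrate, reaction, product)."""
    edges = []
    for rid, rxn in reactions.items():
        for s in rxn["substrates"]:
            for p in rxn["products"]:
                edges.append((s, rid, p))
                if rxn["reversible"]:
                    # A reversible reaction also yields the opposite edge.
                    edges.append((p, rid, s))
    return edges


def reaction_centric(reactions):
    """Reactions as nodes: (R_i, shared metabolite, R_j) when a product of R_i
    is a substrate of R_j. Only the forward direction is considered here."""
    edges = []
    for ri, a in reactions.items():
        for rj, b in reactions.items():
            if ri == rj:
                continue
            for m in a["products"]:
                if m in b["substrates"]:
                    edges.append((ri, m, rj))
    return edges


met_edges = metabolite_centric(reactions)
rxn_edges = reaction_centric(reactions)
```

In this sketch both projections are derived from the bipartite form, which is one reason it can serve as the stored representation while the other two are computed when a query needs them.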

karthiksoman commented 3 years ago

Continuing from Sergio's comment on metabolic networks in SPOKE: when it comes to modelling "Reaction" nodes in SPOKE, we use "KEGG" and "MetaCyc" ids to define them. The closest mapping that we found in Biolink is "biolink:MolecularActivity". However, this Biolink entity also maps to "EC" nodes, which are a different node type in SPOKE. I would be happy to discuss this further, and it would be great if @sierra-moxon could advise me on who the point of contact in the Biolink group would be for discussing the SPOKE model and making it Biolink-compatible with regard to the metabolic network.

cbizon commented 3 years ago

A couple of quick notes: in @cmungall's version at the top, how does one relate a gene/protein to the activity?

@colleenXu, having reaction nodes (or using MolecularActivities to do roughly the same thing) doesn't preclude also making direct gene/chemical edges in a graph if those are useful.

cmungall commented 3 years ago

how does one relate a gene/protein to the activity?

The catalyst gene/protein/ncRNA/complex would also be captured with a reaction to participant association, using the predicate enabled by

(of course, proteins can also participate as substrates too)

cmungall commented 3 years ago

@karthiksoman

I would treat EC IDs the same as MetaCyc reactions, KEGG reactions, Rhea reactions, etc. This is how we are doing it in GO.

You can make a linguistic argument that enzymes are proteins hence EC IDs denote biolink:Proteins, but I think this causes lots of problems. I'm interested in the decisions that led you to treat ECs differently.

Perhaps we should discuss this next call?

cmungall commented 3 years ago

@baranzini-lab good points

I think we can support all kinds of graphs with formal translations between each

baranzini-lab commented 3 years ago

This is how we are doing it in GO

@cmungall I would be interested in discussing further your thoughts on how metabolic reactions ARE similar and how they are NOT similar to GO.

In our case, we decided to introduce the concept of "enzymatic activity" (represented by the E.C. number) as a way to integrate several enzymes that can catalyze the same reaction from different species.

karthiksoman commented 3 years ago

@cmungall Currently, in SPOKE we have EC and reaction node types as separate entities, for the reason that Sergio mentioned here. In that situation, if we map these two node types to the same Biolink entity (biolink:MolecularActivity), wouldn't that introduce a many-to-one mapping? For example, if a TRAPI query has a node "biolink:MolecularActivity", which SPOKE node type would we map it to, Reaction or EC?
Sure, I would be happy to discuss this during the modelling call.
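
A small sketch of the ambiguity, assuming the SPOKE node type labels are literally "Reaction" and "EC" and that the mapping is held in a simple lookup table. The names SPOKE_TO_BIOLINK and spoke_types_for are hypothetical.

```python
# Assumed lookup table: two SPOKE node types share one Biolink category.
SPOKE_TO_BIOLINK = {
    "Reaction": "biolink:MolecularActivity",
    "EC": "biolink:MolecularActivity",
}


def spoke_types_for(biolink_category):
    """Return every SPOKE node type that maps to the given Biolink category."""
    return sorted(
        spoke_type
        for spoke_type, category in SPOKE_TO_BIOLINK.items()
        if category == biolink_category
    )


# The reverse lookup is not unique, which is the many-to-one problem.
candidates = spoke_types_for("biolink:MolecularActivity")
```

One possible way to handle this, not something decided in this thread, would be to answer the query against every candidate node type and merge the results.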