Closed: cmungall closed this 3 years ago
I wonder if it would be possible to bump this issue up a bit. If we want to focus on errors of metabolism, having some shared way to handle reactions will be very helpful.
Would what @cmungall described meet your needs, @cbizon?
Possibly, but I would want to get opinions from @baranzini-lab, who also has data of this type to model. I think it would also be good to work through an example of reactions from the main sources, such as KEGG, Reactome and Rhea.
Happy to share what we have done (and are doing). I can show how we have modeled reactions, and the successes and frustrations we had along the way. This may ultimately save some time for you, and your input will help us solve some ongoing challenges.
From mini-hackathon 3/18/2021 (my notes, Dr. Baranzini will post a PR as well):
Dr. Baranzini presented his model for this:
enzymatic activity (shortcut node that groups genes/proteins) ?--? reaction (MolecularActivity node)
reaction (MolecularActivity node) -[consumes]-> chemicalCompound (node)
reaction (MolecularActivity node) -[produces]-> chemicalCompound (node)
Their group is getting close to removing 'enzymatic activity' as a node and replacing it with Gene.
I did not catch which predicate you would use between enzymatic activity and reaction, @baranzini-lab (maybe 'catalyzes')?
Challenges: compounds are not well annotated (i.e., which glucose to choose: a, b, or both; other groups are working on this).
Some reactions may not use enzymes (though in practice, many reactions in humans do). MolecularActivity might not be defined broadly enough to be used as "reaction" right now in the model.
My view: what functionality is added by having a Reaction entity class? Perhaps the way the info is structured depends on how it would be used / how to query for it.
Currently, aren't people doing direct ChemicalSubstance <-> Gene queries? Is a Reaction node intermediate needed? If the reaction is more of a way of describing how these two things are linked, then isn't reaction more like an edge property (rather than an intermediate node)?
Having a Reaction entity class may mean dealing with reactions vs pathways. And that sounds potentially messy?
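To make the edge-property vs. intermediate-node question concrete, here is a minimal sketch of both shapes for a single reaction. Everything in it is an assumption for illustration: the identifiers (`CHEM:...`, `GENE:HK1`, `RXN:1`), the `reaction` property name, and the `catalyzes` predicate are placeholders, not Biolink or KP definitions.

```python
# Option A (hypothetical): direct chemical-gene edges, reaction kept as an edge property.
direct_edges = [
    {"subject": "CHEM:glucose", "object": "GENE:HK1", "reaction": "RXN:1"},
    {"subject": "CHEM:ATP", "object": "GENE:HK1", "reaction": "RXN:1"},
]

# Option B (hypothetical): the reaction is its own node, linked to each participant.
reaction_node_edges = [
    ("GENE:HK1", "catalyzes", "RXN:1"),
    ("RXN:1", "consumes", "CHEM:glucose"),
    ("RXN:1", "consumes", "CHEM:ATP"),
    ("RXN:1", "produces", "CHEM:glucose-6-phosphate"),
    ("RXN:1", "produces", "CHEM:ADP"),
]


def chemicals_for_gene_direct(gene):
    """One hop: chemicals attached directly to the gene."""
    return sorted(e["subject"] for e in direct_edges if e["object"] == gene)


def chemicals_for_gene_via_reaction(gene):
    """Two hops: gene -> reaction -> chemical, keeping the chemical's role."""
    reactions = {o for s, p, o in reaction_node_edges if s == gene and p == "catalyzes"}
    return sorted(
        (p, o)
        for s, p, o in reaction_node_edges
        if s in reactions and p in ("consumes", "produces")
    )


direct_result = chemicals_for_gene_direct("GENE:HK1")
via_reaction_result = chemicals_for_gene_via_reaction("GENE:HK1")
```

In this toy version the direct edges answer the ChemicalSubstance <-> Gene query in one hop, while the reaction node costs an extra hop but keeps each chemical's role (consumed or produced) attached to the reaction. Which trade-off matters more likely depends on the queries people want to run.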
more info: Molecular Activity has a synonym "reaction" in the model.
@colleenXu, metabolic pathways are a special case of pathways, and modeling them can be tricky. Some people use a metabolite-centric graph with the metabolites as nodes and reactions as edges. Others use a reaction-centric graph where two reactions are connected by at least one arc representing a substrate or product metabolite. A third approach (https://www.pnas.org/content/105/36/13223), which we adopted, is to describe metabolic networks as a bipartite graph with two types of nodes representing metabolites and reactions. We liked this approach as it naturally allows for modeling metabolic flux, and it naturally allows for reversibility, which many reactions show. That said, this is a decision each KP can make based on their own expertise and interests.
Continuing from Sergio's comment on metabolic networks in SPOKE: when it comes to modelling "Reaction" nodes in SPOKE, we use "KEGG" and "MetaCyc" ids to define them. The closest Biolink mapping we found for these is "biolink:MolecularActivity". However, we see that this Biolink entity also maps to "EC" nodes, which are a different node type in SPOKE. I would be happy to discuss this further, and it would be great if @sierra-moxon could advise me on who the point of contact in the Biolink group would be for discussing the SPOKE model and making it Biolink compatible with regard to the metabolic network.
A couple of quick notes: in @cmungall's version at the top, how does one relate a gene/protein to the activity?
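A minimal sketch of the bipartite idea, assuming plain Python dicts rather than any particular graph store. The class name, method names, and reaction/metabolite identifiers are made up for illustration and are not how SPOKE is implemented. It also shows that a metabolite-centric view can be derived from the bipartite form, including the reverse direction for reversible reactions.

```python
from dataclasses import dataclass, field


@dataclass
class BipartiteMetabolicGraph:
    # Two node types: reactions (dict keys) and metabolites (set members).
    consumes: dict = field(default_factory=dict)  # reaction id -> substrate ids
    produces: dict = field(default_factory=dict)  # reaction id -> product ids
    reversible: set = field(default_factory=set)  # reaction ids that run both ways

    def add_reaction(self, rxn, substrates, products, reversible=False):
        self.consumes[rxn] = set(substrates)
        self.produces[rxn] = set(products)
        if reversible:
            self.reversible.add(rxn)

    def metabolite_projection(self):
        """Collapse to metabolite-centric edges: (substrate, product, reaction)."""
        edges = set()
        for rxn, substrates in self.consumes.items():
            for s in substrates:
                for p in self.produces[rxn]:
                    edges.add((s, p, rxn))
                    if rxn in self.reversible:
                        edges.add((p, s, rxn))  # reverse direction
        return edges


g = BipartiteMetabolicGraph()
g.add_reaction("R1", ["glucose", "ATP"], ["glucose-6-phosphate", "ADP"])
g.add_reaction("R2", ["glucose-6-phosphate"], ["fructose-6-phosphate"], reversible=True)
edges = g.metabolite_projection()
```

The projection goes one way only: the metabolite-centric edges no longer say which substrates and products belong together in one reaction, which is part of why the bipartite form is attractive as the stored representation.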
@colleenXu if we have reaction nodes (or use MolecularActivities to do roughly the same thing) then it doesn't preclude also making direct gene/chemical edges in a graph if those are useful.
> how does one relate a gene/protein to the activity?
The catalyst gene/protein/ncRNA/complex would also be a reaction to participant association, enabled by
(of course, proteins can also participate as substrates too)
@karthiksoman
I would treat EC IDs as the same as MetaCyc and KEGG reactions, Rhea reactions, etc. This is how we are doing it in GO
You can make a linguistic argument that enzymes are proteins hence EC IDs denote biolink:Proteins, but I think this causes lots of problems. I'm interested in the decisions that led you to treat ECs differently.
Perhaps we should discuss this next call?
@baranzini-lab good points
I think we can support all kinds of graphs with formal translations between each
> This is how we are doing it in GO
@cmungall I would be interested in discussing further your thoughts on how metabolic reactions ARE similar and how they are NOT similar to GO.
In our case, we decided to introduce the concept of "enzymatic activity" (represented by the E.C. number) as a way to integrate several enzymes that can catalyze the same reaction from different species.
@cmungall
Currently, in SPOKE we have EC and reaction node types as separate entities, for the reason that Sergio mentioned here. In that situation, if we map these two node types to the same Biolink entity (biolink:MolecularActivity), wouldn't that introduce a many-to-one mapping? For example, if a TRAPI query has a node "biolink:MolecularActivity", which SPOKE node type would we map it to, Reaction or EC?
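A small sketch of the ambiguity, assuming a simple lookup table from SPOKE node types to Biolink categories. The table and the function name are hypothetical and only illustrate the question; they are not SPOKE's actual mapping code.

```python
# Hypothetical mapping table: two SPOKE node types share one Biolink category.
SPOKE_TO_BIOLINK = {
    "Reaction": "biolink:MolecularActivity",
    "EC": "biolink:MolecularActivity",
}


def spoke_types_for(biolink_category, mapping=SPOKE_TO_BIOLINK):
    """Invert the mapping: every SPOKE node type that carries this category."""
    return sorted(k for k, v in mapping.items() if v == biolink_category)


candidates = spoke_types_for("biolink:MolecularActivity")
# More than one candidate means the incoming category alone cannot pick a node type.
is_ambiguous = len(candidates) > 1
```

One possible way out, again only as an assumption, would be to query both node types whenever the lookup is ambiguous and let the node identifiers in the results distinguish them.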
Sure, I would be happy to discuss this during the modelling call.
Create a new association class
reaction to participant association
The fields would be:
TBD: