Merging reactions in reverse directions

zakandrewking commented 7 years ago

I would like to propose a change to the reaction identifiers for BiGG Models, and hopefully I can get some feedback before we make any major breaking changes.

The idea is to merge reactions that are identical except for the direction of their stoichiometry.

An example is:

aacoa_c + h_c + nadh_c ⇌ 3hbcoa_c + nad_c http://bigg.ucsd.edu/universal/reactions/HACD1

3hbcoa_c + nad_c ⇌ aacoa_c + h_c + nadh_c http://bigg.ucsd.edu/universal/reactions/HACD1i

It's pretty clear that these are the same reaction, just with different bounds and direction. If we merge them together, models with both reactions will still have the same S matrices. We will just link the two reactions in each model to the same universal reaction (HACD1). We already do this for other situations.
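To make "same reaction, opposite direction" concrete, here is a minimal sketch of the check, assuming each reaction is available as a plain equation string like the ones above. The helper names (`parse_side`, `parse_reaction`, `is_reverse`) are hypothetical and are not part of the BiGG codebase.

```python
from fractions import Fraction


def parse_side(side):
    """Parse 'aacoa_c + 2.0 accoa_c' into {metabolite: coefficient}."""
    result = {}
    for term in side.split(" + "):
        parts = term.split()
        if len(parts) == 2:
            # Explicit coefficient, e.g. '2.0 accoa_c'
            coeff, met = Fraction(parts[0]), parts[1]
        else:
            # Implicit coefficient of 1, e.g. '3hbcoa_c'
            coeff, met = Fraction(1), parts[0]
        result[met] = result.get(met, 0) + coeff
    return result


def parse_reaction(equation):
    """Signed stoichiometry: reactants negative, products positive."""
    left, right = equation.split(" ⇌ ")
    stoich = {m: -c for m, c in parse_side(left).items()}
    for m, c in parse_side(right).items():
        stoich[m] = stoich.get(m, 0) + c
    return stoich


def is_reverse(a, b):
    """True if b equals a with every coefficient negated."""
    return a == {m: -c for m, c in b.items()}


hacd1 = parse_reaction("aacoa_c + h_c + nadh_c ⇌ 3hbcoa_c + nad_c")
hacd1i = parse_reaction("3hbcoa_c + nad_c ⇌ aacoa_c + h_c + nadh_c")
print(is_reverse(hacd1, hacd1i))
```

In a model's S matrix, the two reactions appear as columns that differ only in sign, which is why linking both to one universal reaction leaves the matrix itself untouched.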

However, if we merge reverse reactions, then cases like SUCD1 and FRD also get merged.

In iNJ661:

fadh2_c + fum_c ⇌ fad_c + succ_c http://bigg.ucsd.edu/universal/reactions/FRD

fad_c + succ_c ⇌ fadh2_c + fum_c http://bigg.ucsd.edu/universal/reactions/SUCD1

This might cause confusion if people are used to seeing them separated. We could also create a list of exceptions to this case – if so, I'd love feedback on which of these should go in that exceptions list.

For users of iAF1260, these will get merged:

aacoa_c + coa_c ⇌ 2.0 accoa_c http://bigg.ucsd.edu/universal/reactions/KAT1

2.0 accoa_c ⇌ aacoa_c + coa_c http://bigg.ucsd.edu/universal/reactions/ACACT1r

Does anyone have thoughts on this?

A full list of the proposed changes is here: https://pastebin.com/A2jEnM4M

@nel3 @jlerman44 @coltonlloyd @jonm4024 @hhefzi @draeger @phantomas1234 @cdanielmachado @smoretti @pstjohn

cdanielmachado commented 7 years ago

Great idea. I am all in favor of reducing redundancy in the universal reactions. But the point you raise about succinate dehydrogenase and fumarate reductase does seem quite relevant and inspires some precaution.

I guess it would make sense to remove anything that got duplicated just by different models adopting different direction conventions for reversible reactions, but preserve anything that might have biological meaning (for example, the same reaction being catalyzed by different enzymes in each direction because each has a different regulation mechanism).

The question is then how to tell these two cases apart just from the data we already have in BiGG.

One criterion could be: if two equivalent reactions came from the same model, and have different GPRs within that model, then maybe there is a biological relevance for splitting the reaction into two directions, and therefore they should be kept.
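A rough sketch of that criterion, assuming the GPRs are available as strings keyed by model ID. The function names, the model ID, and the gene IDs in the example are all hypothetical placeholders, and comparing only the set of genes (ignoring the and/or structure) is a simplifying assumption.

```python
import re


def genes_in(gpr):
    """Crude comparison key: the set of gene IDs, ignoring and/or structure."""
    tokens = re.findall(r"[A-Za-z0-9_.]+", gpr)
    return frozenset(t for t in tokens if t.lower() not in {"and", "or"})


def keep_separate(gprs_forward, gprs_reverse):
    """Each argument maps model ID -> GPR string for one of the two reactions.

    Returns True if any model contains both reactions with different genes,
    which hints that the two directions are biologically distinct.
    """
    shared_models = gprs_forward.keys() & gprs_reverse.keys()
    return any(
        genes_in(gprs_forward[m]) != genes_in(gprs_reverse[m])
        for m in shared_models
    )


# Placeholder data: both reactions occur in 'model_x' with different genes.
forward = {"model_x": "gene_a and gene_b"}
reverse = {"model_x": "gene_c and gene_d"}
print(keep_separate(forward, reverse))
```

Reaction pairs that never occur together in one model give no evidence either way under this rule, so they would still be merged.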

smoretti commented 7 years ago

I agree with @cdanielmachado, reducing redundancy between models in the universal reactions would be great. I would also merge SUCD1 and FRD ONLY IF different GPRs and/or fluxes can unambiguously be associated to each of them. But it certainly means creating alternative ids - in SBML for example - and that would move duplicates there.

jlerman44 commented 7 years ago

I like the merging in cases when the GPR is the same, but not when they are different.

I believe some "COPY" or duplicate reactions were solved like this in the past?

But I do see the value in having a universal compartmentalized reaction linkage. Some warnings in the cobra toolbox would probably go a long way.

Best, Josh

zakandrewking commented 7 years ago

Thanks everybody for the feedback. I'm going to try out @cdanielmachado's suggestion and see how the database looks.

draeger commented 7 years ago

If it is still relevant, my comment would be that if there are different catalysts involved (e.g., different E.C. numbers), you should not merge. If they are redundant, i.e., everything is identical (enzyme, GPRs, participating compounds) and only the left- and right-hand sides are flipped, please go ahead and merge.

Actually, my understanding is that for reversible reactions you can in principle flip left and right hand sides. So even the cases with different enzymes could at least be aligned in the sense that they have the same reactants and products, even if we continue keeping them separate. A difference only comes up if reactions are irreversibly proceeding in either direction.

matthiaskoenig commented 7 years ago

Hi all, I don't understand the merging based on identical GPRs. Could somebody explain this? Every organism has different proteins and genes, so how could a reaction have the same GPR for all the organisms? Also, this would limit things to the organisms in BiGG, but probably someone wants to use the BiGG reactions for different organisms where different GPRs exist?

I assume merging based on E.C. number (main enzyme) could work. Like @smoretti said:

I would merge SUCD1 and FRD ONLY IF different GPRs and/or fluxes can unambiguously be associated to each of them.

But this would mean having something like an AbstractReaction without any E.C. number or enzyme (similar to RHEA), to which different enzymes can be linked, each with its preferred direction and reversibility. M

cdanielmachado commented 7 years ago

They would of course always have different GPRs in different organisms. That's why my suggestion is defining a list of criteria not to merge reactions. One would be having different GPRs within the same organism, which clearly indicates that different enzymes (and possibly different regulation mechanisms) are involved.

One problem with using EC numbers is that (unlike what the name indicates) they classify reactions, not enzymes. For instance, SUCD1 and FRD have the same EC number (1.3.5.1), and (at least in E. coli) they are not catalyzed by the same enzymes.

matthiaskoenig commented 7 years ago

Thanks for the clarification. So it is mainly about merging reactions within one model. Then identical GPR as a criterion for merging makes sense. M

draeger commented 7 years ago

Universal reactions in BiGG can span individual models.

cdanielmachado commented 7 years ago

I think @zakandrewking's idea is to do this for the universal reactions. Using additional data from the original models to decide how to merge them is one possible approach.

draeger commented 7 years ago

Thanks @cdanielmachado for clarifying the meaning of E.C. numbers. So the main opinion seems to be "merge reactions as long as there is no difference in the catalyst" but it is not entirely clear what makes that difference. GPRs can be different across organisms, E.C. numbers can be identical for different physical enzymes.

I also add: align bidirectional reactions to have identical reactants and products even if we don't merge them.
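One way such an alignment could work is to pick a deterministic orientation for every reversible reaction. The sketch below assumes a signed stoichiometry dict (reactants negative, products positive) and an arbitrary convention: the alphabetically first metabolite is always written as a reactant. Both the function name and the convention are illustrative assumptions, not anything BiGG does.

```python
def canonical_orientation(stoich):
    """Return (aligned stoichiometry, was_flipped).

    Convention (an assumption for this sketch): the alphabetically first
    metabolite with a nonzero coefficient must be a reactant.
    """
    first = min(m for m, c in stoich.items() if c != 0)
    if stoich[first] > 0:
        return {m: -c for m, c in stoich.items()}, True
    return dict(stoich), False


# FRD:   fadh2_c + fum_c <=> fad_c + succ_c
frd = {"fadh2_c": -1, "fum_c": -1, "fad_c": 1, "succ_c": 1}
# SUCD1: fad_c + succ_c <=> fadh2_c + fum_c
sucd1 = {"fad_c": -1, "succ_c": -1, "fadh2_c": 1, "fum_c": 1}

frd_aligned, frd_flipped = canonical_orientation(frd)
sucd1_aligned, sucd1_flipped = canonical_orientation(sucd1)
print(frd_aligned == sucd1_aligned, frd_flipped, sucd1_flipped)
```

Whenever a reaction is flipped this way, its flux bounds have to be flipped as well: bounds (lb, ub) become (-ub, -lb).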

zakandrewking commented 7 years ago

I will try to clarify some of the terminology here. We never fully merge reactions. We always keep the original model-specific reactions, but we group them within a single universal reaction. That's what I meant by "merge" above.

In general, we already group reactions with the same stoichiometry, even if they have different GPRs. For instance GLCtex and GLCtexi from iJO1366 are now both called GLCtex in BiGG:

http://bigg.ucsd.edu/models/iJO1366/reactions/GLCtex

When you download the model, these are called GLCtex_copy1 and GLCtex_copy2 in SBML. That's not ideal, but in a new reconstruction these could easily be merged into a single reaction with a larger gene reaction rule (b0241 or b0929 or b1377 or b2215 or b4036), and BiGG tries to push people in that direction.
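As an illustration of combining the copies into one reaction, here is a small sketch that joins the gene reaction rules of duplicate reactions with "or". The function name is hypothetical, and the way the five genes are split between the two copies in the example is assumed for illustration only.

```python
def merge_gene_rules(rules):
    """Combine alternative gene reaction rules into one rule joined by 'or'."""
    parts = []
    for rule in rules:
        rule = rule.strip()
        if not rule or rule in parts:
            continue  # skip empty rules and exact duplicates
        # Parenthesize rules containing 'and' so operator precedence is explicit
        parts.append(f"({rule})" if " and " in rule else rule)
    return " or ".join(parts)


# Assumed split of the genes between GLCtex_copy1 and GLCtex_copy2
copy1_rule = "b0241 or b0929 or b1377 or b2215"
copy2_rule = "b4036"
print(merge_gene_rules([copy1_rule, copy2_rule]))
```

The "or" is appropriate because each copy represents an alternative way to catalyze the same stoichiometry.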

BiGG defines reactions much like EC numbers – they are just a combination of metabolites with stoichiometric coefficients. However, we make some exceptions:

Pseudoreactions (exchanges, demands, etc) are not merged with ordinary reactions. Thus, ATPM and NTP1 are kept separate even though they have the same stoichiometry.
Until this proposal, reverse reactions were separated, because of the challenges with reaction pairs like FRD/SUCD1.
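These grouping rules can be pictured as a key under which reactions are bucketed. The following is only a sketch, assuming signed stoichiometry dicts and a boolean pseudoreaction flag; `group_key` is a hypothetical name and this is not the actual BiGG implementation. It treats a reaction and its reverse as equal, as proposed in this issue, and does not include any same-model exception.

```python
def group_key(stoich, is_pseudoreaction):
    """Direction-insensitive grouping key that keeps pseudoreactions apart."""
    forward = tuple(sorted(stoich.items()))
    reverse = tuple(sorted((m, -c) for m, c in stoich.items()))
    # Use whichever orientation sorts first, so both directions share a key
    return (is_pseudoreaction, min(forward, reverse))


atp_hydrolysis = {"atp_c": -1, "h2o_c": -1, "adp_c": 1, "h_c": 1, "pi_c": 1}
frd = {"fadh2_c": -1, "fum_c": -1, "fad_c": 1, "succ_c": 1}
sucd1 = {"fad_c": -1, "succ_c": -1, "fadh2_c": 1, "fum_c": 1}

# ATPM (pseudoreaction) and NTP1 (ordinary) share a stoichiometry but not a key
print(group_key(atp_hydrolysis, True) == group_key(atp_hydrolysis, False))
# FRD and SUCD1 fall into the same group under the proposal
print(group_key(frd, False) == group_key(sucd1, False))
```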

We could just combine FRD/SUCD1 into one universal reaction, but for historical reasons, people will be confused by this change – even though it is technically the correct approach in the BiGG Models schema.

zakandrewking commented 7 years ago

Here's an updated list of potential changes when reactions in the same model are not grouped, i.e., FRD/SUCD1 stay separate.

https://pastebin.com/DGvUnM8X

cdanielmachado commented 7 years ago

I picked a few examples at random and everything seems to make sense.

By the way, I don't know how you choose which identifier becomes the universal one, but in cases like this:

"Matched r0021 to GDR based on reverse hash"

please make sure the human-readable ID gets selected 😃

zakandrewking commented 7 years ago

Oh yeah, in all the listed cases, the ID on the left is replaced by the one on the right.

We can also customize any of these cases if people have strong preferences for a particular ID.
