ModelSEED / ModelSEEDDatabase

This repository contains the definitive copy of the biochemistry and metadata used to construct models using the ModelSEED/ProbAnno approach.

Glutathione tautomerization #33

Closed JamesJeffryes closed 5 years ago

JamesJeffryes commented 7 years ago

A collaborator of mine noticed that glutathione (cpd00042) is represented with enols rather than with standard keto peptide bonds. I could update the TSV files and submit a pull request, but I'm realizing that this might cause changes to calculated properties, etc. What's the best way to go about updating these compounds?

samseaver commented 7 years ago

I'm a bit too busy this week to focus on this, but I can tell you that most problems in the database likely arise from my attempts to merge databases. What would help in turn is for you to find the originating databases in the Aliases folder and figure out whether the problem came from KEGG or MetaCyc. If either of these databases had the correct structure, I likely have it to hand and can use that to derive/update the correct compound properties.

I'd prefer, wherever possible, that we associate an original mol file with compound updates; it's the equivalent of a literature reference.

JamesJeffryes commented 7 years ago

There's no acute time pressure on this from me; I just wanted to figure out what the best way for me to contribute would be. Should I commit corrections to the faulty compounds in the Structures folder if I see them? Alternatively, I can just give you an itemized list of any strange tautomers I find, with their sources.

JamesJeffryes commented 7 years ago

So I took a look at this and it seems pretty widespread (CoA, acetyl-CoA, UDP-N-acetylglucosamine, etc.). Maybe there is a standardization rule that's converting all the keto-amines to imines?
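One way to build the itemized list of suspect compounds is to screen each structure for the enol form of an amide, i.e. a carbon with a double bond to nitrogen and a single bond to a hydroxyl oxygen. The following is only a minimal sketch under stated assumptions: the structure is already available as a heavy-atom graph (a list of element symbols plus `(i, j, order)` bonds), hydrogens are implicit, and the molecule is neutral. The function name and input format are hypothetical; a real screen would read the mol files with a cheminformatics toolkit.

```python
from collections import defaultdict


def find_enol_amide_carbons(atoms, bonds):
    """Return indices of carbons showing the C(=N)-OH motif.

    atoms: list of element symbols, heavy atoms only (hypothetical format)
    bonds: list of (i, j, order) tuples using indices into `atoms`
    """
    neighbors = defaultdict(list)
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    hits = []
    for idx, element in enumerate(atoms):
        if element != "C":
            continue
        # Double bond from this carbon to a nitrogen.
        has_double_n = any(
            atoms[n] == "N" and order == 2 for n, order in neighbors[idx]
        )
        # Single-bonded oxygen with no other heavy neighbor. With implicit
        # hydrogens and a neutral molecule (assumption), this is a hydroxyl.
        has_hydroxyl = any(
            atoms[n] == "O" and order == 1 and len(neighbors[n]) == 1
            for n, order in neighbors[idx]
        )
        if has_double_n and has_hydroxyl:
            hits.append(idx)
    return hits


# N-methylacetamide as a toy example: CH3-C(=O)-NH-CH3 (keto form)
atoms = ["C", "C", "O", "N", "C"]
keto_bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1)]
# Same skeleton drawn as CH3-C(OH)=N-CH3 (enol form)
enol_bonds = [(0, 1, 1), (1, 2, 1), (1, 3, 2), (3, 4, 1)]

print(find_enol_amide_carbons(atoms, keto_bonds))  # []
print(find_enol_amide_carbons(atoms, enol_bonds))  # [1]
```

Running a check like this over the original KEGG/MetaCyc structures and over the charged output would show at which step the enol form first appears.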

samseaver commented 7 years ago

So I just thought about this properly: you're talking about tautomers.

There are several things going on here, and I'll have to follow up by looking at the process and files.

1) The original mol files that we downloaded from KEGG or MetaCyc may depict a different tautomer from what is expected

2) The mol files are "charged" using MarvinBeans, and pH can have an effect on the equilibrium of tautomers, right? The output is a single representation, so if it slightly favors one tautomer over the other, it assumes that the concentration is 100% that tautomer.

3) InChI strings are structured so they add all possible hydrogens in the formula itself, and then a mobile hydrogen element, which dictates whether there are extra or fewer hydrogens (this is what's changed by MarvinBeans). But in the case of tautomers, this doesn't just mean changing the number of hydrogens, it means changing the connectivity element, and I'm not sure what the InChI standard determines here.
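To make point 3 concrete, here is a small sketch that splits an InChI string into its layers and pulls out the mobile-hydrogen groups written in parentheses inside the `/h` layer. It uses acetic acid and acetate as the example pair, where deprotonation shows up only as an added `/p-1` layer while the `/c` connectivity layer is unchanged. The helper names are hypothetical, the parser assumes a formula layer is present, and it does not by itself say how the standard treats keto/enol amide forms.

```python
import re

# Mobile-H groups look like "(H,3,4)" or "(H2,1,2,3)" inside the /h layer.
MOBILE_H = re.compile(r"\(H(\d*)(?:-\d*)?,([\d,]+)\)")


def split_inchi_layers(inchi):
    """Split an InChI string into a dict keyed by layer prefix letter."""
    parts = inchi.split("/")
    layers = {
        "version": parts[0].split("=", 1)[1],
        "formula": parts[1],  # assumption: a formula layer is present
    }
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers


def mobile_hydrogen_groups(h_layer):
    """Return (hydrogen_count, [atom numbers]) for each mobile-H group."""
    groups = []
    for count, atom_list in MOBILE_H.findall(h_layer):
        atoms = [int(a) for a in atom_list.split(",")]
        groups.append((int(count or 1), atoms))
    return groups


acetic_acid = split_inchi_layers("InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)")
acetate = split_inchi_layers("InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1")

print(acetic_acid["c"] == acetate["c"])            # True: same connectivity
print(acetate.get("p"))                            # -1
print(mobile_hydrogen_groups(acetic_acid["h"]))    # [(1, [3, 4])]
```

Comparing the `/c` and `/h` layers of the source structure against those of the charged structure is one way to tell a pure protonation change from a change in the tautomer drawn.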

JamesJeffryes commented 7 years ago

Yeah, the structures on MetaCyc and KEGG are in the keto form, and the InChI standard does not change keto-enol forms (you can write valid InChI for both), so my money is on MarvinBeans being the culprit.

samseaver commented 7 years ago

So, as part of this issue, I dug through my old code and found the MarvinBeans script I wrote. Originally, we were using the CLI (https://www.chemaxon.com/marvin-archive/5_1_01/marvin/help/applications/calc.html), but the problem was that a few of the mol files had multiple fragments, and calc only charged the first one. So I wrote a script that uses the Java library to charge each fragment and fuse them, as you'd see in Structures/Scripts.
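The difference between the two behaviors can be sketched in a few lines. This is not the script in Structures/Scripts: it works on SMILES strings (where `.` separates disconnected fragments) instead of mol files, and `toy_deprotonate` is a hypothetical stand-in for the real charging step, included only so the example runs.

```python
def toy_deprotonate(fragment):
    """Hypothetical stand-in for charging: deprotonate a trailing -C(=O)O."""
    if fragment.endswith("C(=O)O"):
        return fragment[:-1] + "[O-]"
    return fragment


def charge_first_fragment_only(smiles, transform):
    """Mimic the behavior described for the CLI: only fragment one changes."""
    head, sep, tail = smiles.partition(".")
    return transform(head) + sep + tail


def charge_every_fragment(smiles, transform):
    """Apply the transform to each fragment, then fuse them back together."""
    return ".".join(transform(frag) for frag in smiles.split("."))


mixture = "CC(=O)O.CCC(=O)O"  # acetic acid and propionic acid in one record
print(charge_first_fragment_only(mixture, toy_deprotonate))
# CC(=O)[O-].CCC(=O)O
print(charge_every_fragment(mixture, toy_deprotonate))
# CC(=O)[O-].CCC(=O)[O-]
```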

The ChemAxon Molecular object allows quite a few functions for standardizing the output, and I never looked into them closely: https://docs.chemaxon.com/display/docs/Standardizer+Actions Obviously, the tautomerize function would've been useful here.

Finally, the ChemAxon license has long expired, so although I'm capable of editing the script and standardizing the output more, I can only run it on a 4-year-old version of MarvinBeans.

Have a look through that list, and let me know what I should be using.

JamesJeffryes commented 7 years ago

I don’t think the package has changed that much, so the 4-year-old version should be fine. The tautomerization will fix the structures but will be slower than a purpose-built “transform” rule changing the substructure. For a set of this size, I’d say just use the tautomerization rule.

samseaver commented 7 years ago

@JamesJeffryes We can close this, right?

JamesJeffryes commented 7 years ago

While this is fixed in theory (it seemed to change when you reran standardization), the form in the compounds file is still wrong, so the issue should stay open.
