C-CoMP-STC / GEM-mit1002

Creative Commons Attribution 4.0 International

Add ChEBI Keys for all metabolites #46

Closed hgscott closed 4 months ago

hgscott commented 8 months ago

To go from phytoplankton exometabolites to ModelSEED IDs, I need the ChEBI keys. My model currently has no ChEBI annotations.

Carlos and Mica may need to do the same in their models.

hgscott commented 8 months ago

My model has good coverage of MetaNetX IDs for its metabolites, so I want to use those IDs to translate to ChEBI.

hgscott commented 8 months ago

There is a Python package for processing information from MetaNetX: https://github.com/Midnighter/metanetx-sdk

hgscott commented 7 months ago

I could not figure out how to use the Python package.

hgscott commented 7 months ago

MetaNetX has big tables available for download (https://www.metanetx.org/mnxdoc/mnxref.html) but none of them include the ChEBI ID.

The ModelSEED compounds database doesn't include it either.

hgscott commented 7 months ago

The MetaNetX ID Mapper does exactly what I want, but only through a web interface. Querying cpd00015 gives:

{"cpd00015":{"reference":"chebi:57692","InChIkey":"IMGVNJNCCGXBHD-UYBVJOGSSA-K","InChI":"InChI=1S/C27H33N9O15P2/c1-10-3-12-13(4-11(10)2)35(24-18(32-12)25(42)34-27(43)33-24)5-14(37)19(39)15(38)6-48-52(44,45)51-53(46,47)49-7-16-20(40)21(41)26(50-16)36-9-31-17-22(28)29-8-30-23(17)36/h3-4,8-9,14-16,19-21,26,37-41H,5-7H2,1-2H3,(H5,28,29,30,34,42,43,44,45,46,47)/p-3/t14-,15+,16+,19-,20+,21+,26+/m0/s1","mnx_id":"MNXM1105937","SMILES":"Cc1cc2nc3c(=O)[n-]c(=O)nc-3n(C[C@H](O)[C@H](O)[C@H](O)COP(=O)([O-])OP(=O)([O-])OC[C@H]3O[C@@H](n4cnc5c(N)ncnc54)[C@H](O)[C@@H]3O)c2cc1C","xrefs":["CHEBI:57692","chebi:57692","deprecated:MNXM1103905","deprecated:MNXM1103906","deprecated:MNXM1103907","metacyc.compound:FAD","metacyc.compound:Ox-FAD-Flavoproteins","metacycM:FAD","metacycM:Ox-FAD-Flavoproteins","seed.compound:cpd00015","seedM:M_cpd00015","seedM:cpd00015"],"name":"FAD"}}

hgscott commented 7 months ago

To use this I want to:

hgscott commented 7 months ago

This code can generate a text file with all the metabolite IDs in the format that MetaNetX wants.
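
The linked code is not reproduced in this thread. A minimal sketch of the idea, assuming the mapper accepts one identifier per line and that model metabolite IDs carry a compartment suffix such as `_c0` that has to be stripped first (both are assumptions; `base_ids` and `write_id_file` are hypothetical helpers):

```python
import re
from pathlib import Path


def base_ids(metabolite_ids):
    """Strip an assumed compartment suffix (e.g. '_c0') and drop duplicates, keeping order."""
    seen = {}
    for met_id in metabolite_ids:
        # Assumption: the suffix is an underscore, one lowercase letter, optional digits.
        base = re.sub(r"_[a-z]\d*$", "", met_id)
        seen.setdefault(base, None)
    return list(seen)


def write_id_file(metabolite_ids, path):
    """Write one base ID per line and return the IDs that were written."""
    ids = base_ids(metabolite_ids)
    Path(path).write_text("\n".join(ids) + "\n")
    return ids
```

With a COBRApy model, the input could be `[m.id for m in model.metabolites]`.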

hgscott commented 7 months ago

I can only query 100 IDs at a time.

hgscott commented 7 months ago

I was able to split up the metabolites into chunks of 100 when making the files here.
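
The chunking step can be sketched as follows. This is an illustration, not the linked code: the file naming scheme and the helper names `chunked` and `write_chunks` are assumptions.

```python
from pathlib import Path


def chunked(items, size=100):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_chunks(ids, out_dir, size=100):
    """Write one ID per line into numbered files and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, chunk in enumerate(chunked(list(ids), size), start=1):
        # Hypothetical naming scheme: metabolites_001.txt, metabolites_002.txt, ...
        path = out_dir / f"metabolites_{number:03d}.txt"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths
```

Each resulting file stays within the 100-ID limit and can be pasted into the mapper separately.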

hgscott commented 7 months ago

I ran all of the lists through the mapper and downloaded the results as JSON files. Ideally, I would merge them into a single JSON.
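
A minimal sketch of the merge, assuming each downloaded file is a JSON object keyed by the queried ID, like the cpd00015 example above (`merge_json_files` is a hypothetical helper):

```python
import json


def merge_json_files(paths):
    """Combine several mapper result files into one dict keyed by queried ID."""
    merged = {}
    for path in paths:
        with open(path) as handle:
            # Later files overwrite earlier ones if the same ID appears twice.
            merged.update(json.load(handle))
    return merged
```

The merged dict can then be saved with `json.dump(merged, handle)` to get one file for the whole model.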

hgscott commented 7 months ago

Some of the metabolites seem to be missing from the results dict.
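
One way to list exactly which IDs were dropped is to compare the submitted IDs against the keys of the merged results. A small sketch, assuming the results dict is keyed by the queried ID (`missing_ids` is a hypothetical helper):

```python
def missing_ids(queried_ids, results):
    """Return the queried IDs that have no entry in the results dict, in query order."""
    return [query for query in queried_ids if query not in results]
```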

hgscott commented 7 months ago

Extracting the ChEBI keys will not be a clear one-to-one mapping for all of the metabolites. Some of the metabolites (762 out of 1104) have a single ChEBI ID as the "reference", but others have a MetaNetX ID or something else as the reference. And in the xrefs there are almost always multiple ChEBI IDs.
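
A hedged sketch of how the ChEBI IDs could be pulled out of one entry, using the `reference` and `xrefs` fields seen in the cpd00015 example. Normalising every hit to the `CHEBI:<number>` form is my assumption about what is wanted, and both helper names are hypothetical:

```python
def chebi_ids(entry):
    """Collect ChEBI IDs from 'reference' and 'xrefs', normalised to 'CHEBI:<number>'."""
    candidates = [entry.get("reference") or ""] + list(entry.get("xrefs") or [])
    found = []
    for value in candidates:
        prefix, _, accession = value.partition(":")
        # The example output mixes 'CHEBI:' and 'chebi:' prefixes, so compare case-insensitively.
        if prefix.lower() == "chebi" and accession.isdigit():
            normalised = f"CHEBI:{accession}"
            if normalised not in found:
                found.append(normalised)
    return found


def has_chebi_reference(entry):
    """True when the entry's 'reference' field is itself a ChEBI ID."""
    return (entry.get("reference") or "").lower().startswith("chebi:")
```

Entries where `has_chebi_reference` is true are the easy one-to-one cases; the rest need a decision about which of the IDs from `chebi_ids` to keep.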

hgscott commented 7 months ago

This could be enough to check whether a ChEBI ID is present in the model at all.
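
For that kind of presence check, a reverse index from ChEBI ID to model metabolite is probably sufficient. A minimal sketch, assuming the merged results dict has the same shape as the cpd00015 example (`chebi_index` is a hypothetical helper):

```python
def chebi_index(results):
    """Map each normalised ChEBI ID to the set of queried IDs whose xrefs mention it."""
    index = {}
    for model_id, entry in results.items():
        for xref in entry.get("xrefs") or []:
            prefix, _, accession = xref.partition(":")
            if prefix.lower() == "chebi" and accession:
                index.setdefault(f"CHEBI:{accession}", set()).add(model_id)
    return index
```

A lookup such as `"CHEBI:57692" in index` then answers whether the compound is in the model, without having to pick a single canonical ChEBI ID per metabolite.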

hgscott commented 7 months ago

I downloaded the metanetx.ttl.gz file (currently on my Mac's desktop), which may let us do this all locally.

hgscott commented 4 months ago

I sent this big JSON to Jayde, and she was able to use it to check whether the phytoplankton exometabolites were in my model. I will continue the work using MetaNetX in #56.