Closed by hgscott 8 months ago
My model has good coverage of MetaNetX IDs for my metabolites, so I want to use those IDs to translate the metabolites to ChEBI IDs.
There is a Python package for processing information from MetaNetX: https://github.com/Midnighter/metanetx-sdk
I could not figure out how to use the Python package.
MetaNetX has big tables available for download (https://www.metanetx.org/mnxdoc/mnxref.html) but none of them include the ChEBI ID.
The ModelSEED compounds database doesn't include it either.
The MetaNetX ID Mapper does exactly what I want, but only through a web interface. For example, the result for cpd00015 is:
{"cpd00015":{"reference":"chebi:57692","InChIkey":"IMGVNJNCCGXBHD-UYBVJOGSSA-K","InChI":"InChI=1S/C27H33N9O15P2/c1-10-3-12-13(4-11(10)2)35(24-18(32-12)25(42)34-27(43)33-24)5-14(37)19(39)15(38)6-48-52(44,45)51-53(46,47)49-7-16-20(40)21(41)26(50-16)36-9-31-17-22(28)29-8-30-23(17)36/h3-4,8-9,14-16,19-21,26,37-41H,5-7H2,1-2H3,(H5,28,29,30,34,42,43,44,45,46,47)/p-3/t14-,15+,16+,19-,20+,21+,26+/m0/s1","mnx_id":"MNXM1105937","SMILES":"Cc1cc2nc3c(=O)[n-]c(=O)nc-3n(C[C@H](O)[C@H](O)[C@H](O)COP(=O)([O-])OP(=O)([O-])OC[C@H]3O[C@@H](n4cnc5c(N)ncnc54)[C@H](O)[C@@H]3O)c2cc1C","xrefs":["CHEBI:57692","chebi:57692","deprecated:MNXM1103905","deprecated:MNXM1103906","deprecated:MNXM1103907","metacyc.compound:FAD","metacyc.compound:Ox-FAD-Flavoproteins","metacycM:FAD","metacycM:Ox-FAD-Flavoproteins","seed.compound:cpd00015","seedM:M_cpd00015","seedM:cpd00015"],"name":"FAD"}}
To use this I want to:
This code can generate a text file with all the metabolite IDs in the format that MetaNetX wants.
I can only query 100 IDs at a time.
I was able to split up the metabolites into chunks of 100 when making the files.
I converted all of the lists into JSON files and downloaded them. Ideally, I would merge them into a single JSON file.
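The chunking and merging steps above could be sketched roughly as follows. This is only a sketch: the function names and the `mnx_query_NNN.txt` file names are hypothetical, and writing one ID per line is an assumption about the input format the mapper expects. It also assumes each downloaded JSON file is a dict keyed by the queried ID, like the cpd00015 result shown earlier.

```python
import json
from pathlib import Path


def chunk_ids(ids, size=100):
    """Split a list of IDs into consecutive lists of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def write_chunks(ids, out_dir, size=100):
    """Write one text file per chunk and return the file paths.

    Assumption: the mapper accepts one ID per line.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(chunk_ids(ids, size)):
        path = out_dir / f"mnx_query_{n:03d}.txt"  # hypothetical file name
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths


def merge_results(json_paths):
    """Merge downloaded result files into a single dict keyed by query ID."""
    merged = {}
    for path in sorted(json_paths):
        with open(path) as handle:
            merged.update(json.load(handle))
    return merged
```

If the same ID appears in two result files, the later file silently wins in `merge_results`; that should be harmless when the chunks do not overlap.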
Some of the metabolites seem to be missing from the results dict:
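One way to list the missing metabolites is to compare the IDs that were queried against the keys of the merged results. This is a minimal sketch; `find_missing` is a hypothetical helper, and it assumes the result keys are spelled exactly like the queried IDs (as with cpd00015 above).

```python
def find_missing(queried_ids, results):
    """Return the queried IDs that have no entry in the results dict.

    The order of `queried_ids` is preserved so the output is easy to
    compare against the original query files.
    """
    return [met_id for met_id in queried_ids if met_id not in results]
```

For example, `find_missing(all_ids, merged)` would give the list of IDs to re-query or to look up by hand, where `all_ids` and `merged` are whatever names hold the full ID list and the merged results.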
Extracting the ChEBI IDs will not be a clean one-to-one mapping for all of the metabolites. Some of the metabolites (762 out of 1104) have a single ChEBI ID as the "reference", but others have a MetaNetX ID or something else as the reference. And in the annotations there are almost always multiple ChEBI IDs.
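To make the reference-versus-annotation distinction concrete, here is a rough sketch that pulls both out of one result entry. The helper name `chebi_ids` is hypothetical, and the field names `reference` and `xrefs` are taken from the cpd00015 result shown earlier. Both the `chebi:` and `CHEBI:` spellings appear in that result, so the sketch normalizes them to a single upper-case form.

```python
def chebi_ids(entry):
    """Return (reference_chebi, all_chebi) for one mapper result entry.

    reference_chebi is None when the "reference" is not a ChEBI ID
    (for example a MetaNetX ID). all_chebi is the sorted, de-duplicated
    list of ChEBI IDs found in "xrefs".
    """
    def normalize(xref):
        prefix, _, accession = xref.partition(":")
        if prefix.lower() == "chebi" and accession:
            return f"CHEBI:{accession}"
        return None

    reference = normalize(entry.get("reference") or "")
    xrefs = entry.get("xrefs") or []
    all_ids = sorted({c for c in map(normalize, xrefs) if c})
    return reference, all_ids
```

Metabolites where the first value is `None` are the ones whose reference is not a ChEBI ID and would need a decision about which of the listed ChEBI IDs, if any, to use.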
This could be enough to check whether a ChEBI ID is present in the model at all.
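A presence check like this could be done with a reverse index from ChEBI ID to the model metabolites that mention it. The sketch below is one possible approach, not a tested part of the workflow: `build_chebi_index` is a hypothetical name, and it assumes the merged results are a dict of entries with `reference` and `xrefs` fields as in the cpd00015 example.

```python
from collections import defaultdict


def build_chebi_index(results):
    """Map each ChEBI ID to the set of model metabolite IDs that list it.

    Both the "reference" and every item in "xrefs" are considered, and
    the chebi:/CHEBI: prefixes are normalized to "CHEBI:".
    """
    index = defaultdict(set)
    for met_id, entry in results.items():
        candidates = [entry.get("reference") or ""] + (entry.get("xrefs") or [])
        for xref in candidates:
            prefix, _, accession = xref.partition(":")
            if prefix.lower() == "chebi" and accession:
                index[f"CHEBI:{accession}"].add(met_id)
    return dict(index)
```

With the index built once, `"CHEBI:57692" in index` answers whether that compound is in the model, and `index["CHEBI:57692"]` lists the matching metabolite IDs. Because one metabolite can carry several ChEBI IDs, a hit only shows that the ID is annotated somewhere, not that the mapping is unique.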
I downloaded the metanetx.ttl.gz file (currently on my mac's desktop) which may let us do this all locally.
I sent this big JSON to Jayde, and she was able to use it to see whether the phytoplankton exometabolites were in my model. I will continue work using MetaNetX in #56.
To go from phytoplankton exometabolites to ModelSEED IDs, I need the ChEBI keys. My model currently has no ChEBI annotations.
Carlos and Mica may need to do the same in their models.