Add CheBI Keys for all metabolites - Githubissues

C-CoMP-STC / GEM-mit1002

Creative Commons Attribution 4.0 International


Add CheBI Keys for all metabolites #46

Closed: hgscott closed this issue 8 months ago

hgscott commented 1 year ago

To go from phytoplankton exometabolites to ModelSEED IDs, I need the ChEBI keys. My model currently has no ChEBI annotations.

Carlos and Mica may need to do the same in their models.

hgscott commented 1 year ago

My model has good coverage of MetaNetX IDs for its metabolites, so I want to use those to translate IDs between databases.

hgscott commented 1 year ago

There is a python package for processing information from MetaNetX: https://github.com/Midnighter/metanetx-sdk

hgscott commented 12 months ago

I could not figure out how to use the python package.

hgscott commented 12 months ago

MetaNetX has large tables available for download (https://www.metanetx.org/mnxdoc/mnxref.html), but none of them include the ChEBI ID.

The ModelSEED compounds database doesn't include it either.

hgscott commented 12 months ago

The MetaNetX ID Mapper does exactly what I want, but only through a web interface. Querying cpd00015, for example, gives:

{"cpd00015":{"reference":"chebi:57692","InChIkey":"IMGVNJNCCGXBHD-UYBVJOGSSA-K","InChI":"InChI=1S/C27H33N9O15P2/c1-10-3-12-13(4-11(10)2)35(24-18(32-12)25(42)34-27(43)33-24)5-14(37)19(39)15(38)6-48-52(44,45)51-53(46,47)49-7-16-20(40)21(41)26(50-16)36-9-31-17-22(28)29-8-30-23(17)36/h3-4,8-9,14-16,19-21,26,37-41H,5-7H2,1-2H3,(H5,28,29,30,34,42,43,44,45,46,47)/p-3/t14-,15+,16+,19-,20+,21+,26+/m0/s1","mnx_id":"MNXM1105937","SMILES":"Cc1cc2nc3c(=O)[n-]c(=O)nc-3n(C[C@H](O)[C@H](O)[C@H](O)COP(=O)([O-])OP(=O)([O-])OC[C@H]3O[C@@H](n4cnc5c(N)ncnc54)[C@H](O)[C@@H]3O)c2cc1C","xrefs":["CHEBI:57692","chebi:57692","deprecated:MNXM1103905","deprecated:MNXM1103906","deprecated:MNXM1103907","metacyc.compound:FAD","metacyc.compound:Ox-FAD-Flavoproteins","metacycM:FAD","metacycM:Ox-FAD-Flavoproteins","seed.compound:cpd00015","seedM:M_cpd00015","seedM:cpd00015"],"name":"FAD"}}
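To see what is actually in a record like this, here is a minimal sketch that loads a trimmed copy of the response above and groups the `xrefs` by database prefix. Lower-casing the prefix folds `CHEBI:57692` and `chebi:57692` into one entry. The helper name `xrefs_by_namespace` is hypothetical, and the trimmed JSON drops the InChI, SMILES and some xrefs for brevity.

```python
import json
from collections import defaultdict

# Trimmed copy of the mapper response above (InChI, SMILES and some xrefs removed).
RESPONSE = """
{"cpd00015": {"reference": "chebi:57692", "mnx_id": "MNXM1105937", "name": "FAD",
              "xrefs": ["CHEBI:57692", "chebi:57692", "deprecated:MNXM1103905",
                        "metacyc.compound:FAD", "seed.compound:cpd00015", "seedM:cpd00015"]}}
"""


def xrefs_by_namespace(record):
    """Group a record's xrefs by lower-cased database prefix, dropping duplicates."""
    groups = defaultdict(list)
    for xref in record.get("xrefs", []):
        prefix, _, local = xref.partition(":")
        key = prefix.lower()
        if local not in groups[key]:
            groups[key].append(local)
    return dict(groups)


results = json.loads(RESPONSE)
record = results["cpd00015"]
groups = xrefs_by_namespace(record)
print(record["name"], record["mnx_id"], record["reference"])
print(groups["chebi"])  # → ['57692']
```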

hgscott commented 12 months ago

To use this I want to:

[x] Write a script that pulls out all of the metabolite IDs and makes a string with one ID per line
[x] Manually copy/paste that string into the mapper and download the JSON file
[ ] Write a script that pulls the ChEBI IDs from the JSON and adds them to the model file. (Is this what I want to do? John didn't like that the model files I showed him had so many entries in the annotation field.)
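One way to keep the annotation field small for the last step is to add at most one ChEBI ID per metabolite, taken from the record's `reference` and only when that reference is a ChEBI ID. The sketch below assumes annotations are plain dicts keyed by model metabolite ID (the shape of COBRApy's `Metabolite.annotation`), that model IDs carry a `_<compartment>` suffix, and that `"chebi"` with `CHEBI:NNNN` values is the desired key and format. `add_chebi_annotations` is a hypothetical name.

```python
def add_chebi_annotations(annotations, results):
    """Add at most one ChEBI ID per metabolite, taken from the mapper's "reference".

    `annotations` maps model metabolite IDs (e.g. "cpd00015_c0") to annotation
    dicts; they are modified in place. Returns the IDs that were skipped because
    their reference is missing or not a ChEBI ID.
    """
    skipped = []
    for met_id, annotation in annotations.items():
        base_id = met_id.rsplit("_", 1)[0]  # assumption: "_<compartment>" suffix
        reference = results.get(base_id, {}).get("reference", "")
        prefix, _, local = reference.partition(":")
        if prefix.lower() == "chebi":
            annotation["chebi"] = f"CHEBI:{local}"
        else:
            skipped.append(met_id)
    return skipped


annotations = {"cpd00015_c0": {"seed.compound": "cpd00015"}, "cpd00001_c0": {}}
results = {"cpd00015": {"reference": "chebi:57692"}}
skipped = add_chebi_annotations(annotations, results)
print(annotations["cpd00015_c0"], skipped)
```

With COBRApy, `{m.id: m.annotation for m in model.metabolites}` could serve as the input, assuming `annotation` returns the metabolite's own mutable dict, after which the model could be saved with `cobra.io.write_sbml_model`.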

hgscott commented 12 months ago

This code generates a text file with all of the metabolite IDs in the format that MetaNetX expects.
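A minimal sketch of that kind of script, assuming the model's metabolite IDs are ModelSEED IDs with a compartment suffix (e.g. `cpd00015_c0`) and that the mapper wants the bare compound ID, as in the `cpd00015` query above. The same compound in several compartments is listed once. `mapper_input` is a hypothetical name.

```python
import re

# Assumption: IDs look like "cpd00015_c0" / "cpd00015_e0"; strip the trailing
# "_<letters><digits>" compartment tag to get the bare compound ID.
SUFFIX = re.compile(r"_[a-z]+\d*$")


def mapper_input(metabolite_ids):
    """Return one bare compound ID per line, without duplicates, in first-seen order."""
    seen = {}
    for met_id in metabolite_ids:
        seen.setdefault(SUFFIX.sub("", met_id), None)
    return "\n".join(seen)


# With COBRApy the input could be [m.id for m in model.metabolites].
print(mapper_input(["cpd00015_c0", "cpd00015_e0", "cpd00001_c0"]))
```

The result can be written out with `pathlib.Path("metabolites.txt").write_text(...)` (file name hypothetical) and pasted into the mapper.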

hgscott commented 12 months ago

I can only query 100 IDs at a time.

hgscott commented 12 months ago

I was able to split the metabolites into chunks of 100 when making the files here.
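The batching step could look roughly like this: a sketch that slices the ID list into consecutive groups of at most 100 and writes one text file per group. `chunks`, `write_batches` and the `mapper_input_NN.txt` naming scheme are hypothetical.

```python
from pathlib import Path

BATCH_SIZE = 100  # the mapper accepts at most 100 IDs per query


def chunks(ids, size=BATCH_SIZE):
    """Split a list of IDs into consecutive batches of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def write_batches(ids, out_dir, size=BATCH_SIZE):
    """Write each batch to its own text file (one ID per line); return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, batch in enumerate(chunks(ids, size), start=1):
        path = out_dir / f"mapper_input_{n:02d}.txt"  # hypothetical naming scheme
        path.write_text("\n".join(batch) + "\n")
        paths.append(path)
    return paths
```

For example, 1104 IDs would give 12 files: 11 with 100 IDs and a last one with 4.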

hgscott commented 12 months ago

I converted all of the lists to JSON and downloaded them. Ideally, I would merge them into a single JSON file.
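Since each download is a dict keyed by query ID (as in the `cpd00015` example), merging is a matter of updating one dict with each batch. A minimal sketch, with hypothetical function names; if an ID somehow appears in two batches, the later file wins.

```python
import json
from pathlib import Path


def load_batches(paths):
    """Yield the parsed contents of each downloaded mapper JSON file."""
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            yield json.load(fh)


def merge_results(batches):
    """Merge per-batch dicts (query ID -> record) into one dict; later batches win."""
    merged = {}
    for batch in batches:
        merged.update(batch)
    return merged


def write_merged(paths, out_path):
    """Merge the downloaded files in `paths` and save them as one JSON file."""
    merged = merge_results(load_batches(paths))
    Path(out_path).write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return merged
```

Usage might be `write_merged(sorted(Path("downloads").glob("*.json")), "metanetx_all.json")`, where both paths are placeholders.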

hgscott commented 12 months ago

Some of the metabolites seem to be missing from the results dict:
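A quick way to list which queried IDs came back without a record is to compare the submitted IDs against the keys of the merged results dict. A sketch, assuming the results are keyed by the exact string that was queried; `missing_queries` and the example data are hypothetical.

```python
def missing_queries(queried_ids, results):
    """Return the queried IDs with no entry in the results dict, in query order."""
    return [q for q in queried_ids if q not in results]


# Hypothetical example: the second ID was queried but came back with no record.
queried = ["cpd00015", "cpd00001"]
results = {"cpd00015": {"name": "FAD"}}
print(missing_queries(queried, results))  # → ['cpd00001']
```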

hgscott commented 12 months ago

Extracting the ChEBI keys will not be a clean one-to-one mapping for all of the metabolites. Some of the metabolites (762 out of 1104) have a single ChEBI ID as the "reference", but others have a MetaNetX ID or something else as the reference. And in the annotations there are almost always multiple ChEBI IDs.
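A count like the 762 out of 1104 above could be produced by splitting the merged results on the prefix of each record's `reference`: a ChEBI prefix gives a one-to-one entry, and anything else goes on a list for manual review. This is a sketch; the function names are hypothetical and the second example record is made up.

```python
from collections import Counter


def reference_namespace(record):
    """Return the lower-cased prefix of a record's "reference", e.g. "chebi"."""
    return record.get("reference", "").partition(":")[0].lower()


def split_by_reference(results):
    """Separate queries whose reference is a single ChEBI ID from all the others."""
    one_to_one, other = {}, []
    for query, record in results.items():
        if reference_namespace(record) == "chebi":
            local = record["reference"].partition(":")[2]
            one_to_one[query] = f"CHEBI:{local}"
        else:
            other.append(query)
    return one_to_one, other


# The first record is from the mapper output above; the second is a made-up
# record whose reference is not a ChEBI ID.
results = {
    "cpd00015": {"reference": "chebi:57692"},
    "cpd_example": {"reference": "metacyc.compound:EXAMPLE"},
}
one_to_one, other = split_by_reference(results)
print(Counter(reference_namespace(r) for r in results.values()))
print(one_to_one, other)
```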

hgscott commented 12 months ago

This could still be enough to check whether a given ChEBI ID is present in the model at all.
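For a presence check, the one-to-one problem matters less: every ChEBI ID found in any record's `xrefs` can be indexed back to the query IDs that carry it, and an exometabolite's ChEBI ID can then be looked up directly. A sketch under that assumption; `chebi_index` is a hypothetical name.

```python
from collections import defaultdict


def chebi_index(results):
    """Map every ChEBI ID seen in any record's xrefs to the query IDs that carry it."""
    index = defaultdict(set)
    for query, record in results.items():
        for xref in record.get("xrefs", []):
            prefix, _, local = xref.partition(":")
            if prefix.lower() == "chebi":
                index[f"CHEBI:{local}"].add(query)
    return dict(index)


results = {"cpd00015": {"xrefs": ["CHEBI:57692", "chebi:57692", "seed.compound:cpd00015"]}}
index = chebi_index(results)
print("CHEBI:57692" in index, index.get("CHEBI:57692"))
```

A ChEBI ID that maps to several query IDs is a sign of the ambiguity described above, but it still answers the yes/no question of whether the compound is in the model.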

hgscott commented 11 months ago

I downloaded the metanetx.ttl.gz file (currently on my Mac's desktop), which may let us do all of this locally.

hgscott commented 8 months ago

I sent this big JSON to Jayde, and she was able to use it to check whether the phytoplankton exometabolites were in my model. I will continue the MetaNetX work in #56.