AuReMe / metage2metabo

From annotated genomes to metabolic screening in large scale microbiotas
https://metage2metabo.readthedocs.io
GNU Lesser General Public License v3.0

Help with creating seed file #18

Closed by choon-sim 2 years ago

choon-sim commented 2 years ago

Hi, I am importing genome-scale metabolic models (GSMs) built with gapseq into m2m, and I am unsure how to create the seed file. Your tutorial says:

Once a list of metabolites has been designed, these metabolites must be converted to a list of IDs of the metabolic database corresponding to your metabolic networks. For example, in the VMH seeds ethanol is etoh. In the MetaCyc database, the ID of ethanol is ETOH. Then the ID must be checked with the ID in the SBML files of the metabolic networks. In this example ETOH is associated to M_ETOH_c in the SBML file in the species field (<species id="M_ETOH_c" name="ETOH" compartment="c"/>). The M_ corresponds to metabolite (another possibility for this prefix is the R_ for reaction). And _c corresponds to the cytosol compartment.

I basically want the following metabolites in the seed file. These metabolites were used to fill gaps in the models in gapseq, so it's good to keep things consistent.

compounds,name,maxFlux
cpd00363,Ethanol,0.1
cpd00001,H2O,100
cpd01420,beta-Carotene,0.1
cpd00365,Retinol,0.1
cpd00305,Thiamin,0.1
cpd03424,Vitamin B12,0.1
cpd00220,Riboflavin,0.1
cpd00218,Niacin,0.1
cpd00133,Nicotinamide,0.1
cpd00644,PAN,0.1
cpd00419,PM,0.1
cpd00263,Pyridoxol,0.1
cpd00215,Pyridoxal,0.1
cpd00104,BIOT,0.1
cpd00201,10-Formyltetrahydrofolate,0.1
cpd00345,5-Methyltetrahydrofolate,0.1
cpd00087,Tetrahydrofolate,0.1
cpd00059,L-Ascorbate,0.1
cpd00857,Provitamin D3,0.1
cpd01628,Vitamin E,0.1
cpd01401,Vitamin K1,0.1
cpd00063,Ca2+,100
cpd00099,Cl-,5
cpd00205,K+,100
cpd00254,Mg,7
cpd00971,Na+,4
cpd00009,Phosphate,6
cpd00058,Cu2+,0.1
cpd10515,Fe2+,0.1
cpd10516,fe3,0.1
cpd00030,Mn2+,0.1
cpd00034,Zn2+,0.1
cpd00314,D-Mannitol,0.1
cpd00588,Sorbitol,0.1
cpd00306,Xylitol,0.1
cpd00208,LACT,0.1
cpd00179,Maltose,0.1
cpd00076,Sucrose,3
cpd00082,D-Fructose,1
cpd00108,Galactose,0.1
cpd00027,D-Glucose,2
cpd00035,L-Alanine,3
cpd00051,L-Arginine,2
cpd00041,L-Aspartate,4
cpd00084,L-Cysteine,0.1
cpd00023,L-Glutamate,9
cpd00033,Glycine,3
cpd00300,Urate,4
cpd00119,L-Histidine,1
cpd00322,L-Isoleucine,2
cpd00107,L-Leucine,4
cpd00039,L-Lysine,3
cpd00060,L-Methionine,1
cpd00066,L-Phenylalanine,2
cpd00129,L-Proline,5
cpd00054,L-Serine,3
cpd00161,L-Threonine,2
cpd00065,L-Tryptophan,0.1
cpd00069,L-Tyrosine,1
cpd00156,L-Valine,3
cpd01107,Decanoate,0.1
cpd03847,Myristic acid,1
cpd15622,pentadecanoate (C15:0),0.1
cpd00214,Palmitate,4
cpd15609,heptadecanoate (C17:0),0.1
cpd01080,ocdca,1
cpd15269,octadecenoate,4
cpd01122,Linoleate,3
cpd03850,Linolenate,0.1
cpd15016,Stearidonic acid,0.1
cpd03848,Arachidic acid,0.1
cpd00188,Arachidonate,0.1
cpd16342,Adrenic acid,0.1
cpd16301,Docosapentaenoic acid,0.1
cpd03852,Docosahexaenoic acid,0.1
cpd00211,Butyrate,1
cpd03846,octanoate,0.1
cpd00160,Cholesterol,1
cpd27519,Pectin,10
cpd00158,CELB,100
cpd11732,Xylan,0.1
cpd11970,Arabinoxylan,0.1
cpd11955,Glucomannan,0.1
cpd00656,Galactomannan,0.1
cpd11696,beta-Glucan,1
cpd00067,H+,100
cpd00149,Co,0.1
cpd00011,CO2,100
cpd11640,H2,100
cpd00048,SO4,0.1
cpd00029,Acetate,1
cpd00141,Propionate,1
cpd00256,Cholate,1
cpd01663,Chemodeoxycholate,1
cpd02733,Deoxycholate,1
cpd02475,Lithocholate,1
cpd03246,Taurochenodeoxycholate,1
cpd03247,Glycochenodeoxycholate,1
cpd03047,Taurocholate,1
cpd01318,Glycocholate,1

The gapseq website (https://gapseq.readthedocs.io/en/latest/database/biochemistry.html#) says: The gapseq database for chemical compounds and reactions originated from the SEED database.

Any advice will be greatly appreciated.

cfrioux commented 2 years ago

Hi @choon-sim ,

Our explanation is mostly a warning so that users do not end up with seed identifiers that do not match the metabolite identifiers in their metabolic networks.

Assuming the metabolites in your metabolic networks have identifiers such as "cpd00029", you can create the seed file with the command:

m2m seeds -o output --metabolites metabolites.txt

with the content of metabolites.txt looking like this:

cpd00048
cpd00029
cpd00141
cpd00256
cpd01663
cpd02733
cpd02475
cpd03246
cpd03247
cpd03047
cpd01318
...

However, standard SBML files of metabolic networks usually add a prefix ("M_") and a suffix (the compartment) to metabolite identifiers. You can easily check this by looking at the contents of the SBML files. You might see <species id="M_cpd00256_c" name="cpd00256" compartment="c"/> for instance. In that case, your seed identifiers must follow the same pattern, and metabolites.txt would look like this:

M_cpd02475_c
M_cpd03246_c
M_cpd03247_c
M_cpd03047_c
M_cpd01318_c
...
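
If the compound list is available as a gapseq-style table (columns compounds,name,maxFlux), the prefix and suffix can be added with a few lines of Python. This is only a sketch: `seed_ids_from_medium` is a hypothetical helper, not part of m2m, and the "M_" prefix and "c" compartment are assumptions that should be checked against the species IDs in your own SBML files.

```python
import csv
import io


def seed_ids_from_medium(csv_text, prefix="M_", compartment="c"):
    """Turn a medium table into SBML-style metabolite IDs.

    Hypothetical helper: prefix and compartment are assumptions and must
    match the <species id="..."> values found in the SBML files.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [f"{prefix}{row['compounds'].strip()}_{compartment}" for row in reader]


# Two rows in the compounds,name,maxFlux layout used by the medium table.
medium = """compounds,name,maxFlux
cpd00363,Ethanol,0.1
cpd00001,H2O,100
"""

ids = seed_ids_from_medium(medium)
print("\n".join(ids))
```

Writing the printed lines to metabolites.txt gives a file that can be passed to the `m2m seeds` command shown above.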

We advise using the cytosolic compartment if transport reactions for seeds from the extracellular space are missing. Let me know if something remains unclear.

choon-sim commented 2 years ago

Hi @cfrioux, Thanks for the explanation. I opened the sbml file of the metabolic network. Water for example is cpd00001 and I saw 3 different IDs.

<species metaid="M_cpd00001_c0" id="M_cpd00001_c0" name="H2O-c0" compartment="c0" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="0" fbc:chemicalFormula="H2O">

<species metaid="M_cpd00001_e0" id="M_cpd00001_e0" name="H2O-e0" compartment="e0" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="0" fbc:chemicalFormula="H2O">

<reaction metaid="R_EX_cpd00001_e0" id="R_EX_cpd00001_e0" name="H2O-e0 Exchange" reversible="true" fast="false" fbc:lowerFluxBound="R_EX_cpd00001_e0_lower_bound" fbc:upperFluxBound="default_ub">

Do I add these to metabolites.txt?

M_cpd00001_c0
M_cpd00001_e0
R_EX_cpd00001_e0

ArnaudBelcour commented 2 years ago

Hi @choon-sim,

For the metabolites.txt file, in this example you only have to add:

M_cpd00001_c0
M_cpd00001_e0

These two are metabolites (marked by the species tag in your SBML file and also by the prefix M_ in the M_cpd00001_c0 ID), whereas R_EX_cpd00001_e0 is a biochemical reaction (marked by the reaction tag and also by the prefix R_ in the R_EX_cpd00001_e0 ID).
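
To apply the same check to a longer list, the species and reaction IDs can be read directly from the SBML file. The following is a minimal sketch using only the Python standard library. The function `sbml_ids` and the toy SBML string are illustrative assumptions, not part of m2m, and a real file would be read from disk instead.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace from a tag such as '{ns}species'."""
    return tag.rsplit("}", 1)[-1]


def sbml_ids(sbml_text):
    """Return (species IDs, reaction IDs) found in an SBML document."""
    root = ET.fromstring(sbml_text)
    species, reactions = set(), set()
    for el in root.iter():
        name = local(el.tag)
        if name == "species":
            species.add(el.get("id"))
        elif name == "reaction":
            reactions.add(el.get("id"))
    return species, reactions


# Toy document reduced to the three entries discussed in this thread.
SAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="M_cpd00001_c0" name="H2O-c0" compartment="c0"/>
      <species id="M_cpd00001_e0" name="H2O-e0" compartment="e0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R_EX_cpd00001_e0" name="H2O-e0 Exchange" reversible="true"/>
    </listOfReactions>
  </model>
</sbml>"""

species, reactions = sbml_ids(SAMPLE)
candidates = ["M_cpd00001_c0", "M_cpd00001_e0", "R_EX_cpd00001_e0"]

# Only IDs declared as <species> are valid seed candidates.
seeds = [c for c in candidates if c in species]
rejected = [c for c in candidates if c not in species]
print("seeds:", seeds)
print("rejected:", rejected)
```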

Note that identifiers with c0 as a suffix (such as M_cpd00001_c0) very likely refer to metabolites located in the cytosol, and the ones with e0 (such as M_cpd00001_e0) to metabolites in the extracellular space. You can therefore add only the e0 compounds if you are confident that adequate transport reactions are present (the extracellular metabolites will then be carried into the cytosol by these transport reactions), or only the c0 compounds otherwise.
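
One rough way to judge whether transport reactions are present is to look for reactions that involve both the e0 and the c0 form of the same compound. The sketch below is a heuristic under that assumption only: it ignores reaction direction and flux bounds, the helper names (`transported_compounds`, `pick_seed`) are hypothetical, and it is not something m2m does for you.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace from a tag such as '{ns}reaction'."""
    return tag.rsplit("}", 1)[-1]


def transported_compounds(sbml_text, ext="e0", cyt="c0"):
    """Return extracellular species IDs that share a reaction with their
    cytosolic counterpart (assumed here to indicate a transport reaction)."""
    root = ET.fromstring(sbml_text)
    found = set()
    for rxn in root.iter():
        if local(rxn.tag) != "reaction":
            continue
        refs = {
            el.get("species")
            for el in rxn.iter()
            if local(el.tag) == "speciesReference" and el.get("species")
        }
        for sid in refs:
            if sid.endswith("_" + ext):
                base = sid[: -len(ext) - 1]
                if f"{base}_{cyt}" in refs:
                    found.add(sid)
    return found


def pick_seed(compound, transported, ext="e0", cyt="c0"):
    """Use the e0 ID when a transport was found, the c0 ID otherwise."""
    ext_id = f"M_{compound}_{ext}"
    return ext_id if ext_id in transported else f"M_{compound}_{cyt}"


# Toy model: cpd00001 has a transport reaction, cpd00027 only an exchange.
SAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfReactions>
      <reaction id="R_transport_cpd00001" reversible="true">
        <listOfReactants>
          <speciesReference species="M_cpd00001_e0" stoichiometry="1" constant="true"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="M_cpd00001_c0" stoichiometry="1" constant="true"/>
        </listOfProducts>
      </reaction>
      <reaction id="R_EX_cpd00027_e0" reversible="true">
        <listOfReactants>
          <speciesReference species="M_cpd00027_e0" stoichiometry="1" constant="true"/>
        </listOfReactants>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

transported = transported_compounds(SAMPLE)
for cpd in ["cpd00001", "cpd00027"]:
    print(cpd, "->", pick_seed(cpd, transported))
```

A compound flagged as transported in one network may still lack a transporter in another, so the check would need to be repeated for every SBML file in the community.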

choon-sim commented 2 years ago

Hi @cfrioux,

It makes sense. I would add just the c0 compounds then. Thanks!