famosab closed this issue 3 months ago.
I added Casamino Acids to the media definitions based on a 1971 article, 'Amino Acids and Growth Factors in Vitamin-Free Casamino Acids' [5].
@GwennyGit, maybe you can add the gut medium as soon as you get to it. However, we might rethink how we store the media definitions. At the moment, they live in one big CSV file, which works, but wouldn't entering them into the database we use for sboann be more elegant?
The idea of using the database sounds good. However, if someone wants to use the media definitions in another program, for example with the gap-filling function of CarveMe, it is easier to transform a CSV file into the required format. Additionally, a CSV file might make it easier for users to add their own media. @famosab What do you think? 🤔
I think we should move the existing media definitions into the database as well. You mentioned somewhere that access via pandas should be possible. If that works for a user who just installs refineGEMs via pip, it would be great! Maybe we can implement a function that exports the database entries to a CSV medium definition. The functionality would then stay the same for users, since they could still use a local CSV file.
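A minimal sketch of such an export function, using only the standard library `sqlite3` and `csv` modules so that it would also work after a plain pip install. The schema (`medium`, `substance` and a linking table `medium2substance`) is a hypothetical stand-in, not the actual refineGEMs database layout, and `export_medium_to_csv` is an illustrative name rather than an existing refineGEMs function.

```python
import csv
import io
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical media schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE medium (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE substance (id INTEGER PRIMARY KEY, name TEXT, bigg_id TEXT);
        CREATE TABLE medium2substance (medium_id INTEGER, substance_id INTEGER);
        INSERT INTO medium VALUES (1, 'M9');
        INSERT INTO substance VALUES (1, 'D-Glucose', 'glc__D'), (2, 'Ammonium', 'nh4');
        INSERT INTO medium2substance VALUES (1, 1), (1, 2);
        """
    )
    return con


def export_medium_to_csv(con: sqlite3.Connection, medium: str, out) -> int:
    """Write one medium as CSV rows (medium, name, BiGG ID); return the row count."""
    rows = con.execute(
        """
        SELECT m.name, s.name, s.bigg_id
        FROM medium m
        JOIN medium2substance ms ON ms.medium_id = m.id
        JOIN substance s ON s.id = ms.substance_id
        WHERE m.name = ?
        ORDER BY s.bigg_id
        """,
        (medium,),
    ).fetchall()
    writer = csv.writer(out)
    writer.writerow(["medium", "name", "BiGG"])  # header row
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    print(export_medium_to_csv(build_demo_db(), "M9", buf))
    print(buf.getvalue())
```

If pandas is available, `pandas.read_sql_query(query, con)` followed by `DataFrame.to_csv` would be an equivalent route; the standard-library version simply avoids the extra dependency.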
Yes, I mentioned that in issue #49, and I am currently working on that task.
At the moment, we have both CGXII and CGXlab in the database. I would advise removing CGXII and replacing it with the composition of CGXlab, since that is the composition used in the manuscript we will publish soon, whereas CGXII is just a file I got a while ago that has not been verified in laboratory use. We could also remove the LB and M9 variants without oxygen, or write a small function that allows anaerobic simulation on any of the media.
Removing CGXII and keeping only CGXlab is a good idea. However, I think it would also be good to describe all media in the documentation, so that users know what each medium is, when to use which, and so on.
Yeah, a small function that allows anaerobic simulation on any medium sounds like a great idea. Then the simulation would no longer be restricted to M9 and LB.
So far, I have only created the basic set-up for the media definition pages, so the pages still need to be filled with content.
Re-evaluation of SNM3
The composition of SNM3 within refineGEMs was compared to another composition used within the draeger-lab group, as well as to the original wet-lab composition described in 'Nutrient Limitation Governs Staphylococcus aureus Metabolism and Niche Adaptation in the Human Nose' [1]. Additionally, all compound assignments to the BiGG database were checked, and the names were added following the pattern: BiGG ID [Name in wet-lab description].
The comparison revealed the following differences:
- `adocbl` is the BiGG ID for Adenosylcobalamine. As its chemical formula is not a good match for Cyanocobalamine, this compound was removed from the definition.
  -> Replaced `adocbl` with `cbl1` (Cob(I)alamine), `cbl2` (Cob(II)alamine) and `b12` (Vitamin B12). The first two BiGG IDs were chosen because their chemical formulas are highly similar to that of Cyanocobalamine and because they are already included in the SNM3 definition from the draeger-lab group. `b12` was added because Cyanocobalamine is Vitamin B12.
- `fe2` and `fe3` are both contained in the draeger-lab SNM3 definition. However, the refineGEMs definition only contained `fe2`. Thus, `fe3` was added.

In conclusion, after discussing with @famosab, we decided to add all analogues of each compound to all media. Hence, all similar compounds for Cyanocobalamine and iron (Fe) were added.
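The "add all analogues" rule can be sketched as a lookup from each compound to its group of analogous BiGG IDs: whenever one member of a group is in a medium, the whole group is added. The grouping below only contains the two cases from the SNM3 comparison (Cyanocobalamine and iron); `ANALOGUES` and `expand_with_analogues` are hypothetical names, not part of refineGEMs.

```python
# Hypothetical analogue groups, limited to the two cases from the SNM3 comparison.
ANALOGUES: dict[str, set[str]] = {
    "cyanocobalamine": {"cbl1", "cbl2", "b12"},
    "iron": {"fe2", "fe3"},
}


def expand_with_analogues(medium: set[str]) -> set[str]:
    """Return the medium plus every analogue of any BiGG ID it already contains."""
    expanded = set(medium)
    for group in ANALOGUES.values():
        if group & medium:  # at least one member of this group is present
            expanded |= group
    return expanded


if __name__ == "__main__":
    print(sorted(expand_with_analogues({"fe2", "b12", "glc__D"})))
```

Keeping the groups in one table means a new analogue only has to be registered once to propagate to every medium.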
[1] Krismer, Bernhard; Liebeke, Manuel; Janek, Daniela; Nega, Mulugeta; Rautenberg, Maren; Hornig, Gabriele et al. (2014): Nutrient Limitation Governs Staphylococcus aureus Metabolism and Niche Adaptation in the Human Nose. In: PLOS Pathogens 10 (1), e1003862. DOI: 10.1371/journal.ppat.1003862.
Re-evaluation of RPMI
The composition of the in silico RPMI medium was compared to the provider reference. This comparison yielded the following points:
- `tyr__L` is chosen for L-Tyrosine disodium salt dihydrate.
- `inost` is Myo-Inositol, but RPMI contains I-Inositol. These two are different, but since there is no BiGG ID for I-Inositol yet, `inost` was kept.
- `hco3` was added since Sodium Bicarbonate is contained.
- `4hpro_LT` was added since L-Hydroxyproline is contained.
- `b12` was added for Vitamin B12, `cbl1` was kept, and `clb1` was removed since it is not a valid BiGG ID.
- `nac` was removed since the medium only contains Niacinamide, which is covered by `ncam`.
- `h` was added since L-Cysteine HCl is contained.

Re-evaluation of M9
The M9 composition is based on the provider reference for the minimal salts:
- `k`, `h`, `pi` (KH2PO4)
- `na1`, `cl` (NaCl)
- `na1`, `h`, `pi` (Na2HPO4)
- `nh4`, `cl` (NH4Cl)

Plus necessary additives as described here and here:
- `mg2`, `so4` (MgSO4)
- `ca2`, `cl` (CaCl2)
- `glc__D` (D-Glucose)

And `o2` and `h2o` are present per standard.
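Listing each salt by the ions it releases amounts to taking the union of the BiGG IDs of all salts and additives, plus the standard `o2` and `h2o`. A minimal sketch; the salt names are the standard M9 components that correspond to the ion groups listed above, and `M9_COMPONENTS` and `build_medium` are illustrative names, not refineGEMs code.

```python
from itertools import chain

# Each laboratory component mapped to the BiGG IDs of the ions it releases.
M9_COMPONENTS: dict[str, list[str]] = {
    "KH2PO4": ["k", "h", "pi"],
    "NaCl": ["na1", "cl"],
    "Na2HPO4": ["na1", "h", "pi"],
    "NH4Cl": ["nh4", "cl"],
    "MgSO4": ["mg2", "so4"],
    "CaCl2": ["ca2", "cl"],
    "D-Glucose": ["glc__D"],
}

# Present per standard in the in silico definition.
STANDARD: list[str] = ["o2", "h2o"]


def build_medium(components: dict[str, list[str]], standard: list[str]) -> list[str]:
    """Collect the unique BiGG IDs of all components, keeping first-seen order."""
    # dict.fromkeys drops duplicates (e.g. cl from three salts) but keeps order.
    return list(dict.fromkeys(chain(*components.values(), standard)))


if __name__ == "__main__":
    print(build_medium(M9_COMPONENTS, STANDARD))
```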
Addition of the defined Gut Microbiota Medium (dGMM) to the database
To get all relevant BiGG IDs for the salts, the following table was used:

| Ion | Abundance | BiGG ID | BiGG name |
|---|---|---|---|
| Fe(II) | 1 | fe2 | Fe2+ |
| SO4 | 6 | so4 | Sulfate |
| Zn | 1 | zn2 | Zinc |
| Co | 1 | cobalt2 | Co2+ |
| NO3 | 1 | no3 | Nitrate |
| Al | 1 | - | - |
| K | 2 | k | Potassium |
| Na | 5 | na1 | Sodium |
| SeO3 | 1 | slnt | Selenite |
| WO4 | 1 | tungs | Tungstate |
| Ni | 1 | ni2 | Nickel |
| Cl | 3 | cl | Chloride |
| Ca | 1 | ca2 | Calcium |
| Cu | 1 | cu2 | Copper |
| Mn | 1 | mn2 | Manganese |
| Mg | 1 | mg2 | Magnesium |
| HCO3 | 1 | hco3 | Bicarbonate |
| MoO4 | 1 | mobd | Molybdate |
| H | 1 | h | Hydrogen |
| PO4 | 1 | pi | Phosphate |
Additionally, water was added to the definition, as most salts were dissolved in water in the laboratory version of GMM. No BiGG IDs were found for Resazurin, boric acid (H3BO3), aluminium (Al), dihydrogen phosphate (H2PO4) and EDTA. However, dihydrogen phosphate can be split into hydrogen (H) and phosphate (PO4), for which BiGG IDs exist. This was changed in commit (Needs to be committed❗).
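The ion table for dGMM can be derived mechanically: split each salt into ions, split ions without a BiGG ID where possible (H2PO4 into H and PO4), count how often each ion occurs, and report ions that still have no identifier (such as Al). A minimal sketch; the three salts in the test are hypothetical inputs for illustration, not the full dGMM recipe, and `map_salts`, `ION_TO_BIGG` and `SPLIT` are illustrative names.

```python
from collections import Counter
from typing import Optional

# Ion -> BiGG ID, excerpt of the table above (None = no BiGG ID found).
ION_TO_BIGG: dict[str, Optional[str]] = {
    "K": "k", "Na": "na1", "Cl": "cl", "SO4": "so4", "Fe(II)": "fe2",
    "H": "h", "PO4": "pi", "Al": None,
}

# Ions without a BiGG ID that can be split into ions that have one.
SPLIT: dict[str, list[str]] = {"H2PO4": ["H", "PO4"]}


def map_salts(salts: dict[str, list[str]]):
    """Count ion occurrences over all salts and map them to BiGG IDs.

    Returns (counts, mapped, unmapped); ions missing from ION_TO_BIGG
    are treated as unmapped.
    """
    counts: Counter = Counter()
    for ions in salts.values():
        for ion in ions:
            counts.update(SPLIT.get(ion, [ion]))
    mapped = {ion: ION_TO_BIGG[ion] for ion in counts if ION_TO_BIGG.get(ion)}
    unmapped = sorted(ion for ion in counts if not ION_TO_BIGG.get(ion))
    return counts, mapped, unmapped
```

The `unmapped` list is exactly what ends up in the "no BiGG ID found" sentence, so it can be checked against the prose after each update.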
I added a page to the documentation that describes how to get from a laboratory medium to an in silico one. @famosab Feel free to make adjustments.
Re-evaluation of Blood
The definition for Blood is missing relevant components like water or iron. @NantiaL gave me another medium definition, which was also used in the paper 'New workflow predicts drug targets against SARS-CoV-2 via metabolic changes in infected cells' [2], and I compared it to our current one. Nantia's definition was obtained in collaboration with the authors of the paper 'Longitudinal Multi-omics Analyses Identify Responses of Megakaryocytes, Erythroid Cells, and Plasmablasts as Hallmarks of Severe COVID-19' [1]. Comparing the Blood medium currently in the database to Nantia's definition revealed that Allantoin, Glucose, D-Malate and the BiGG ID `nicnt` for Nicotinate from the current version are absent in Nantia's definition. As the rest overlaps and Nantia's version contains many more components, the current Blood medium will be extended with Nantia's version, and Allantoin as well as Glucose will be removed.
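The comparison and the planned merge reduce to set operations on BiGG IDs: set differences show what each definition lacks, and the merge is a union minus the compounds to drop. A minimal sketch with small hypothetical excerpts of both definitions (not the real compositions); `alltn`, `glc__D` and `mal__D` are the BiGG IDs assumed here for Allantoin, D-Glucose and D-Malate, and the function names are illustrative.

```python
def compare_definitions(current: set[str], reference: set[str]) -> tuple[set[str], set[str]]:
    """Return (IDs only in current, IDs only in reference)."""
    return current - reference, reference - current


def merge_definitions(current: set[str], reference: set[str], remove: set[str]) -> set[str]:
    """Extend the current definition with the reference and drop unwanted IDs."""
    return (current | reference) - remove


if __name__ == "__main__":
    # Hypothetical excerpts, not the full Blood definitions.
    current = {"alltn", "glc__D", "mal__D", "nicnt", "ala__L"}
    nantia = {"ala__L", "h2o", "fe2", "fe3"}
    only_current, only_nantia = compare_definitions(current, nantia)
    print(sorted(only_current))
    print(sorted(merge_definitions(current, nantia, {"alltn", "glc__D"})))
```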
The following three BiGG IDs were deemed invalid:
- `gbside_hs`: This is the identifier for Globoside, but in the definition it was assigned to Tetrahexosylceramide. According to PubChem, Globoside is not a synonym for this compound. However, Nantia explained that both substances are treated as equal in the literature (examples: [3], [4], [5]).
- `phyQ`: This identifier could not be found in the BiGG database. However, searching for Phylloquinone, a synonym for Vitamin K1, revealed that `phyQ` is an old identifier for this compound. Thus, the new BiGG ID `phllqne` was added as well.
- `q10`: This identifier was mapped to the compound Ubiquinone-1. However, according to Nantia, Ubiquinone-10 is correct, not Ubiquinone-1.

-> The problem with the identifier `phyQ` was already fixed in commit https://github.com/draeger-lab/refinegems/commit/a49a96910e59b25063bed79b928f36bb4b83f70a. With commit https://github.com/draeger-lab/refinegems/commit/583e46c1186d4afb8caf918e7d40bf1cfd8d902e, the entries for the other two identifiers, `gbside_hs` and `q10`, were adjusted.
[1] Joana P. Bernardes, Neha Mishra, Florian Tran, Thomas Bahmer, Lena Best, Johanna I. Blase, Dora Bordoni, Jeanette Franzenburg, Ulf Geisen, Jonathan Josephs-Spaulding, Philipp Köhler, and Axel Künstner. Longitudinal multi-omics analyses identify responses of megakaryocytes, erythroid cells, and plasmablasts as hallmarks of severe covid-19. Immunity, 53(6):1296–1314.e9, 2020. URL: https://www.sciencedirect.com/science/article/pii/S1074761320305045, doi:10.1016/j.immuni.2020.11.017.
[2] Nantia Leonidou, Alina Renz, Reihaneh Mostolizadeh, and Andreas Dräger. New workflow predicts drug targets against sars-cov-2 via metabolic changes in infected cells. PLOS Computational Biology, 19(3):1–32, 03 2023. URL: https://doi.org/10.1371/journal.pcbi.1010903, doi:10.1371/journal.pcbi.1010903.
[3] Mack, Stephen R.; Szuchet, Sara (1981): Synthesis of myelin glycosphingolipids by isolated oligodendrocytes in tissue culture. In: Brain Research 214 (1), pp. 180–185. DOI: 10.1016/0006-8993(81)90451-0.
[4] Müsken, Anne; Souady, Jamal; Dreisewerd, Klaus; Zhang, Wenlan; Distler, Ute; Peter-Katalinić, Jasna et al. (2010): Application of thin-layer chromatography/infrared matrix-assisted laser desorption/ionization orthogonal time-of-flight mass spectrometry to structural analysis of bacteria-binding glycosphingolipids selected by affinity detection. In: Rapid communications in mass spectrometry: RCM 24 (7), pp. 1032–1038. DOI: 10.1002/rcm.4480.
[5] Detzner, Johanna; Pohlentz, Gottfried; Müthing, Johannes (2020): Valid Presumption of Shiga Toxin-Mediated Damage of Developing Erythrocytes in EHEC-Associated Hemolytic Uremic Syndrome. In: Toxins 12 (6), 373. DOI: 10.3390/toxins12060373.
Addition of anaerobic growth simulation
To add anaerobic growth simulation, the uptake rate for the oxygen exchange reaction has to be set to `0.0`. I think we can easily add the boolean parameter `anaerobic` to the function `simulate_minimum_essential` of the `growth` module. If this parameter is set to `True`, the `EX_o2_e` reaction will be set to `0.0` within this function; otherwise, the current definition is used. Additionally, this parameter needs to be included in the `config.yaml` file and in the corresponding `io` function.
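A minimal sketch of the proposed switch, written against a plain dictionary that maps exchange reaction IDs to maximum uptake rates, which is the shape of COBRApy's `Model.medium` property. `apply_anaerobic` is a hypothetical helper; how it would be wired into `simulate_minimum_essential`, `config.yaml` and the `io` function is left open.

```python
def apply_anaerobic(medium: dict[str, float], anaerobic: bool = False) -> dict[str, float]:
    """Return a copy of the medium with oxygen uptake set to 0.0 if requested.

    `medium` maps exchange reaction IDs (e.g. "EX_o2_e") to uptake rates,
    mirroring the dictionary used by COBRApy's ``Model.medium``.
    """
    adjusted = dict(medium)  # never mutate the caller's definition
    if anaerobic and "EX_o2_e" in adjusted:
        # Only touch oxygen if it is in the medium; an exchange that is
        # absent from a COBRApy medium dict is already closed.
        adjusted["EX_o2_e"] = 0.0
    return adjusted
```

With COBRApy installed, the result could be assigned back via `model.medium = apply_anaerobic(model.medium, anaerobic=True)`. Returning a copy keeps the aerobic definition untouched, so the same medium can be simulated under both conditions.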
Update on the Urine Medium
The current composition only contains metabolites, so the bacterial models do not grow on it. While searching for an artificial urine composition, I found several papers with different compositions. However, most of these definitions exclude the amino acids that are found in urine and are necessary for bacterial growth. One paper provides tables on the composition of urine compounds detected with NMR, but no medium definition (The Human Urine Metabolome). Most papers overlap in the described urine composition (A New Artificial Urine Protocol to Better Imitate Human Urine, A simple artificial urine for the growth of urinary pathogens). The paper by T. Brooks and C. W. Keevil seems to provide a good definition of a urine medium for bacterial growth testing. However, this definition includes Yeast Extract[^1] and Peptone L37, which are difficult to transfer into in silico medium definitions. For now, I will replace the Urine Medium definition with the MP-AU definition provided by Neslihan Sarigul, Filiz Korkmaz and İlhan Kurultak, as it largely overlaps with the definition by T. Brooks and C. W. Keevil. The medium name will also be changed to MP-AU.
-> In conclusion, the new definition might still need to be revised to let bacterial models grow.
[^1]: The definition of yeast extract for in silico use can be obtained from the paper by Oh, You-Kwan et al. where yeast extract needed to be defined for the LB medium. (See the Supplemental WORD document 'Complex medium composition' of the paper.)
I realised that dGMM contains no oxygen in its definition, so I looked at the paper again and found that this medium was used under anaerobic conditions. In the paper, a gas mix containing carbon dioxide, nitrogen and hydrogen was added to the anaerobic chamber. As hydrogen is already part of the in silico definition in the database, only carbon dioxide and nitrogen are added with the next commit. Additionally, the medium is renamed to dGMM, as it is actually the defined version of GMM and not GMM itself.
Update on MP-AU
As I realised that MP-AU also contains no oxygen, I searched for a paper in which this medium is used for bacteria. I found a paper by Pan, Altenried, Scheibler et al. [1] in which MP-AU was used for Pseudomonas aeruginosa. The paper does not mention that MP-AU was used under anaerobic conditions. Thus, oxygen is added to the in silico MP-AU definition with the next commit.
[1] Fei Pan, Stefanie Altenried, Subas Scheibler, Alexandre H.C. Anthis, Qun Ren, Specific capture of Pseudomonas aeruginosa for rapid detection of antimicrobial resistance in urinary tract infections, Biosensors and Bioelectronics, Volume 222, 2023, 114962, ISSN 0956-5663, https://doi.org/10.1016/j.bios.2022.114962. (https://www.sciencedirect.com/science/article/pii/S0956566322010028)
The Synthetic Minimal Medium (SMM) is removed from the database.
While changing the database set-up for the media tables part, I noticed discrepancies between our media definitions in the documentation and those within the database.
For M9, this situation is easily resolvable as only the documentation table seems wrong.
For LB, I am unsure how to resolve the issue.
First, in the paper from Oh et al. [1], it seems like L-Cystine is in the medium, while the definition from CarveMe contains L-Cysteine. Should we add both? Or is one of them a mistake?
Second, in the paper from Oh et al. [1], some components are not added to the LB definition because the in silico definition was used for Bacillus subtilis, which has no transport system for these compounds. At least, that is what the authors claim. In the same paper, they 'mixed' the compounds of Yeast extract and Tryptone to obtain the in silico LB definition. Thus, I am unsure whether we should include all their listed components (see lb_coplex_def.txt) or only the ones in their LB medium composition.
@cb-Hades, What is your opinion on that?
[1] Oh, Y. K., Palsson, B. O., Park, S. M., Schilling, C. H., & Mahadevan, R. (2007). Genome-scale reconstruction of metabolic network in Bacillus subtilis based on high-throughput phenotyping and gene essentiality data. Journal of Biological Chemistry, 282(39), 28791-28799. https://doi.org/10.1074/jbc.M703759200
@GwennyGit Regarding Cystine/Cysteine, I found an explanation of how Cysteine oxidises into Cystine in aqueous solutions. I would suggest putting in both.
For the LB definition, I would also suggest using everything in the medium description, as other models could potentially have exchange reactions for the skipped substances.
On the `dev` branch, L-Cystathionine was removed, as it was wrongly added in the first place. In addition, L-Cystine, `nh3`, Iron, Pyridoxal, Nicotinic acid and a second identifier for D-Glucose were added as needed.
On the branch `database-io-connection`, D-Malate and Hydrosulfide were removed, as these substances were not assigned to any media. In addition, L-Cysteine was changed to L-Cystine in the `RPMI` definition, as the Thermo Fisher definition of RPMI contains L-Cystine but not L-Cysteine.
No identifier exists in the BiGG database for Selenium.
Elemental selenium is insoluble in water and is not rapidly reduced or oxidized in nature.
-> See Selenium in Drinking-water - Background document for development of WHO Guidelines for Drinking-water Quality.
Moreover, as Selenium is listed as a substance in Yeast Extract in the article from Oh et al. [11], which is used as a reference for the LB medium, we assume that this Selenium is not pure Selenium but Selenium-containing substances produced by the yeast. Alternatively, the Selenium mentioned in this paper could stem from an unused Selenium-containing substance in the growth medium of the yeast used for the extract. According to the article from Rayman [12], the following components are Selenium-containing substances in yeast extract:

| Substance name | From yeast | From medium | BiGG identifier(s) |
|---|---|---|---|
| Sodium selenite | ❌ | ✅ | na1, slnt |
| L-Selenomethionine [SeMet] | ✅ | ❌ | selmeth |
| Selenite | ✅ | ❌ | slnt |
| gamma-Glutamyl-Se-methylselenocysteine | ✅ | ❌ | gglusem (same chemical formula) |
| L-Adenosylselenohomocysteine | ✅ | ❌ | seahcys |
Hence, all these substances are added to the definition of `RPMI` on `structure-update`.
[12] Rayman, M. (2004). The use of high-selenium yeast to raise selenium status: How does it measure up? British Journal of Nutrition, 92(4), 557-573. doi:10.1079/BJN20041251
I started transferring the definition of ‘Artificial Sebum’ into an in silico definition.
For the substances ‘Olive oil’, ‘Coconut oil’ and ‘Cottonseed oil’, I used the ‘Design a diet’ tool on the VMH website. I selected each substance individually, generated the fluxes and downloaded the result. From the result, all components with zero flux were removed. For the flux generation, it was assumed that one litre of medium is used. Thus, the percentage provided in the original definition of ‘Artificial Sebum’ was multiplied by one litre. The result of this calculation was used to get the fluxes for the corresponding amount of substance.
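The conversion and filtering steps can be sketched as below. This assumes the percentages are volume percent (v/v), so that multiplying by one litre gives millilitres, and that the VMH result was downloaded as a tab-separated file with a header row and one compound and its flux per line; both assumptions, the file layout and the function names are hypothetical, and the test values are made up.

```python
import csv
import io


def amount_per_litre(percentage: float, volume_l: float = 1.0) -> float:
    """Convert a recipe percentage into millilitres for the given volume.

    Assumes volume percent (v/v): (percentage / 100) * volume_l * 1000 mL,
    which simplifies to percentage * volume_l * 10.
    """
    return percentage * volume_l * 10.0


def drop_zero_fluxes(tsv_text: str) -> dict[str, float]:
    """Parse a 'compound<TAB>flux' export and keep only non-zero fluxes."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    kept: dict[str, float] = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        compound, flux = row[0], float(row[1])
        if flux != 0.0:
            kept[compound] = flux
    return kept
```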
The substance ‘Paraffin wax’ could not be mapped to any specific name, formula or database identifier. Hence, this substance is removed from the in silico definition.
A function to automatically extend the database or add missing entries would be great.
I tried to implement the functionality within the `add_medium` function of the `medium.py` module. However, my implementation was too basic, and I was unsure whether it would be better to keep all the update functionality separate. 🤔
For now, I will use the interactive shell to add the entries for the source columns and update the substances as necessary.
I will track the commit messages displayed in the shell in this comment for future automatic functionality.
Case 1 - Update a column in a table:
`UPDATE "<table_name>" SET "<column_name>" = ? WHERE "rowid" = <rowid_of_row_with_correct_match>`
Case 2 - Add a new entry:
`INSERT INTO "<table_name>" (<list of column names without default values>) VALUES (?, ?)`
Example:
`INSERT INTO "substance" ("id", "name") VALUES (?, ?)`
According to this site, the iron oxidation states depend strongly on the surrounding pH. I need to investigate that further to conclude which media should contain which iron oxidation states.
Database restructuring is finished; namespace issues were moved to #36, and ideas for new media were added to #123.
I will collect the ToDos from this thread here. @GwennyGit, maybe you can mark what you have already done and on which branch :D
- [x] write function to simulate without oxygen
- Add boolean to enable anaerobic growth simulation to all relevant functions
- SMM
- Urine
- MP-AU [10]
- SMM
- Urine
- MP-AU [10] [2] [9]
- Add Urine [2]
- Add MP-AU [10]
- [x] Add Basal medium (name from paper; renamed to BMS23 for Basal Medium Swaney 2023)
- [x] Check in `generate_insert_query` for strings in `value_string`
- ❗Feature request for maintenance
- [x] Add function to update tables/specific table entries automatically
[1] Krismer, Bernhard; Liebeke, Manuel; Janek, Daniela; Nega, Mulugeta; Rautenberg, Maren; Hornig, Gabriele et al. (2014): Nutrient Limitation Governs Staphylococcus aureus Metabolism and Niche Adaptation in the Human Nose. In: PLOS Pathogens 10 (1), e1003862. DOI: 10.1371/journal.ppat.1003862.
[2] Ding T, Case KA, Omolo MA, Reiland HA, Metz ZP, Diao X, Baumler DJ. Predicting Essential Metabolic Genome Content of Niche-Specific Enterobacterial Human Pathogens during Simulation of Host Environments. PLoS One. 2016 Feb 17;11(2):e0149423. doi: 10.1371/journal.pone.0149423. PMID: 26885654; PMCID: PMC4757543.
[3] https://www.thermofisher.com/de/de/home/technical-resources/media-formulation.114.html
[4] Unthan, Simon, et al. "Beyond growth rate 0.6: What drives Corynebacterium glutamicum to higher growth rates in defined medium." Biotechnology and bioengineering 111.2 (2014): 359-371., Preparation protocol
[5] Richard A. Nolan (1971) Amino Acids and Growth Factors in Vitamin-Free Casamino Acids, Mycologia, 63:6, 1231-1234, DOI: 10.1080/00275514.1971.12019223
[6] https://www.sigmaaldrich.com/DE/de/product/sigma/m6030, Preparation protocol
[7] Machado, Daniel, et al. "Fast automated reconstruction of genome-scale metabolic models for microbial species and communities." Nucleic acids research 46.15 (2018): 7542-7553. https://carveme.readthedocs.io/en/latest/advanced.html#media-database
[8] Tramontano, M., Andrejev, S., Pruteanu, M. et al. Nutritional preferences of human gut bacteria reveal their metabolic idiosyncrasies. Nat Microbiol 3, 514–522 (2018). https://doi.org/10.1038/s41564-018-0123-9
[9] Nantia Leonidou, Alina Renz, Reihaneh Mostolizadeh, and Andreas Dräger. New workflow predicts drug targets against sars-cov-2 via metabolic changes in infected cells. PLOS Computational Biology, 19(3):1–32, 03 2023. URL: https://doi.org/10.1371/journal.pcbi.1010903, doi:10.1371/journal.pcbi.1010903.
[10] Sarigul, N., Korkmaz, F. & Kurultak, İ. A New Artificial Urine Protocol to Better Imitate Human Urine. Sci Rep 9, 20159 (2019). https://doi.org/10.1038/s41598-019-56693-4
[11] Oh, Y. K., Palsson, B. O., Park, S. M., Schilling, C. H., & Mahadevan, R. (2007). Genome-scale reconstruction of metabolic network in Bacillus subtilis based on high-throughput phenotyping and gene essentiality data. Journal of Biological Chemistry, 282(39), 28791-28799. https://doi.org/10.1074/jbc.M703759200
[12] Swaney MH, Nelsen A, Sandstrom S, Kalan LR. Sweat and Sebum Preferences of the Human Skin Microbiota. Microbiol Spectr. 2023 Feb 14;11(1):e0418022. doi: 10.1128/spectrum.04180-22. Epub 2023 Jan 5. PMID: 36602383; PMCID: PMC9927561