draeger-lab / refinegems

refineGEMs is a Python package intended to help with the curation of genome-scale metabolic models (GEMs).
https://refinegems.readthedocs.io/en/latest/
MIT License

Update media definitions, document and extend db #16

Closed famosab closed 3 months ago

famosab commented 2 years ago

I will collect the ToDos from this thread here. @GwennyGit, maybe you can mark what you have already done and on which branch :D

famosab commented 2 years ago

I added Casamino Acids to the media definition based on an article from 1971 - Amino Acids and Growth Factors in Vitamin-Free Casamino Acids.

famosab commented 1 year ago

@GwennyGit, maybe you can add the gut medium as soon as you get to that. But we might rethink how we store the media definitions. At the moment it is just a big CSV file, which works, but entering it into the database we use for sboann might be more elegant?

GwennyGit commented 1 year ago

The idea of using the database sounds good. However, if someone wants to use the media definitions in another program, for example the gapfill function of CarveMe, it is easier to transform the CSV file into the required format. Additionally, it might be easier for users to add other media via the CSV file. @famosab What do you think? 🤔

famosab commented 1 year ago

I think we should move the existing media definitions into the database as well. You mentioned somewhere that access via pandas should be possible. If that works for a user that just installs refineGEMs via pip, it would be great! Maybe we can implement a function which exports the database entries to a csv medium definition. The functionality for a possible user would still be the same since they could just use a local csv as well.
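The export idea above could look roughly like the following minimal sketch. It uses only the standard library; the table name `medium_substance`, its columns, the function name `export_medium_to_csv` and the flux values are assumptions made for illustration and do not reflect the actual refineGEMs database schema.

```python
import csv
import io
import sqlite3


def export_medium_to_csv(conn, medium_name, out):
    """Write all substances of one medium as CSV rows to the file-like object `out`.

    Returns the number of exported substances.
    """
    rows = conn.execute(
        "SELECT substance, bigg_id, flux FROM medium_substance "
        "WHERE medium = ? ORDER BY substance",
        (medium_name,),
    ).fetchall()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["medium", "substance", "BiGG", "flux"])
    for substance, bigg_id, flux in rows:
        writer.writerow([medium_name, substance, bigg_id, flux])
    return len(rows)


# Tiny in-memory database standing in for the real one (hypothetical schema,
# placeholder flux values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE medium_substance "
    "(medium TEXT, substance TEXT, bigg_id TEXT, flux REAL)"
)
conn.executemany(
    "INSERT INTO medium_substance VALUES (?, ?, ?, ?)",
    [
        ("M9", "Water", "h2o", 10.0),
        ("M9", "Oxygen", "o2", 10.0),
        ("LB", "Water", "h2o", 10.0),
    ],
)

buffer = io.StringIO()
n_rows = export_medium_to_csv(conn, "M9", buffer)
csv_text = buffer.getvalue()
```

Writing to a file-like object keeps the sketch testable; passing an opened file instead of `io.StringIO` would give the user a local CSV medium definition as before.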

GwennyGit commented 1 year ago

Yes, I mentioned that in issue #49, and I am currently working on that task.

famosab commented 1 year ago

At the moment we have both CGXII and CGXlab in the database. I would advise removing CGXII and replacing it with the composition of CGXlab, since that is the composition used for the manuscript we will publish soon, and CGXII is just a file I got a while ago that is not verified by laboratory use. We could also remove LB and M9 without oxygen, or write a small function to allow for anaerobic simulation on any of the media.

GwennyGit commented 1 year ago

Removing CGXII and keeping only CGXlab of these two media is a good idea. However, I think it would also be good to describe all media in the documentation so that users know what each medium is and when to use which one.


Yeah, creating a small function to allow for anaerobic simulation on any media sounds like a great idea. Then this simulation part would not only be restricted to M9 and LB.

GwennyGit commented 1 year ago

So far, I have only created the basic set-up for the media definition pages. Thus, the pages still need to be filled with content.

GwennyGit commented 1 year ago

Re-evaluation of SNM3

The composition of SNM3 within refineGEMs was compared to another composition used within the draeger-lab group as well as to the original wet lab composition described in 'Nutrient Limitation Governs Staphylococcus aureus Metabolism and Niche Adaptation in the Human Nose' [1]. Additionally, all compound assignments to the BiGG database were checked and the names added with the following pattern: BiGG ID [Name in wet lab description].

From the comparison the following differences were found:

  1. For Cyanocobalamine, which is Vitamin B12, the SNM3 definition in refineGEMs listed adocbl, which is the BiGG ID for Adenosylcobalamine. As its chemical formula is not a good match for Cyanocobalamine, this compound was removed from the definition. -> Replaced adocbl with cbl1 (Cob(I)alamine), cbl2 (Cob(II)alamine) and b12 (Vitamin B12). The first two BiGG IDs were chosen because their chemical formulas are highly similar to that of Cyanocobalamine and because they are already included in the SNM3 definition from the draeger-lab group. b12 was added as Cyanocobalamine is Vitamin B12.
  2. In the draeger-lab group both fe2 and fe3 are contained in the SNM3 definition. However, the refineGEMs definition only contained fe2. Thus, fe3 was added.

In conclusion, after discussing with @famosab, we decided to add all analogues of each compound for all media. Hence the addition of all compounds similar to Cyanocobalamine and Iron (Fe).
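As a rough illustration of this decision, the following sketch extends a list of BiGG IDs with all known analogues of any compound already present. The two groups are taken from the points above; the function name `expand_with_analogues` and the dictionary layout are hypothetical and not part of refineGEMs.

```python
# Analogue groups taken from the SNM3 comparison above.
ANALOGUES = {
    "Cyanocobalamine": ["cbl1", "cbl2", "b12"],
    "Iron": ["fe2", "fe3"],
}


def expand_with_analogues(medium_ids, groups):
    """Return the medium IDs plus every analogue of a compound that is present.

    The original order is kept; missing analogues are appended at the end.
    """
    expanded = list(medium_ids)
    for ids in groups.values():
        if any(bigg_id in expanded for bigg_id in ids):
            expanded.extend(i for i in ids if i not in expanded)
    return expanded


snm3_like = expand_with_analogues(["fe2", "h2o"], ANALOGUES)
```

Here only the iron group is triggered, so `fe3` is appended while the cobalamin analogues are left out because none of them was in the input.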


[1] Krismer, Bernhard; Liebeke, Manuel; Janek, Daniela; Nega, Mulugeta; Rautenberg, Maren; Hornig, Gabriele et al. (2014): Nutrient Limitation Governs Staphylococcus aureus Metabolism and Niche Adaptation in the Human Nose. In: PLOS Pathogens 10 (1), e1003862. DOI: 10.1371/journal.ppat.1003862.

famosab commented 1 year ago

Re-evaluation of RPMI

The composition of the in silico RPMI medium was compared to the provider reference. This comparison yielded the following points:

famosab commented 1 year ago

Re-evaluation of M9

The M9 composition is based on the provider reference for the minimal salts:

Plus necessary additives as described here and here:

And o2 and h2o are present per standard.

GwennyGit commented 1 year ago

Addition of the defined Gut Microbiota Medium (dGMM) to the database

To get all relevant BiGG IDs for the salts, the following table was used:

| Ion | Abundance | BiGG ID | BiGG name |
|---|---|---|---|
| Fe(II) | 1 | fe2 | Fe2+ |
| SO4 | 6 | so4 | Sulfate |
| Zn | 1 | zn2 | Zinc |
| Co | 1 | cobalt2 | Co2+ |
| NO3 | 1 | no3 | Nitrate |
| Al | 1 | - | - |
| K | 2 | k | Potassium |
| Na | 5 | na1 | Sodium |
| SeO3 | 1 | slnt | Selenite |
| WO4 | 1 | tungs | Tungstate |
| Ni | 1 | ni2 | Nickel |
| Cl | 3 | cl | Chloride |
| Ca | 1 | ca2 | Calcium |
| Cu | 1 | cu2 | Copper |
| Mn | 1 | mn2 | Manganese |
| Mg | 1 | mg2 | Magnesium |
| HCO3 | 1 | hco3 | Bicarbonate |
| MoO4 | 1 | mobd | Molybdate |
| H | 1 | h | Hydrogen |
| PO4 | 1 | pi | Phosphate |

Additionally, water was added to the definition as most salts were added with water in the laboratory version of GMM. For Resazurin, boric acid (H3BO3), Aluminium (Al), dihydrogen phosphate (H2PO4) and EDTA no BiGG IDs were found. However, dihydrogen phosphate could be separated into hydrogen (H) and phosphate (PO4) for which BiGG IDs exist. This was changed in commit (Needs to be committed❗).
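The lookup described above, including the split of dihydrogen phosphate into hydrogen and phosphate, can be sketched as a simple dictionary mapping. The ion-to-ID pairs are copied from the table; the names `ION_TO_BIGG`, `SPLIT_RULES` and `map_ions` are made up for this example.

```python
# Ion -> BiGG ID, as listed in the dGMM table; None marks ions without a BiGG ID.
ION_TO_BIGG = {
    "Fe(II)": "fe2", "SO4": "so4", "Zn": "zn2", "Co": "cobalt2",
    "NO3": "no3", "Al": None, "K": "k", "Na": "na1", "SeO3": "slnt",
    "WO4": "tungs", "Ni": "ni2", "Cl": "cl", "Ca": "ca2", "Cu": "cu2",
    "Mn": "mn2", "Mg": "mg2", "HCO3": "hco3", "MoO4": "mobd",
    "H": "h", "PO4": "pi",
}

# Ions without their own BiGG ID that can be split into mappable parts.
SPLIT_RULES = {"H2PO4": ["H", "PO4"]}


def map_ions(ions):
    """Map ion names to BiGG IDs.

    Returns a sorted list of BiGG IDs and a list of ions that could not be mapped.
    """
    mapped, unmapped = set(), []
    for ion in ions:
        for part in SPLIT_RULES.get(ion, [ion]):
            bigg_id = ION_TO_BIGG.get(part)
            if bigg_id is None:
                unmapped.append(part)
            else:
                mapped.add(bigg_id)
    return sorted(mapped), unmapped


bigg_ids, missing = map_ions(["H2PO4", "Al", "Na"])
```

Returning the unmapped ions separately makes it easy to list compounds such as Aluminium that still lack an identifier.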

GwennyGit commented 1 year ago

I added a page to the documentation to describe how one can get from a laboratory medium to an in silico one. @famosab Feel free to add adjustments.

GwennyGit commented 1 year ago

Re-evaluation of Blood

The definition for Blood is missing relevant components like water or iron. From @NantiaL I got another medium definition, which was also used in the paper 'New workflow predicts drug targets against SARS-CoV-2 via metabolic changes in infected cells' [2], and compared it to our current one. The definition from Nantia was obtained in collaboration with the authors of the paper 'Longitudinal Multi-omics Analyses Identify Responses of Megakaryocytes, Erythroid Cells, and Plasmablasts as Hallmarks of Severe COVID-19' [1]. Comparing the Blood medium version currently in the database to Nantia's definition revealed that Allantoin, Glucose, D-Malate and the BiGG ID nicnt for Nicotinate from the current version were absent in Nantia's medium definition. As the rest overlapped and Nantia's version contains many more components, the current version of the Blood medium will be extended with Nantia's version, and Allantoin as well as Glucose will be removed. The following three BiGG IDs were deemed to be invalid:

-> The problem with the identifier phyQ was already fixed in commit https://github.com/draeger-lab/refinegems/commit/a49a96910e59b25063bed79b928f36bb4b83f70a. With commit https://github.com/draeger-lab/refinegems/commit/583e46c1186d4afb8caf918e7d40bf1cfd8d902e the entries for the other two identifiers gbside_hs and q10 were adjusted.


[1] Joana P. Bernardes, Neha Mishra, Florian Tran, Thomas Bahmer, Lena Best, Johanna I. Blase, Dora Bordoni, Jeanette Franzenburg, Ulf Geisen, Jonathan Josephs-Spaulding, Philipp Köhler, and Axel Künstner. Longitudinal multi-omics analyses identify responses of megakaryocytes, erythroid cells, and plasmablasts as hallmarks of severe COVID-19. Immunity, 53(6):1296–1314.e9, 2020. URL: https://www.sciencedirect.com/science/article/pii/S1074761320305045, doi:10.1016/j.immuni.2020.11.017.

[2] Nantia Leonidou, Alina Renz, Reihaneh Mostolizadeh, and Andreas Dräger. New workflow predicts drug targets against SARS-CoV-2 via metabolic changes in infected cells. PLOS Computational Biology, 19(3):1–32, 03 2023. URL: https://doi.org/10.1371/journal.pcbi.1010903, doi:10.1371/journal.pcbi.1010903.

[3] Mack, Stephen R.; Szuchet, Sara (1981): Synthesis of myelin glycosphingolipids by isolated oligodendrocytes in tissue culture. In: Brain Research 214 (1), pp. 180–185. DOI: 10.1016/0006-8993(81)90451-0.

[4] Müsken, Anne; Souady, Jamal; Dreisewerd, Klaus; Zhang, Wenlan; Distler, Ute; Peter-Katalinić, Jasna et al. (2010): Application of thin-layer chromatography/infrared matrix-assisted laser desorption/ionization orthogonal time-of-flight mass spectrometry to structural analysis of bacteria-binding glycosphingolipids selected by affinity detection. In: Rapid communications in mass spectrometry : RCM 24 (7), pp. 1032–1038. DOI: 10.1002/rcm.4480.

[5] Detzner, Johanna; Pohlentz, Gottfried; Müthing, Johannes (2020): Valid Presumption of Shiga Toxin-Mediated Damage of Developing Erythrocytes in EHEC-Associated Hemolytic Uremic Syndrome. In: Toxins 12 (6), p. 373. DOI: 10.3390/toxins12060373.

GwennyGit commented 1 year ago

Addition of anaerobic growth simulation

To add anaerobic growth simulation, the uptake rate for the exchange reaction of oxygen has to be set to 0.0. I think we can easily add the bool parameter `anaerobic` to the function simulate_minimum_essential of the growth module. If this parameter is set to `True`, the EX_o2_e reaction is set to 0.0 within this function; otherwise, the current definition is used. Additionally, this parameter needs to be included in the `config.yaml` file and the corresponding `io` function.
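A minimal sketch of the proposed switch, assuming the medium is held as a plain dictionary from exchange reaction ID to maximal uptake rate (similar in spirit to how COBRApy exposes `model.medium`). The helper name `apply_anaerobic` and the example uptake values are hypothetical; this is not the actual simulate_minimum_essential implementation.

```python
def apply_anaerobic(medium, anaerobic=False, oxygen_exchange="EX_o2_e"):
    """Return a copy of `medium` with oxygen uptake set to 0.0 if `anaerobic` is True.

    `medium` maps exchange reaction IDs to maximal uptake rates.
    The input dictionary is never modified.
    """
    adjusted = dict(medium)
    if anaerobic and oxygen_exchange in adjusted:
        adjusted[oxygen_exchange] = 0.0
    return adjusted


# Placeholder uptake rates, for illustration only.
aerobic_medium = {"EX_o2_e": 10.0, "EX_glc__D_e": 10.0}
anaerobic_medium = apply_anaerobic(aerobic_medium, anaerobic=True)
```

Because the function works on any medium dictionary, the anaerobic simulation would no longer be tied to separate oxygen-free copies of M9 and LB.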

GwennyGit commented 1 year ago

Update on the Urine Medium

The current composition only contains metabolites. Thus, the bacterial models are not growing on it. In the search for an artificial urine composition, several papers with different compositions were found. However, most of the definitions exclude amino acids, which were found in urine and are necessary for bacterial growth. I found one paper with tables about the composition of urine compounds detected with NMR. However, no medium definition is provided (The Human Urine Metabolome). Most papers overlap in the described urine composition (A New Artificial Urine Protocol to Better Imitate Human Urine, A simple artificial urine for the growth of urinary pathogens). The paper by T. Brooks and C. W. Keevil seems to have a good definition of a urine medium for bacterial growth testing. However, the definition includes Yeast Extract[^1] and Peptone L37, which are difficult to transfer into in silico medium definitions. For now, I will replace the Urine Medium definition with the MP-AU definition provided by Neslihan Sarigul, Filiz Korkmaz and İlhan Kurultak, as it largely overlaps with the definition by T. Brooks and C. W. Keevil. The medium name will also be changed to MP-AU.

-> In conclusion, the new definition might still need to be revised to let bacterial models grow.

[^1]: The definition of yeast extract for in silico use can be obtained from the paper by Oh, You-Kwan et al. where yeast extract needed to be defined for the LB medium. (See the Supplemental WORD document 'Complex medium composition' of the paper.)

GwennyGit commented 1 year ago

I realised that the dGMM contains no oxygen in its definition. So I looked again at the paper and found out that this medium was used under anaerobic conditions. However, in the paper, a gas mix containing carbon dioxide, nitrogen and hydrogen was added to the anaerobic chamber. As hydrogen is already part of the in silico definition in the database only carbon dioxide and nitrogen are added with the next commit. Additionally, the medium is renamed to dGMM as it is actually the defined version of GMM and not the GMM.

GwennyGit commented 1 year ago

Update on MP-AU

As I realised that MP-AU also contains no oxygen, I searched for a paper where this medium is used for bacteria. I found a paper by Pan, Altenried and Scheibler et al. [1] in which the MP-AU medium was used for Pseudomonas aeruginosa. In this paper, it is not mentioned that MP-AU was used under anaerobic conditions. Thus, oxygen is added to the in silico MP-AU definition with the next commit.


[1] Fei Pan, Stefanie Altenried, Subas Scheibler, Alexandre H.C. Anthis, Qun Ren, Specific capture of Pseudomonas aeruginosa for rapid detection of antimicrobial resistance in urinary tract infections, Biosensors and Bioelectronics, Volume 222, 2023, 114962, ISSN 0956-5663, https://doi.org/10.1016/j.bios.2022.114962. (https://www.sciencedirect.com/science/article/pii/S0956566322010028)

GwennyGit commented 1 year ago

The Synthetic Minimal Medium (SMM) is removed from the database.

GwennyGit commented 10 months ago

While changing the database set-up for the media tables part, I noticed discrepancies between our media definitions in the documentation and those within the database.

For M9, this situation is easily resolvable as only the documentation table seems wrong.

For LB, I am unsure how to resolve the issue. First, in the paper from Oh et al. [1], it seems like L-Cystine is in the medium, while in the definition from CarveMe L-Cysteine is present. Should we add both? Or is one of them a mistake? Second, in the paper from Oh et al. [1], some components are not added to the LB definition because the in silico definition was used for Bacillus subtilis, which had no transport system for these compounds. At least, that is what the authors of the paper claim. In the same paper, they ‘mixed’ the compounds in Yeast extract and Tryptone to get the in silico LB definition. Thus, I am unsure whether we should include all their listed components (see lb_coplex_def.txt) or only the ones they have in their LB medium composition. @cb-Hades, what is your opinion on that?


[1] Oh, Y. K., Palsson, B. O., Park, S. M., Schilling, C. H., & Mahadevan, R. (2007). Genome-scale reconstruction of metabolic network in Bacillus subtilis based on high-throughput phenotyping and gene essentiality data. Journal of Biological Chemistry, 282(39), 28791-28799. https://doi.org/10.1074/jbc.M703759200

cb-Hades commented 10 months ago

@GwennyGit Regarding Cystine/Cysteine, I found the following explanation about the oxidation of Cysteine into Cystine in aqueous solutions. I would suggest putting in both.

For the LB definition, I would also suggest using everything in the medium description, as other models could potentially have exchange reactions for the skipped substances.

GwennyGit commented 10 months ago

On the dev branch, L-Cystathionine was removed as it was wrongly added in the first place. In addition, L-Cystine, nh3, Iron, Pyridoxal, Nicotinic acid and a second identifier for D-Glucose were added as needed.

On the branch database-io-connection, D-Malate and Hydrosulfide were removed as these substances were not assigned to any media. In addition, L-Cysteine was changed to L-Cystine in the RPMI definition, as the Thermo Fisher definition of RPMI contains L-Cystine but not L-Cysteine.

GwennyGit commented 10 months ago

No identifier in the BiGG database exists for Selenium.

Elemental selenium is insoluble in water and not rapidly reduced or oxidized in nature.

-> See Selenium in Drinking-water - Background document for development of WHO Guidelines for Drinking-water Quality

Moreover, as Selenium is mentioned as a substance in Yeast Extract in the article from Oh et al. [11], which is used as a reference for the LB medium, we assume that the Selenium is not pure Selenium but Selenium-containing substances produced by yeast. Furthermore, the Selenium mentioned in this paper could also be from an unused Selenium-containing substance used in the yeast's growth medium for the yeast extract. Thus, according to the article from Rayman [12], the following components are Selenium-containing substances in yeast extract:

| Substance name | From yeast | From medium | BiGG identifier(s) |
|---|---|---|---|
| Sodium selenite | | | na1, slnt |
| L-Selenomethionine [SeMet] | | | selmeth |
| Selenite | | | slnt |
| gamma-Glutamyl-Se-methylselenocysteine | | | gglusem <- same chemical formula |
| L-Adenosylselenohomocysteine | | | seahcys |

Hence, all these substances are added to the definition of RPMI on structure-update.


[12] Rayman, M. (2004). The use of high-selenium yeast to raise selenium status: How does it measure up? British Journal of Nutrition, 92(4), 557-573. doi:10.1079/BJN20041251

GwennyGit commented 9 months ago

I started transferring the definition of ‘Artificial Sebum’ into an in silico definition.

For the substances ‘Olive oil’, ‘Coconut oil’ and ‘Cottonseed oil’, I used the ‘Design a diet’ tool on the VMH website. I selected each substance individually, generated the fluxes and downloaded the result. From the result, all components with zero flux were removed. For the flux generation, it was assumed that one litre of medium is used. Thus, the percentage provided in the original definition of ‘Artificial Sebum’ was multiplied by one litre. The result of this calculation was used to get the fluxes for the corresponding amount of substance.

The substance ‘Paraffin wax’ could not be mapped to any specific name, formula or database identifier. Hence, this substance is removed from the in silico definition.

GwennyGit commented 8 months ago

A function to automatically extend the database or add missing entries would be great. I tried to implement the functionality within the add_medium function of the medium.py module. However, my implementation was too basic, and I was unsure whether it would be better to have all the update functionality separately. 🤔 For now, I will use the interactive shell to add the entries for the source columns and update the substances as necessary. I will track the commit messages displayed in the shell in this comment for future automatic functionality.


Case 1 - Update a column in a table:

UPDATE "<table_name>" SET "<column_name>" = ? WHERE "rowid" = <rowid_of_row_with_correct_match>

Case 2 - Add a new entry:

INSERT INTO "<table_name>" (List of column names without default values) VALUES (?, ?)

Example:

INSERT INTO "substance" ("id", "name") VALUES (?, ?)
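
The two cases above could be wrapped into small helpers along the following lines. This is only a sketch using the standard-library `sqlite3` module; the helper names (`quote_identifier`, `update_column`, `add_entry`) and the example table layout with a `formula` column are assumptions and not the refineGEMs implementation. Values are always passed as `?` parameters, while table and column names are quoted explicitly because SQL placeholders cannot stand in for identifiers.

```python
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for SQLite (double quotes are escaped by doubling)."""
    return '"' + name.replace('"', '""') + '"'


def update_column(conn, table, column, value, rowid):
    """Case 1: set one column of the row with the given rowid. Returns the row count."""
    sql = (
        f"UPDATE {quote_identifier(table)} "
        f"SET {quote_identifier(column)} = ? WHERE rowid = ?"
    )
    cur = conn.execute(sql, (value, rowid))
    conn.commit()
    return cur.rowcount


def add_entry(conn, table, values):
    """Case 2: insert a new row from a {column: value} dict. Returns the new rowid."""
    columns = ", ".join(quote_identifier(c) for c in values)
    placeholders = ", ".join("?" for _ in values)
    sql = f"INSERT INTO {quote_identifier(table)} ({columns}) VALUES ({placeholders})"
    cur = conn.execute(sql, tuple(values.values()))
    conn.commit()
    return cur.lastrowid


# In-memory stand-in for the real database (hypothetical table layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "substance" ("id" INTEGER PRIMARY KEY, "name" TEXT, "formula" TEXT)'
)
new_rowid = add_entry(conn, "substance", {"id": 1, "name": "Water"})
changed = update_column(conn, "substance", "formula", "H2O", new_rowid)
row = conn.execute('SELECT "id", "name", "formula" FROM "substance"').fetchone()
```

Keeping the update and insert helpers separate from add_medium would match the open question above about whether the update functionality should live on its own.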

GwennyGit commented 8 months ago

According to this site, the iron oxidation states depend strongly on the surrounding pH. I need to investigate that further to conclude which media should contain which iron oxidation states.

cb-Hades commented 3 months ago

Database restructuring finished, namespace issues moved to #36 and ideas for new media have been added to #123