SysBioChalmers / yeast-GEM

The consensus GEM for Saccharomyces cerevisiae
http://sysbiochalmers.github.io/yeast-GEM/
Creative Commons Attribution 4.0 International

bug: inconsistency in metabolite identifiers and names between .yml, .txt and .xml files. #281

Closed edkerk closed 2 years ago

edkerk commented 2 years ago

Description of the issue:

In the .xml file, and when the model is loaded in MATLAB, metabolite identifiers and names are suffixed with compartment information. These suffixes are absent in the .yml, .txt and .xlsx files.

Expected feature/value/output:

Metabolite identifiers and names should be identical in all model files (*). Metabolite compartments are already stored in the compartment field in SBML/yml/MATLAB/XLSX, so identifiers and names do not need a compartment suffix.

s_0002 should stand for cytoplasmic 1,3-beta-D-glucan in all model files.

(*) Prefixes in SBML files are still a discussion point.

Current feature/value/output:

In the SBML file, and when the model is loaded with COBRA Toolbox:

s_0002[c] stands for cytoplasmic 1,3-beta-D-glucan [cytoplasm]

The YML, TXT and XLSX files, in contrast, follow the expected output. COBRA Toolbox forces the use of these suffixes, but RAVEN Toolbox does not. The YML, TXT and XLSX files are written by RAVEN instead of COBRA Toolbox, and ravenCobraWrapper removes the suffixes when converting from COBRA to RAVEN.
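To make the mismatch concrete, here is a minimal Python sketch of the kind of suffix removal described above. It is not the ravenCobraWrapper implementation and not part of yeast-GEM. The function name `strip_compartment_suffix` is hypothetical, and the suffix formats (`[c]` on the ID, ` [cytoplasm]` on the name) are assumed from the single example in this issue.

```python
import re

# Assumed formats, taken from the example in this issue:
#   ID:   "s_0002[c]"
#   name: "1,3-beta-D-glucan [cytoplasm]"
_ID_SUFFIX = re.compile(r"\[([^\[\]]+)\]$")
_NAME_SUFFIX = re.compile(r"\s*\[([^\[\]]+)\]$")


def strip_compartment_suffix(met_id, met_name):
    """Hypothetical helper: return (id, name, compartment) without suffixes.

    The compartment abbreviation is read from the ID suffix and is None
    when the ID carries no suffix.
    """
    match = _ID_SUFFIX.search(met_id)
    compartment = match.group(1) if match else None
    clean_id = _ID_SUFFIX.sub("", met_id)
    clean_name = _NAME_SUFFIX.sub("", met_name)
    return clean_id, clean_name, compartment


print(strip_compartment_suffix("s_0002[c]", "1,3-beta-D-glucan [cytoplasm]"))
# ('s_0002', '1,3-beta-D-glucan', 'c')
```

Note that this sketch would also strip a trailing bracketed term from a name that never had a compartment suffix, so it should only be applied to names known to be suffixed.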

The suffixes were introduced in the IDs for no apparent reason with version 7.8 (script).

The square brackets are also problematic because they are not allowed in SBML identifiers. Removing the suffixes would therefore also avoid having to change these invalid identifiers when exporting to SBML.
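The identifier constraint can be checked with a short Python sketch. It assumes the SBML SId syntax (a letter or underscore, followed by any number of letters, digits or underscores). The helper name `is_valid_sid` is hypothetical and is not taken from COBRA, RAVEN or libSBML.

```python
import re

# SBML SId syntax: a letter or underscore, then letters, digits or underscores.
_SID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_sid(identifier):
    """Hypothetical helper: True if the whole string matches the SId syntax."""
    return _SID.fullmatch(identifier) is not None


for candidate in ("s_0002", "s_0002[c]"):
    print(candidate, is_valid_sid(candidate))
# s_0002 True
# s_0002[c] False
```

A check like this could be run over all metabolite IDs before export to flag any that would need to be rewritten.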

Solution

This might break backwards compatibility of existing (personal) scripts, so it is best to apply this change in a major release (e.g. 9.0.0).


edkerk commented 2 years ago

Dropping the use of COBRA in saveYeastModel and loadYeastModel, and instead using RAVEN's functions for all file formats, would also resolve this.