SysBioChalmers / RAVEN

The RAVEN Toolbox for genome scale model reconstruction, curation and analysis.
http://sysbiochalmers.github.io/RAVEN/

KEGG subSystems field not as cell array #552

Closed Rachita-Kumar closed 2 days ago

Rachita-Kumar commented 5 days ago

Description of the issue:

Hello, I am trying to reconstruct genome-scale metabolic models using RAVEN. I tried the following approaches:

  1. Based on KEGG organism code as outlined here
  2. Using pre-trained HMMs as outlined here

With both methods I tried to reconstruct the model for Saccharomyces cerevisiae (sce). However, I ran into issues when exporting the model, and when I then checked the model structure I received the following error:

Error using dispEM (line 49)
The "subSystems" field must be a cell array
Error in checkModelStruct (line 164)
            dispEM(EM,throwErrors);

When I export the model using exportModel, I receive the following warnings:

WARNING: The "subSystems" field must be a cell array

WARNING: No objective function found. This might be intended, but results in FBCv2 non-compliant SBML file when exported

WARNING: The composition for the following metabolites could not be parsed:
    C00856
    C00868
    C01977
    C01978
    C02967
    C04157
    C04432
    C04728
    C11478
    ...and 14 more

WARNING: The following InChI strings are associated to more than one unique metabolite name:
    1S/C10H20O7P2/c1-9(2)5-4-6-10(3)7-8-16-19(14,15)17-18(11,12)13/h5,7H,4,6,8H2,1-3H3,(H,14,15)(H2,11,12,13)/b10-7+

I would really appreciate your help with resolving this issue. Thanks in advance, Rachita

Reproducing this issue:

model=getKEGGModelForOrganism('sce','sce.fa','euk90_kegg105','output',false,false,false,false,10^-30,0.8,0.3,-1);
checkModelStruct(model)

System information

Installation type                    Advanced (via git)
Installing from location             /cluster/scratch/rakumar/RAVEN
Checking RAVEN release               2.9.2
   You are running the latest RAVEN release
Checking MATLAB release              2023b
Checking system architecture         glnxa64
Set RAVEN in MATLAB path             Pass
Save MATLAB path                     Warning: Unable to save path to file '/cluster/software/commercial/matlab/R2023b/toolbox/local/pathdef.m'. You can save your path to a different location by calling SAVEPATH with an input argument that specifies the full path. For MATLAB to use that path in future sessions, save the path to 'pathdef.m' in your MATLAB startup folder.
=== Model import and export ===
Add Java paths for Excel format      Pass
Checking libSBML version             5.20.2
Checking model import and export
   Import Excel format                Fail
   Excel import/export is incompatible with MATLAB Text Analytics Toolbox.
   Further instructions => https://github.com/SysBioChalmers/RAVEN/issues/55#issuecomment-1514369299
   Export Excel format                Fail
   Import SBML format                 Pass
   Export SBML format                 Pass
   MATLAB Text Analytics Toolbox found. This should be uninstalled if you want to read/write Excel files. See RAVEN GitHub Issues page for instructions.

=== Model solvers ===
Checking for LP solvers
   glpk                               Pass
   gurobi                             Pass
   scip                               Fail
   cobra                              Fail
Set RAVEN solver                     gurobi

=== Essential binary executables ===
Checking BLAST+                      Pass
Checking DIAMOND                     Pass
Checking HMMER                       Pass

=== Compatibility ===
Checking function uniqueness         Pass

*** checkInstallation complete ***

ans = '2.9.2'


edkerk commented 3 days ago

It appears that some reactions from KEGG are not annotated to any pathway, which results in empty subSystems entries with incorrect formatting ([] instead of {{''}}). checkModelStruct reports this as an error, while exportModel only issues a warning and continues writing the SBML file (note the difference: a warning, unlike an error, is not critical). Importing the exported SBML model solves this problem, as the subSystems field is constructed anew. Another solution is the following workaround:

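% Find reactions whose subSystems entry is empty ([])
emptySubSystem = cellfun(@isempty, model.subSystems);
% Replace each of those entries with a cell array holding an empty string
model.subSystems(emptySubSystem) = {{''}};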

The getKEGGModelForOrganism function will be modified to avoid this issue in future releases.
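
To see what the workaround does without running a full reconstruction, here is a minimal sketch on a toy cell array. The variable `subSystems` and its pathway names are hypothetical stand-ins for `model.subSystems`, and the `iscell` test is only an assumption about what checkModelStruct objects to, not a copy of its actual check.

```matlab
% Toy stand-in for model.subSystems (hypothetical values, not from a real model)
subSystems = {{'Glycolysis / Gluconeogenesis'}; []; {'Citrate cycle (TCA cycle)'}; []};

% Entries that are not themselves cell arrays (assumed to be what triggers the message)
notCell = ~cellfun(@iscell, subSystems);
fprintf('%d of %d entries are not cell arrays\n', nnz(notCell), numel(subSystems));

% Same fix as in the workaround above, applied to the toy data
emptySubSystem = cellfun(@isempty, subSystems);
subSystems(emptySubSystem) = {{''}};

% After the fix every entry should be a cell array
allCell = all(cellfun(@iscell, subSystems));
```

On a real model, running checkModelStruct(model) again after the workaround is the direct way to confirm that the error is gone.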

Rachita-Kumar commented 3 days ago

Thank you for your response, I really appreciate it. Could you also help me resolve the warning regarding the objective? I followed your suggestion from a previous discussion (https://github.com/SysBioChalmers/RAVEN/issues/423#issuecomment-1156125961), but received the following warning:

model=setParam(model,'obj','Growth',1)
WARNING: Reaction Growth is not present in the reaction list

I was trying to locate the biomass reaction to set as the objective, but could not identify it.

Further, I get the following error when I try to use a different database, and was wondering if you could help me resolve it:

model=getKEGGModelForOrganism('sce','sce.fa','euk100_kegg94','output',false,false,false,false,10^-30,0.8,0.3,-1);
*** The model reconstruction from KEGG based on the protein homology search against KEGG Orthology specific HMMs ***

Error using getKEGGModelForOrganism
Pre-trained HMMs set is not recognised. If you want download RAVEN provided sets, it should match any of the following: euk90_kegg105 or prok90_kegg105

I am following tutorial 5 as described here

Thank you for your help and suggestions, Rachita

edkerk commented 2 days ago

With setParam you can set an existing reaction as the objective function. It seems your model does not have a reaction with the identifier 'Growth' (see model.rxns). Indeed, models generated from KEGG do not have a biomass reaction; this should be defined manually. To be able to import the model in other software (the problem that was raised in #426), you can take any reaction as the objective function (for instance the first one), but be aware that you should likely change it before doing any FBA simulations, otherwise the results are unlikely to be informative.

If you want to define a biomass equation, I suggest you look at other model publications to see how they did this, refer to the RAVEN 1 paper, or explore tools like BOFdat.
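
As a rough sketch of the placeholder-objective idea, the snippet below first checks whether the requested reaction exists and otherwise falls back to the first reaction. The toy `model` struct and its reaction identifiers are illustrative only, and setting `model.c` directly is an assumption about how the objective is stored. On a real RAVEN model the equivalent call would be the setParam call shown in the comment.

```matlab
% Toy stand-in for a KEGG-derived model (identifiers are illustrative only)
model.rxns = {'R00001'; 'R00002'; 'R00004'};
model.c    = zeros(numel(model.rxns), 1);

% Check that the intended objective exists before trying to set it
objRxn = 'Growth';
if ~ismember(objRxn, model.rxns)
    % No such reaction: use the first reaction as a placeholder objective
    objRxn = model.rxns{1};
end

% With RAVEN and a full model this would be:
%   model = setParam(model, 'obj', objRxn, 1);
% Here the objective vector is set by hand (assumed to be stored in model.c)
model.c(:) = 0;
model.c(strcmp(model.rxns, objRxn)) = 1;
```

The placeholder only makes the exported file carry an objective; it says nothing about growth and should be replaced before any simulation.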

Regarding your second question, the error message clarifies that the HMM set should be euk90_kegg105 or prok90_kegg105. You used euk100_kegg94, but this was the version used before RAVEN 2.6.0.