SysBioChalmers / RAVEN

The RAVEN Toolbox for genome scale model reconstruction, curation and analysis.
http://sysbiochalmers.github.io/RAVEN/
Other
98 stars 52 forks source link

SBO terms not exported properly to Excel format when they are within MIRIAM annotations #516

Closed pranasag closed 2 months ago

pranasag commented 7 months ago

Description of the issue:

Hi everyone,

I have this issue on exporting models to Excel format. Systems Biology Ontology (SBO) terms are not properly handled in the exportToExcelFormat() function, when they are part of MIRIAM annotations. These terms get exported as sbo/SBO:-1.000000e+00. The SBO terms are represented correctly in the loaded model object in RAVEN environment.

Reproducing this issue:

See a snippet of the raw SBML model to get the feeling:

      <fbc:geneProduct metaid="meta_<geneId>" sboTerm="SBO:0000243" fbc:id="<geneId>" fbc:label="<geneName>">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
            <rdf:Description rdf:about="#meta_<geneId>">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/ncbiprotein/<geneId>"/>
                  <rdf:li rdf:resource="http://identifiers.org/sbo/SBO:0000243"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </fbc:geneProduct>

If the SBO term is defined in the highest-level clause (fbc:geneProduct metaid="meta_<geneId>" sboTerm="SBO:0000243"), it's processed properly, but if it falls under annotations (<rdf:li rdf:resource="http://identifiers.org/sbo/SBO:0000243"/>) it throws sbo/SBO:-1.000000e+00 in the Excel sheet. Here, gene is taken as an example, but the same happens to reactions and metabolites.

System information

THE RAVEN TOOLBOX

Installing from location /Users/pranas/Documents/MATLAB/RAVEN Checking RAVEN release 2.8.6 You are running the latest RAVEN release Checking MATLAB release 2023b Checking system architecture maci64 Set RAVEN in MATLAB path Pass Save MATLAB path Pass Make binaries executable Pass

=== Model import and export ===

Add Java paths for Excel format Pass Checking libSBML version 5.19.0 Checking model import and export Import Excel format Pass Export Excel format Pass Import SBML format Pass Export SBML format Pass

=== Model solvers ===

Checking for LP solvers glpk Pass gurobi Pass soplex Fail cobra Fail Set RAVEN solver glpk

=== Essential binary executables ===

Checking BLAST+ Pass Checking DIAMOND Pass Checking HMMER Pass

=== Compatibility ===

Checking function uniqueness Pass

I hereby confirm that I have:

edkerk commented 2 months ago

Thank you for your patience. The problem is actually in importModel: if no sboTerm is provided in fbc object in SBML, TranslateSBML gives it the value -1, which becomes problematic when SBO:0000x is prefixed at a later stage during import. This problem did not occur if all e.g. genes lacked SBO terms, as these would have been filtered out. This is now fixed in e41c0786dff4fb862ec75246fe8b7475a9016095.

Then, another issue became apparent. Currently, importModel does not read SBO terms if specified as identifiers.org within <annotation>. Likewise, exportModel writes SBO terms into sboTerms, not in <annotation>. This is not only true for genes, but also for species and reactions. Do you have a particular reason why SBO term are specified as identifiers.org? Should this be considered instead of sboTerm? IMHO, sboTerm should then have precedence.

pranasag commented 2 months ago

Thank you for getting the initial issue fixed. My reason to store SBO terms in the MIRIAM block was initially because I found it the easiest way to do this in the Python-based tools (CBMpy/COBRApy), at least some time ago. That's now covered by cbmpy afaik.

But I agree with you that sboTerm belongs "more" to the SBML object header, and probably the way to go would be to make sure that the terms stored there are considered primary. The current behavior, as I understand, is that the terms stored in the MIRIAM block are ignored. Parsing and putting the terms missing in sboTerm but present in the MIRIAM block could be a functionality to merge the annotations (as an optional flag in importModel?), but I see somewhat low value/priority.

edkerk commented 2 months ago

Indeed, the current behaviour is to fully ignore SBO terms in the MIRIAM block.

I can easily change this to the following: (1) import SBO terms from the MIRIAM block; (2) overwrite these SBO terms if a valid sboTerm is present. So the SBML object header would still have priority.

(Done in 15f908410aee0da1d00e9fae272b0a7ec43ca850)

pranasag commented 2 months ago

I am now looking at the latest SBML specifications, and pt. 5.2.2 (p. 97) suggests these terms should be unique for a single object. Perhaps also relevant to implement a check whether # of SBO terms <= 1 (if not done already)?

edkerk commented 2 months ago

Good point, now included in PR #536