Open kcorreia opened 5 years ago
This does indeed look like a bug. However, including subsystem information can be done in a much better and more standardized way. Maybe the following approach can solve the specific problem for now, before the BiGG validator can be fixed.
The groups package for SBML is useful to define collections of arbitrary model components. Most models in BiGG use it to define subsystems as a group of reactions. Here is an example from the e_coli_core model:
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" fbc:required="false" groups:required="false" level="3" version="1" ... xmlns:fbc="http://www.sbml.org/sbml/level3/version1/fbc/version2" xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1">
...
<model ...>
<groups:listOfGroups xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1">
<groups:group groups:id="g1" groups:kind="partonomy" groups:name="Pyruvate Metabolism" sboTerm="SBO:0000633">
<groups:listOfMembers>
<groups:member groups:idRef="R_ACALD" />
<groups:member groups:idRef="R_ACKr" />
...
</groups:listOfMembers>
</groups:group>
...
</groups:listOfGroups>
<listOfReactions>
<reaction id="R_ACALD" ... name="Acetaldehyde dehydrogenase (acetylating)"
reversible="true" sboTerm="SBO:0000375">
<annotation>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
<rdf:Description rdf:about="#R_ACALD">
<bqbiol:is>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/bigg.reaction/ACALD" />
<rdf:li rdf:resource="http://identifiers.org/biocyc/META:ACETALD-DEHYDROG-RXN" />
...
</rdf:Bag>
</bqbiol:is>
</rdf:Description>
</rdf:RDF>
</annotation>
...
</reaction>
...
</listOfReactions>
</model>
</sbml>
So, as you can see, there is a group with ID g1 and the name Pyruvate Metabolism that contains several reaction IDs as members. Instead of writing an unstructured note entry into the reaction element, it is sufficient to define these groups that link to the reactions via their IDs.
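To illustrate how such groups can be read back by a program, here is a minimal sketch that uses only the Python standard library. The function name reaction_subsystems and the cut-down EXAMPLE document are assumptions made for illustration; only the element names, attribute names, and namespace URI are taken from the excerpt above.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the SBML groups package, as declared in the excerpt above.
GROUPS_NS = "http://www.sbml.org/sbml/level3/version1/groups/version1"


def reaction_subsystems(sbml_text):
    """Map each member ID to the names of the groups that list it."""
    root = ET.fromstring(sbml_text)
    mapping = {}
    for group in root.iter(f"{{{GROUPS_NS}}}group"):
        name = group.get(f"{{{GROUPS_NS}}}name")
        for member in group.iter(f"{{{GROUPS_NS}}}member"):
            id_ref = member.get(f"{{{GROUPS_NS}}}idRef")
            mapping.setdefault(id_ref, []).append(name)
    return mapping


# Hypothetical, cut-down but well-formed version of the excerpt (illustration only).
EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1"
      level="3" version="1" groups:required="false">
  <model>
    <groups:listOfGroups>
      <groups:group groups:id="g1" groups:kind="partonomy"
                    groups:name="Pyruvate Metabolism" sboTerm="SBO:0000633">
        <groups:listOfMembers>
          <groups:member groups:idRef="R_ACALD" />
          <groups:member groups:idRef="R_ACKr" />
        </groups:listOfMembers>
      </groups:group>
    </groups:listOfGroups>
  </model>
</sbml>"""

print(reaction_subsystems(EXAMPLE))
```

Because the subsystem is an attribute of a namespaced element, no free-text parsing of notes is needed to recover it.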
Please note that the notes element in SBML is intended for storing human-readable descriptive text that explains choices or other important aspects to users. It is not intended to store computer code or to be algorithmically parsed. In particular, whatever goes into the notes should not be mandatory for a model to compile or simulate. In contrast, annotation elements and the content of groups are meant for computer processing and are therefore the preferred way of storing such information.
I hope this helps.
Just to comment on this.
You can load legacy models with SUBSYSTEMS in cobrapy and export the models with groups information. This could perform the conversion you are interested in, i.e.:
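from cobra.io import read_sbml_model, write_sbml_model

# Load the legacy model that carries SUBSYSTEMS information.
model = read_sbml_model("my_model_with_subsystems.xml")
# Write it back out so that the export carries groups information.
write_sbml_model(model, "my_model_with_groups.xml")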
Code not tested. Disclaimer: some of the notes information could get lost. Please open an issue on https://github.com/opencobra/cobrapy/issues if you run into any problems.
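For readers who want to see what such a conversion amounts to on the XML level, here is a rough standard-library sketch. It is not cobrapy's implementation. The function build_list_of_groups and the legacy mapping are hypothetical, and the sketch assumes the subsystem of each reaction has already been extracted into a plain dictionary. It also omits the sboTerm attribute shown in the e_coli_core example.

```python
import xml.etree.ElementTree as ET

GROUPS_NS = "http://www.sbml.org/sbml/level3/version1/groups/version1"
# Serialize with the conventional "groups:" prefix instead of ns0.
ET.register_namespace("groups", GROUPS_NS)


def build_list_of_groups(subsystem_by_reaction):
    """Build a groups:listOfGroups element from {reaction_id: subsystem}."""
    # Collect the reaction IDs that belong to each subsystem.
    members = {}
    for reaction_id, subsystem in subsystem_by_reaction.items():
        members.setdefault(subsystem, []).append(reaction_id)

    list_of_groups = ET.Element(f"{{{GROUPS_NS}}}listOfGroups")
    # One group per subsystem, numbered g1, g2, ... in alphabetical order.
    for index, (subsystem, reaction_ids) in enumerate(sorted(members.items()), start=1):
        group = ET.SubElement(list_of_groups, f"{{{GROUPS_NS}}}group", {
            f"{{{GROUPS_NS}}}id": f"g{index}",
            f"{{{GROUPS_NS}}}kind": "partonomy",
            f"{{{GROUPS_NS}}}name": subsystem,
        })
        list_of_members = ET.SubElement(group, f"{{{GROUPS_NS}}}listOfMembers")
        for reaction_id in reaction_ids:
            ET.SubElement(list_of_members, f"{{{GROUPS_NS}}}member",
                          {f"{{{GROUPS_NS}}}idRef": reaction_id})
    return list_of_groups


# Hypothetical input: subsystem names keyed by reaction ID.
legacy = {"R_ACALD": "Pyruvate Metabolism", "R_ACKr": "Pyruvate Metabolism"}
xml_text = ET.tostring(build_list_of_groups(legacy), encoding="unicode")
print(xml_text)
```

In practice, letting cobrapy read and write the model is preferable, since it also takes care of the rest of the SBML document.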
I included links to protein complexes that carry out reactions. For example: http://identifiers.org/complexportal/CPX-1664
See below for the status from SBML/BiGG validation for files with and without notes for geneProductAssociation:
Snippet that causes problems:
XML file with notes in geneProductAssociation:
XML file without notes in geneProductAssociation: