SBRG / bigg_models

The BiGG Models website server
http://bigg.ucsd.edu

subsystem information missing #282

Closed phantomas1234 closed 6 years ago

phantomas1234 commented 6 years ago

Description of the issue

No subsystem info is included in the SBML files provided by BiGG. JSON seems to be fine.

Page

http://bigg.ucsd.edu/models/iML1515

Browser

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36

zakandrewking commented 6 years ago

Thanks for noticing this

@draeger @mephenor Have we tried to fix this? Can we store subsystem labels in SBML like COBRApy does (rather than the more complex approaches discussed previously in #145)?

zak

draeger commented 6 years ago

We are putting this on our TODO list. Please leave this item open until fixed. Thanks!

mephenor commented 6 years ago

Subsystem information is actually present in the SBML file as per point a) of the last comment in #145 - a group for each subsystem with reactions as members.

What is missing is some kind of pointer from the actual reaction to the subsystem. There is code present in ModelPolisher that links reactions to their respective subsystem, but only as a userObject, so it's not written to the file.

Should this be handled analogously to COBRApy's io.sbml, where it is put into the reaction notes, or is there another way to link the information?

draeger commented 6 years ago

Indeed, the SBML file has all the subsystem information. It was decided to use the groups package to represent subsystems in SBML. So, for every subsystem, a group of reactions is created that has each reaction within the subsystem as a member. Using this system, one reaction can actually be a member of multiple subsystems, which is biologically more realistic, because there is usually an overlap of roles in metabolism. However, as @mephenor pointed out, you need to traverse all groups in order to find out to which subsystem a particular reaction belongs. An efficient way of working with this representation is to create a hash data structure upon loading the model that links every reaction id to the identifiers of each group, i.e., the subsystems, in which it participates. Such a hash can be built in a single pass over the groups and their members, and the number of groups should be significantly smaller than the number of reactions. After that, lookups can be performed in constant time.
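
For illustration, the reverse lookup described above could be sketched roughly as follows. This is only a minimal sketch using Python's standard library XML parser, not the code used by BiGG, ModelPolisher, or COBRApy. The toy SBML snippet, the reaction ids, and the function name `build_reaction_to_subsystems` are assumptions made up for the example. It also assumes the groups live in the SBML Level 3 groups package namespace and that each group carries its subsystem label in `groups:name`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace of the SBML Level 3 "groups" package (version 1).
GROUPS_NS = "http://www.sbml.org/sbml/level3/version1/groups/version1"

# Hypothetical toy model: two subsystems that share one reaction.
SBML_SNIPPET = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core"
      xmlns:groups="http://www.sbml.org/sbml/level3/version1/groups/version1"
      level="3" version="1" groups:required="false">
  <model id="toy">
    <groups:listOfGroups>
      <groups:group groups:id="g1" groups:kind="partonomy"
                    groups:name="Glycolysis">
        <groups:listOfMembers>
          <groups:member groups:idRef="R_PGI"/>
          <groups:member groups:idRef="R_PFK"/>
        </groups:listOfMembers>
      </groups:group>
      <groups:group groups:id="g2" groups:kind="partonomy"
                    groups:name="Pentose Phosphate Pathway">
        <groups:listOfMembers>
          <groups:member groups:idRef="R_PGI"/>
        </groups:listOfMembers>
      </groups:group>
    </groups:listOfGroups>
  </model>
</sbml>"""


def build_reaction_to_subsystems(sbml_text):
    """Map each member id to the names of all groups that contain it."""
    root = ET.fromstring(sbml_text)
    index = defaultdict(list)
    # One pass over every group and every member of that group.
    for group in root.iter(f"{{{GROUPS_NS}}}group"):
        # Fall back to the group id if no human-readable name is set.
        label = (group.get(f"{{{GROUPS_NS}}}name")
                 or group.get(f"{{{GROUPS_NS}}}id"))
        for member in group.iter(f"{{{GROUPS_NS}}}member"):
            ref = member.get(f"{{{GROUPS_NS}}}idRef")
            if ref is not None:
                index[ref].append(label)
    return dict(index)


reaction_to_subsystems = build_reaction_to_subsystems(SBML_SNIPPET)
# Afterwards, each lookup is a plain dictionary access.
print(reaction_to_subsystems.get("R_PGI", []))
```

Because the values are lists, a reaction that belongs to several subsystems keeps all of them, which matches the many-to-many relationship that the groups representation allows.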