LCSB-BioCore / COBREXA.jl

Constraint-Based Reconstruction and EXascale Analysis
https://lcsb-biocore.github.io/COBREXA.jl/
Apache License 2.0
43 stars 9 forks source link

inconsistent parsing of subsystems #803

Open mihai-sysbio opened 10 months ago

mihai-sysbio commented 10 months ago

Minimal code example to reproduce the problem

It looks like for some file formats the subsystem field might not be parsed. I've spotted this from two different models in SBML format, so I checked with the core E.coli.

using COBREXA
!isfile("e_coli_core.mat") && download("http://bigg.ucsd.edu/static/models/e_coli_core.mat", "e_coli_core.mat");
!isfile("e_coli_core.json") && download("http://bigg.ucsd.edu/static/models/e_coli_core.json", "e_coli_core.json");
!isfile("e_coli_core.xml") && download("http://bigg.ucsd.edu/static/models/e_coli_core.xml", "e_coli_core.xml");
json = load_model(StandardModel, "e_coli_core.json")
xml = load_model(StandardModel, "e_coli_core.xml")
mat = load_model(StandardModel, "e_coli_core.mat")
foreach(model -> println(length(filter(!isnothing, map(r -> reaction_subsystem(model, r), reactions(model))))), [json, xml, mat])

Expected result

I would expect the same number of subsystems present in the model:

95
95
95

Actual behavior

Only the JSON format returns 95 entries, for the others it's 0:

95
0
0
exaexa commented 10 months ago

Hi,

good point, this is a problem. Unfortunately I don't see how to fix this for the SBML models (SBML doesn't specify subsystems, maybe there would be a way to parse them from MIRIAM-style annotations or so?). But the .mat format indeed contains the subsystems directly in field subSystem. We will add the functionality.

Extra TODO: mirror the resulting patch to here: https://github.com/COBREXA/MATFBCModels.jl (we're slowly flipping&reorganizing everything over to the separate github org)

Thanks for the report! -mk

exaexa commented 10 months ago

@stelmo btw should we add subsystems to AbstractFBC interface? I recall we talked about that and it might be the case that I killed the idea with some argument that I don't recall anymore :sweat_smile: likely about the subsystems not being proper annotations and reactions not being able to have multiple subsystems or so.... :sweat_smile: :sweat_drops:

EDIT: my emojis don't seem to work, weird.

mihai-sysbio commented 10 months ago

how to fix this for the SBML models (SBML doesn't specify subsystems, maybe there would be a way to parse them from MIRIAM-style annotations or so?)

While the subsystem concept does not have the same type of encoding as the compartment, it seems it iss captured in the SBML file via <groups:group sboTerm="SBO:0000633".

mihai-sysbio commented 10 months ago

A bit of a random idea, have you considered introducing tests to see if models from trusted sourced imported in different formats yield the same StandardModel?

exaexa commented 10 months ago

how to fix this for the SBML models (SBML doesn't specify subsystems, maybe there would be a way to parse them from MIRIAM-style annotations or so?)

While the subsystem concept does not have the same type of encoding as the compartment, it seems it iss captured in the SBML file via <groups:group sboTerm="SBO:0000633".

Ah nice, is there a known model with this? (I don't see it in the ec core).

A bit of a random idea, have you considered introducing tests to see if models from trusted sourced imported in different formats yield the same StandardModel?

Well, yes, and it unfortunately failed quite hard. Maybe we should retry now with the AbstractFBC interface in place which defines a much less opinionated (and thus more realizable) standard...

mihai-sysbio commented 10 months ago

Ah nice, is there a known model with this? (I don't see it in the ec core).

I actually found the SBO term by looking for glycolysis in Human-GEM. I would expect this to be present in yeast-GEM too. The SBO number didn't mean much to me, but seeing that it's about "subsystem group", things start to make sense.

exaexa commented 10 months ago

Uuuh good I see it in YeastGEM now. That sounds very vital, thanks!

For the SBML import/export, can we assume that

exaexa commented 10 months ago

tracking: SBML.jl support here: https://github.com/LCSB-BioCore/SBML.jl/pull/260

mihai-sysbio commented 10 months ago

For the SBML import/export, can we assume that

* only the groups with the SBO 633 are subsystems? (I'd say we can safely ignore the others)

Yes, this is the SBO term for subsystem https://identifiers.org/SBO:0000633

* there aren't going to be overlaps of groups? (I'd say no here, eventually someone will do this. :D )

Yeast-GEM has shifted from one-many to one-one when it comes to the reaction-subsystem annotation https://github.com/SysBioChalmers/yeast-GEM/pull/307