LCSB-BioCore / SBML.jl

Julia interface to the Systems Biology Markup Language (SBML) library
https://lcsb-biocore.github.io/SBML.jl/stable
Apache License 2.0
28 stars 12 forks source link

SBML.jl and SciML integration #31

Closed anandijain closed 3 years ago

anandijain commented 3 years ago

Hello, I've been working with @paulflang on building an SBML parser that would convert SBML ODEs into ModelingToolkit.jl form, to be used with the SciML ecosystem.

Here is the repo https://github.com/anandijain/SBML.jl/ It would not make sense for us to double up work.

If any maintainers are on slack, we have a chat for building biological model importers (SBML, CellML, bionetgen) into SciML. It'd be good to see how we can help y'all and still get tight integration with SciML.

paulflang commented 3 years ago

Thanks @anandijain and hello also from my side!

laurentheirendt commented 3 years ago

Hello @anandijain and @paulflang 😄

Good to see your efforts! The main reason behind using SBML_jll and libSBML is that it is a well-established library and it was the most straightforward way to interface to that library from Julia without having to resort to writing a parser from scratch.

For our use case, @exaexa and I thought it would not make too much sense to implement a full Julia parser from scratch - of course, this is always welcome, but I doubt we would see any major benefits performance wise.

What I could imagine is to actually support both, a native SBML.jl package and a libSBML.jl package (that would be this one). We should also have a chat with @skeating or @fbergmann from the SBML team to get their input 👍🏼

exaexa commented 3 years ago

I originally thought that writing the XML parser from below would be viable too, but given all the complexity in SBML specification this was not the way to go. Technically, it would be best to have the SBML schema published in a machine-accessible format and automatically generate the XML parser from it -- there have been certainly some efforts on writing the schema, but I'm not at all sure how that's going.

paulflang commented 3 years ago

Great suggestion to hear Sarah Keating's and Frank Bergmann’s thoughts. Would also be interesting to hear were we stand with the schema.

Since you are more interested in FBA and we are more interested in ODEs and SSA, what we do with the SBML model once it is imported into Julia is different. But I see you have a Model type that is somewhat similar to our SbmlModel type. I could imagine that defining a common type would make sense (no matter if we use Julia native parsers or not, and no matter what we do downstream with the model). But I am not familiar with how FBA models are specified in SBML. So I will have to read up on that, as well as having a look at your code.

exaexa commented 3 years ago

But I see you have a Model type that is somewhat similar to our SbmlModel type. I could imagine that defining a common type would make sense (no matter if we use Julia native parsers or not, and no matter what we do downstream with the model).

If you can list "features" extracted from SBML that you're missing, we'll be happy to add them. Refer to libSBML manual for extractable items, ideally this API guide: http://sbml.org/Software/libSBML/docs/c-api/annotated.html

But I am not familiar with how FBA models are specified in SBML.

Well not really, SBML is a descriptive rather than computation-oriented format. Despite of that it's pretty easy to get the information relevant to FBA, see e.g. here: https://lcsb-biocore.github.io/SBML.jl/stable/#SBML.getS-Tuple{Model}

anandijain commented 3 years ago

Foreword: I am a noobie, so any corrections are appreciated. This may provide a bit of context regarding why I am interested in integrating with SciML:

Here's a more complete example of the API we are aiming for: CellMLToolkit.jl. The main functionality is lowering an XML document into the ModelingToolkit symbolic form (ODE, DAE, SDE, PDE, etc). We do this using a native Julia MathML parser. I'm not sure how familiar you all are with the SciML ecosystem, but an overview of MTK is found here.

The value is being able to do a host of analyses, parameter estimation, global sensitivity analysis, etc. These tools are particularly nice in SciML because the DE Solvers are differentiable, allowing for productive use of multiple dispatch. See @ChrisRackauckas blog post on other extensions of differentiable programming in Julia here. I am also interested in hierarchical SBML models and model composition, which is supported by MTK's acausal component modeling system.

I'm not that familiar with FBA, but from my skimming this primer article, it seems largely based on constraint programming. There are plans to eventually support inequality constraints on MTK parameters, but we do not have this yet. FYI.

In terms of constructive ways that I can contribute to your efforts, I could help in testing infrastructure to start with. In CellMLToolkit, we have been tracking how much coverage we get on the entire CellML model repository. issue tracking progress. We have been building similar tooling for testing our SBML package on all of the SBML models on Auckland's Biomodels page and the SBML test-suite.

I've written some scripts to download all of the BIOMD SBML files here. I think it might be useful for you all to also track how much coverage you are getting over these databases.

If this is something you're interested in I'm happy to make a PR. I'm thinking of essentially testing loadmodels.jl on all 2216 BIOMD models found here

and yes, @exaexa I agree, so far my brief experiences with SBML are demonstrating that the schema is quite ad-hoc or requires meticulous parsing. CellML seems to have more constraints when it comes to defining a model, almost always just defines an ODE in MathML.

exaexa commented 3 years ago

@anandijain thanks for explanation! In separate points:

Re SBML, we already had a few discussions about the design rationale there. The XML schema design is mirroring a java-ish class structure, which has both advantages and disadvantages, and becomes slightly uncomfortable for non-OOP tools. It is, for sure, very useful for annotation/curation efforts. Even for programming, it's still much better than e.g. many other java-based RDF-data exchange tools :]

anandijain commented 3 years ago

If you can show a document with MathML contained + some rough illustration how the result shold look like

Biomodels lists this breakdown of the distribution of Modeling Approaches for their SBML models:

 Ordinary differential equation model (975)
 Constraint-based model (163)
 Logical model (20)
 Petri net (5)

So we are looking at parsing that first category mainly. Here is a model that contains MathML that we want to lower into MTK form: https://www.ebi.ac.uk/biomodels/BIOMD0000000796.

Edit: this is just a short response that I can follow up on later

anandijain commented 3 years ago

I've made a repo for downloading the BioModels SBML dataset (2216 models) here https://github.com/anandijain/SBMLBioModelsRepository.jl.

I'm not sure how much use this will be to you all. I will start doing tests using your package soon for a coverage tracker.

skeating commented 3 years ago

Hi

This is probably a conversation on its own; but I'll add it here since it was first asked here.

There are schemas for SBML.

For Level 3 we use RNG details here: http://sbml.org/RNG_validation

For levels 1 & 2 there are xsd schemas with each specification.

Just to point (although I know I am biased !) that using libSBML as the parser adds much more validation that creating a parser direct from a schema. libSBML validation checks semantics as well as syntax.

Sarah

anandijain commented 3 years ago

Thanks,

I'm coming around to using libSBML as our parser. If we are to go this route though, it'd be great if someone could explain how we might get the kineticLaws out in order to lower to ModelingToolkit.ReactionSystem and define the differential equations. ReactionSystem docs

fbergmann commented 3 years ago

LibSBML does include a converter, that can take out the reactions and instead add the rateRule objects to the model. Maybe that would be helpful for you:

https://github.com/sbmlteam/libsbml/blob/development/examples/c%2B%2B/convertReactions.cpp

What it does, is basically to take the speed of the reaction (the formula in the kinetic law) and for substrates multiplies it with the negative of the stoichiometry and for products it will be multiplied with the stoichiometry and keep this in a map fir each reactant / product of all reactions. Then for each species in that map a rate rule (ODE) is created that sums all of those temporary results for each species up.

paulflang commented 3 years ago

Thank you very much for pointing out the advantages of using LibSBML, @skeating. We’ve used the Python version to convert SBML to ModelingToolkit in SbmlInterface and to convert PEtab optimisation problems to Julia JuMP in SBML2Julia and now wanted to move away from the Python dependency with anandijain/SBML.jl. Since this repository here nicely shows how this ca be done with the cpp version of LibSBML and @anandijain is fine with that, I think this is the way to go.

@fbergmann: The kineticLaw to rateRule conversion may indeed become useful for converting SBML directly to ModelingToolkit.ODESystems. At the moment however, we are converting SBML to ModelingToolkit.ReactionSystems (for later conversion to ODESystems with ModelingToolkit's convert() method), for which the kineticLaws are fine, with two exceptions:

paulflang commented 3 years ago

During today's meeting we've sorted out the first issue with kineticLaws. However the second point still remains. Does LibSBML have such an intensive to extensive unit converter?

fbergmann commented 3 years ago

It does not currently have such a converter, we always considered it a matter to the tool authors to write the equations as they see fit.

However, converters can be added at runtime by users of the library. So you could write one that looks at each species hasOnlySubstanceUnits (hSOU) flag, and the spatialDimensions of the compartment it is in. (if it is hOSU or spatialDimensions=0 then it is already in amounts). All others would have to be replaced in occurrences by multiplying with the respective compartment.

best Frank

On Thu, Apr 8, 2021 at 11:38 PM Paul Lang @.***> wrote:

During today's meeting we've sorted out the first issue with kineticLaws. However the second point still remains. Does LibSBML have such an intensive to extensive unit converter?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/LCSB-BioCore/SBML.jl/issues/31#issuecomment-816232753, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHHWQ3TU6V4IA7OE7VASNTTHYO3JANCNFSM42FUSZLA .

paulflang commented 3 years ago

Thanks, @fbergmann ! But I thought if hOSU=false, it means that species have to be plugged in (e.g. kineticLaws) as concentrations rather than amounts. So a species that has initialAmount instead of initialConcentration would have to be divided by the compartment size before plugging it into a kineticLaw. Or what is wrong in this thinking?

Do you know of an example where poeple have created custom converters for C++ LibSBML. I do not know C++, but perhaps one of my colleagues does. Otherwise I would have to handle this further downstream in Julia.

exaexa commented 3 years ago

Closing this, the work is either done or will continue in #40.