SciML / CellMLToolkit.jl

CellMLToolkit.jl is a Julia library that connects CellML models to the Scientific Julia ecosystem.
https://docs.sciml.ai/CellMLToolkit/stable/

Generalizing XML parsing to SBML #15

Closed: anandijain closed this issue 3 years ago

anandijain commented 3 years ago

Hello @shahriariravanian, I'm working with @paulflang on SbmlInterface, and he asked me to post the following here:

Similar to the CellML-to-MTK conversion done by CellMLToolkit.jl, we have been trying to convert SBML to MTK with SbmlInterface.jl. Unfortunately, SbmlInterface.jl has a Python dependency. @anandijain and I are now thinking of writing SbmlToolkit.jl and are reaching out to ask for advice on this project. Are you aware of any efforts in that direction to which we could contribute? If not, is there a particular reason why there are interfaces to CellML and BioNetGen but not to SBML, which is probably the most common standard? Are there particular challenges with SBML compared to CellML, for instance?

I think we can reuse a lot of the code from CellMLToolkit.jl. But from looking at a CellML file in your example, it seems that CellML specifies the ODEs of the system, whereas SBML specifies the reactions. If we were to go forward with this, we would create an SbmlModel type that is very similar to CellModel, except that it contains the field rxs::Array{ModelingToolkit.Reaction}. We can then use ModelingToolkit.ReactionSystem() and convert() to populate the eqs field. It would still be good to keep the rxs field to interface with Julia's stochastic simulation algorithms. In addition, SBML is explicit about what is an initial condition u0 and what is a parameter p, so we can add an additional field pars to SbmlModel.

My thought is that something like BioMLParsers.jl could abstract away dealing with the AST and whatnot, give us process_doc or read for .xml, .sbml, and .cellml files, and check the headers of the file to decide how to handle it. Since I am not that familiar with the structure of each markup language, this really comes down to determining whether they are similar enough to generalize across both. If you have expertise here, it would be greatly appreciated.
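The "check the headers" idea can be sketched by looking at the root element of the document. What follows is a minimal illustration in plain Python (chosen only to keep the sketch dependency-free), not a proposal for the Julia API: the function name detect_format is hypothetical, and the sketch assumes that SBML files have a root element named sbml and that CellML files have a root element named model in a namespace containing "cellml".

```python
import xml.etree.ElementTree as ET


def detect_format(xml_text):
    """Return 'sbml', 'cellml', or 'unknown' based on the root element.

    Hypothetical helper: only the root tag and its namespace are inspected.
    """
    root = ET.fromstring(xml_text)
    # ElementTree renders a namespaced tag as "{namespace-uri}localname".
    if root.tag.startswith("{"):
        namespace, _, local = root.tag[1:].partition("}")
    else:
        namespace, local = "", root.tag
    if local == "sbml":
        return "sbml"
    if local == "model" and "cellml" in namespace.lower():
        return "cellml"
    return "unknown"
```

A read function could then dispatch on the returned string, so the user never has to state the format explicitly.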

Please let us know your thoughts.

paulflang commented 3 years ago

Thanks a lot for establishing the connection, @anandijain. I like the idea of unifying this into BioMLParsers.jl. I don't think I have time to go beyond an SBML parser, but it is good to set up a framework others can hook into. Although I have to say that I do not know anything about other XML-based model specifications.

ChrisRackauckas commented 3 years ago

Is there really all that much that can be pulled from the two which isn't just the ModelingToolkit common portion already? I don't quite see what's the higher abstraction that helps.

paulflang commented 3 years ago

I think a large fraction of the XML parsing functionality in CellMLToolkit can be reused for parsing SBML files, too. So from that perspective, unifying into BioMLParsers.jl or BioMLToolkit.jl or whatever we want to call it seems reasonable to me. But perhaps for the time being we can just copy all the code from CellMLToolkit, adapt it to create SbmlToolkit, and think about merging the two later. How does that sound?

ChrisRackauckas commented 3 years ago

> I think a large fraction of the XML parsing functionality in CellMLToolkit can be reused for parsing SBML files, too

That's just https://github.com/JuliaIO/LightXML.jl

paulflang commented 3 years ago

Well, I might be wrong, but I think there are still several functions in https://github.com/SciML/CellMLToolkit.jl/blob/master/src/CellMLToolkit.jl I could reuse.

ChrisRackauckas commented 3 years ago

I think that's a good starting point, just to see how much code reuse there really is.

> BioMLParsers.jl

I would assume BioMLParsers would also take https://github.com/isaacsas/ReactionNetworkImporters.jl

paulflang commented 3 years ago

Well, the BioNetGen language is not XML-based, so it would have to go under a different umbrella. I am thinking more of NeuroML and perhaps SED-ML. But I know nothing about them. In any case, I can see your point that a higher abstraction is questionable. It is difficult to decide which languages would belong together. And from a user perspective, the most straightforward thing is probably to have one package per biological modelling language.

paulflang commented 3 years ago

I am just curious. Why has SBML not been done yet? It is the most used language, afaik. Are there particular challenges that I fail to see at the moment?

anandijain commented 3 years ago

Ya, @ChrisRackauckas, the reason I didn't include ReactionNetworkImporters is that it's not a markup language, I think. But we can definitely wrap it so that read is as easy as possible and doesn't require the user to specify the format. I don't think that would fall under BioML though.

ChrisRackauckas commented 3 years ago

> I am just curious. Why has SBML not been done yet? It is the most used language, afaik. Are there particular challenges that I fail to see at the moment?

There is no concerted effort here: people just added what they needed.

paulflang commented 3 years ago

OK. I see. Thanks. I just had the thought that the way CellMLToolkit parses MathML (which is used by CellML and SBML) could become an independent Julia package.

But for now it is probably best to go with CellMLToolkit as a template and create an SbmlToolkit.jl package.

paulflang commented 3 years ago

@ChrisRackauckas : so when we create a new repo SbmlToolkit.jl, can I host it on my personal GitHub and put it into SciML when it is sufficiently good and complete, or shall we start right away under SciML?

anandijain commented 3 years ago

No, we'll move it afterwards. Let's just build it on your account for now.

ChrisRackauckas commented 3 years ago

Yeah it's easy to move repos so just get started.

shahriariravanian commented 3 years ago

Thank you everyone for starting this project. It is a great idea to combine different model import codes into SciML.

I think the part directly common to both CellML and SBML (and probably many other XML-based markups) is the MathML part, i.e., converting MathML-formatted XML to an MTK expression. We can make a separate MathML parser used by both. The higher-order abstractions in CellML and SBML are different but comparable.

While CellML and SBML are rather similar (and their main conference is shared and is named COMBINE), in practice they are used for different applications. My background is electrophysiology, and CellML was the natural one to use. SBML can be used for electrophysiology but, in practice, is used mostly for chemical reactions.
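To make the shared MathML step concrete, here is a minimal sketch of walking content MathML and producing an infix expression string. It is written in plain Python purely for illustration and is not how CellMLToolkit does it; the name to_infix is hypothetical, and only the ci, cn, and apply elements with the four basic arithmetic operators are handled.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"
INFIX = {"plus": "+", "minus": "-", "times": "*", "divide": "/"}


def to_infix(node):
    """Convert a content-MathML element into an infix string (hypothetical helper)."""
    tag = node.tag.replace(MATHML_NS, "")
    if tag == "math":
        # The <math> wrapper holds a single expression in this sketch.
        return to_infix(node[0])
    if tag in ("ci", "cn"):
        # Identifiers and numbers are emitted as-is.
        return node.text.strip()
    if tag == "apply":
        # The first child names the operator; the rest are its arguments.
        op = node[0].tag.replace(MATHML_NS, "")
        if op not in INFIX:
            raise ValueError("unsupported operator: " + op)
        args = [to_infix(child) for child in node[1:]]
        if op == "minus" and len(args) == 1:
            return "(-" + args[0] + ")"
        return "(" + (" " + INFIX[op] + " ").join(args) + ")"
    raise ValueError("unsupported element: " + tag)
```

A real parser would build a symbolic expression rather than a string and would cover the rest of MathML (functions, piecewise, relations), but the recursive structure stays the same for both CellML and SBML input.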

The goal is to convert a model XML, through MathML, to an MTK structure. The main question we are facing is whether we need a middle level (like CellModel) to keep track of the intermediate semantic information (what is a variable, what is a compartment, ...) or go directly to MTK.

The other issue is that while CellML models generally translate to an ODE system, the SBML models may or may not (some of them are discrete). What should we do with those?

paulflang commented 3 years ago

Hi @shahriariravanian . Thanks for sharing your thoughts. An independent MathML parser sounds very useful.

What I do not understand is the statement that SBML models may be discrete. To my knowledge, SBML just describes the reaction system but is agnostic wrt the algorithm (ODE simulator, SSA) used to simulate the network. Am I wrong here? If not, I think this is best captured by converting SBML to the existing MTK.ReactionSystem type. The user can then decide whether a deterministic/continuous or stochastic/discrete simulation shall be run.

The only problem that I see is that an MTK.ReactionSystem does not hold information about compartment volume, which is needed to interconvert between the microscopic rates used in stochastic simulations and the macroscopic rates used in ODE simulations. Does anyone know if/how this is solved?
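For reference, the standard textbook relation shows why the volume is needed: a macroscopic rate constant k of a reaction of order n becomes a stochastic rate constant c = k / (N_A * V)^(n - 1). The sketch below is plain Python for illustration only. The function name stochastic_rate is hypothetical, and it assumes that all reactants are distinct species, that k is given in molar units, and that the volume is given in litres.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def stochastic_rate(k_macro, order, volume_litres):
    """Convert a macroscopic rate constant to a per-particle stochastic one.

    Hypothetical helper; assumes distinct reactant species (no extra
    combinatorial factor for reactions such as A + A -> ...).
    """
    if order < 0:
        raise ValueError("reaction order must be non-negative")
    return k_macro / (AVOGADRO * volume_litres) ** (order - 1)
```

First-order constants are unchanged, while second-order constants are divided by N_A * V, so the conversion cannot be done without knowing the volume of the compartment in which the reaction takes place.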

anandijain commented 3 years ago

Answer to part of your question:

https://github.com/SciML/ModelingToolkit.jl/issues/571

Open issue for differencing in MTK

ChrisRackauckas commented 3 years ago

I think ReactionSystem makes sense for SBML and ODESystem for CellML. I don't think the DiscreteSystem will be needed for SBML, as instead what it really wants is ReactionSystem -> JumpSystem lowering (which exists).

paulflang commented 3 years ago

I agree that ReactionSystem makes sense for SBML. But there is the limitation that several SBML models cannot be captured with ReactionSystem. Most importantly, I do not think that a model where a species translocates between two compartments of different size can be specified when u0 is given as concentration rather than number (which implies ODE-based simulation rather than Gillespie):

# Nuclear compartment volume: v_nuc
# Cytoplasmic compartment volume: v_cyt
Sn -> Sc  # rate constant kTr

The kineticLaws in SBML are in units of firings per second. Here, the kineticLaw would be v_nuc*kTr*Sn. The ODEs would be:

d(Sn*v_nuc)/dt = -v_nuc*kTr*Sn  # All good
d(Sc*v_cyt)/dt = v_nuc*kTr*Sn  # Would be translated wrong when creating ODESystem from ReactionSystem
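The effect can be checked numerically. The following is a minimal sketch in plain Python (forward Euler, no MTK involved). The names simulate and total_amount are hypothetical, and the volume_aware=False branch is only an assumption about what a translation that ignores compartment volumes would produce, namely applying the same concentration rate to both species.

```python
def simulate(v_nuc, v_cyt, k_tr, sn0, sc0, dt=1e-3, steps=5000, volume_aware=True):
    """Integrate Sn -> Sc between two compartments with forward Euler.

    sn and sc are concentrations; returns their final values.
    """
    sn, sc = sn0, sc0
    for _ in range(steps):
        flux = v_nuc * k_tr * sn  # kinetic law: amount per unit time
        if volume_aware:
            # Divide the amount flux by each compartment's own volume.
            dsn = -flux / v_nuc
            dsc = flux / v_cyt
        else:
            # Assumed naive translation: volumes are ignored entirely.
            dsn = -k_tr * sn
            dsc = k_tr * sn
        sn += dt * dsn
        sc += dt * dsc
    return sn, sc


def total_amount(sn, sc, v_nuc, v_cyt):
    """Total number (amount) of the species across both compartments."""
    return sn * v_nuc + sc * v_cyt
```

With v_nuc = 1 and v_cyt = 10, the volume-aware version keeps the total amount constant, whereas the naive version conserves the sum of concentrations instead and therefore creates material as the species moves into the larger compartment.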

I am wondering if a ReactionSystem should have an optional field compartments (maybe created with @variable; Reactions could be used to grow or shrink them, if needed). If compartments is not empty, states would have to become state => compartment pairs or something equivalent to define where the species reside. When creating an ODEProblem, u0 must be given for all compartments and states.

A different but less important issue is that SBML allows specifying some species abundances with AlgebraicRules, which are not supported in a ReactionSystem (and maybe they shouldn't be: we can demand from users that they specify their SBML model as a pure reaction network, which in my experience is a cleaner and less error-prone way to specify such models anyway).

anandijain commented 3 years ago

We now use MathML.jl