SciML / Catalyst.jl

Chemical reaction network and systems biology interface for scientific machine learning (SciML). High performance, GPU-parallelized, and O(1) solvers in open source software.
https://docs.sciml.ai/Catalyst/stable/
Other
463 stars 78 forks source link

stoichiometric matrix as alternative input #78

Closed jonstrutz11 closed 5 years ago

jonstrutz11 commented 5 years ago

DiffEqBiological.jl makes it easy to go from a reaction network (defined using the DSL) to an ODEProblem.

However, large biological models (100s of reactions +) are often encoded in a more convenient format such as SBML. That said, it would be great if we could input a stoichiometric matrix directly instead of listing the reactions in the DSL, as an alternative input. We would also need to input a vector of rates as well.

This would allow those with large biological models to go from a stoichiometric matrix (which can be generated using COBRA tools) directly to an ODEProblem.

isaacsas commented 5 years ago

Can COBRA load SBML files? If so it would be great to get a translation layer developed.

ChrisRackauckas commented 5 years ago

See the discussion at https://github.com/opencobra/COBRA.jl/issues/13 . Specifically, https://stochasticreveller.wordpress.com/2016/08/02/sbml-and-the-julia-programming-language/ would make a good package. Then it would be up to us to support using it. That would make for a good GSoC project, but I am not sure if I can attract a student to it.

isaacsas commented 5 years ago

Getting SBML support would be great, but it poses two big challenges:

  1. The SBML API is gigantic, just wrapping the Python libsbml would be a major undertaking.
  2. Even having a Julia-based interface to the API, it would be a ton of work to adapt DiffEqBiological to support the many SBML features. We would need support for species tagged to specific compartments (with different volumes), mixed units, arbitrary mathematical expressions, and lots more.

That seems like a task for a full-time programmer. I imagine programs like Bionetgen and Copasi that fully support SBML are funded by large NIH grants that support hiring such programmers.

isaacsas commented 5 years ago

@jonstrutz11 Is your stoichiometric matrix just for fundamental reactions (i.e. no Hill expressions or such)? If so it might be possible to write a converter to import it into a reaction network.

One thing we should explore is the structure of the ODE and SDE functions we currently generate. Right now I think it is just a single monolithic function, but I wonder if instead calling a mass action reaction evaluation function as we do for jumps might offer better performance for large reaction networks. @ChrisRackauckas any thoughts?

ChrisRackauckas commented 5 years ago

Building a single function is always going to be the most performant. The mass action reactions win over the other normal jumps because they get rid of a function call that cannot easily be inlined.

But it wouldn't be difficult to have a non-macro interface for stoichometry matrices that generates the functions for the given problems just via some linear algebra. The only difficult part would be to generate a system of mass action jumps from it, but really that's not too bad.

isaacsas commented 5 years ago

What about if one has several thousand species and tens of thousands of reactions? Could the ODE rhs function size lead to cache issues? You're probably right that it wouldn't make a difference, but having never coded up a single function that long I don't really know what effect it could have.

Even if one puts all the rate evaluations directly in the generated ODE rhs function, could there be an advantage to not evaluating rate terms multiple times in different ODEs, as currently done, but instead looping through the reactions, evaluating their rate, and adding that contribution into the corresponding du entries?

jonstrutz11 commented 5 years ago

Yes, so I would be loading an SBML model and converting each reaction into a series of elementary reactions (using my own code). At that point, I could use a tool like cobrapy to create a stoichiometric matrix describing all of those elementary reactions, which I need to convert to an ODEProblem (assuming mass action kinetics).

It seems like COBRA.jl cannot take in SBML files, but I know that cobrapy can be used to load SBML files and can also generate a stoichiometric matrix.

isaacsas commented 5 years ago

We might need more than just rate constants and stoichiometric matrix; the latter is not sufficient to determine the rate law, right? For example if we have A+B --> A + C then the my understanding is the corresponding column of the "stoichiometry matrix" would be [0,-1,1], but the rate law needs A*B. So we probably need as input separate matrices with the reactant stoichiometry [1,1,0], the product stoichiometry [1,0,1], and the rate constants. Given this info we could formulate everything.

Alternatively, you can just generate a string from your reactions that encodes the DSL. I've used this to take text files with tens of thousands of reactions in a minimal format, translate them to a string that is the corresponding DiffEqBiological DSL model, and then evaluate the string to create a reaction_network. The parsing and creation of the base reaction_network is pretty fast (at most a few seconds). What is rate limiting is assembling the ODE/SDE/jump functions (hence why we created the new min_reaction_network macro that lets one assemble those incrementally as they are needed).

jonstrutz11 commented 5 years ago

Yeah that is right - I didn't think about reactions with the same metabolite on each side. So actually, using an S-matrix wouldn't work as well as I thought. Thanks for catching that!

I originally thought to do the alternative option as you suggest, but thought it might be easier to go from the S-matrix directly, if possible. However, after this discussion, it now seems that just generating a string that encodes the DSL might be the easier way forward.

isaacsas commented 5 years ago

If you look at my ParseRxNetwork package you can see how to generate a reaction_network from a simple text file of the form:

S1 -> S3 , 1.82e-4
S3 -> S2 , 9.1e-5
S2 -> S3 , 1.5e-3
S3 -> S1 , 7.5e-4
S3 + S6 -> S7 , 1.53e-8
S2 + S6 -> S8 , 3.06e-8
S4 + S6 -> S9 , 5e-5
...
jonstrutz11 commented 5 years ago

Great! Thank you!

isaacsas commented 5 years ago

I'm closing this as ReactionNetworkImporters.jl now supports constructing a reaction_network from substrate and product stoichiometric matrcies (dense or sparse).