SciML / Catalyst.jl

Chemical reaction network and systems biology interface for scientific machine learning (SciML). High performance, GPU-parallelized, and O(1) solvers in open source software.
https://docs.sciml.ai/Catalyst/stable/
Other
463 stars 78 forks source link

Population Balance equations #292

Closed yewalenikhil65 closed 3 years ago

yewalenikhil65 commented 3 years ago

https://en.m.wikipedia.org/wiki/Smoluchowski_coagulation_equation

Would love to see an example of this equation efficiently solved using Catalyst and/or NetworkDynamics packages

I found them easy to implement with simple MATLAB functions but am unable to think of using Catalyst here

yewalenikhil65 commented 3 years ago

I did a very naive implementation. But I did by writing a user-defined function in Julia as follows: I suspect the equations would be very expensive when N -> Larger ( N = 4; # number of clusters of particles we allow in the code) Its a system of basically lot of particles of same species.. Only that you treat u as type of cluster of particles(singlets. doublets and so on).. And only reactions like binary collisions are allowed.This is version of irreversible type.

using OrdinaryDiffEq, DiffEqJump
using Plots
gr()

function PBE(du,u,p,t)

  K(x,y) = x + y;
    # additive-kernel ...could be any function in x and y
    # generic function for kernel operator which is equivalent to rate of reaction
    # some more types of  kernels are K=1; K = x*y..
    # in general K(x,y,t)

  N = 4;   # number of clusters..
 # u ,..not exactly species but cluster array and,
 # index of cluster array denotes the number of particles in cluster

  RHS_1 = zeros(N);    # First summation on RHS of the equation      
  RHS_2 = zeros(N);   # Second summation on the RHS of the equations
  for i ∈ 1:N
    for j ∈ 1:i-1
        RHS_1[i] = RHS_1[i] + K(i-j,j)*u[i-j]*u[j];
    end
    RHS_1[i] = (1/2)*RHS_1[i];

    for j ∈ 1:N
        RHS_2[i] = RHS_2[i] + K(i,j)*u[i]*u[j];
    end
    du[i] = RHS_1[i] - RHS_2[i];
  end

end

tspan = (0., 0.05)

# initially system consists of..
#  100 singlets, 0 doublets, 0 triplets and 0 quadralets(if that is even a term)
u0 = [100.; 0.; 0.; 0.]

# solve ODEProblem
oprob = ODEProblem(PBE, u0, tspan)
sol = solve(oprob, Tsit5())
plot(sol,lw=2)
yewalenikhil65 commented 3 years ago

Hi @isaacsas , You are right , it should be modelled as a discrete jump process to begin with. I was able to do it in MATLAB by say simply treating singlets , doublets etc as some bins which will get filled based on Gillespie rule. But I had to implement original Gillespie for that and decide which collisions/reactions will occur . I wondered if it was doable in easier way using julia packages like catalysts, and diffeqjump ..

isaacsas commented 3 years ago

The Wikipedia page, https://en.m.wikipedia.org/wiki/Smoluchowski_coagulation_equation, seems to list a set of reactions. (See the diagram on the right side at the top.) Can't you just code them up? i.e. make species X1 for state 1, X2 for state 2, and then have reactions like X1 + X1 -> X2, X1 + X2 -> X3, etc.

yewalenikhil65 commented 3 years ago

The Wikipedia page, https://en.m.wikipedia.org/wiki/Smoluchowski_coagulation_equation, seems to list a set of reactions. (See the diagram on the right side at the top.) Can't you just code them up? i.e. make species X1 for state 1, X2 for state 2, and then have reactions like X1 + X1 -> X2, X1 + X2 -> X3, etc.

I did using ModelingToolkit first as I had some problem with usual @reaction_network notation..

using ModelingToolkit
##
N = 4;   # number of clusters..
@parameters t
@variables X[1:N](t)
k(x,y) = x*y;

rxs = []
for m = 2:N
    for i = 1:Int(floor(m/2))
        push!(rxs, Reaction(k(i,m-i), [ X[i],X[m-i] ],[ X[m] ]))
    end
end

p=[]  # some empty params
rs = ReactionSystem(rxs, t, X, p)
## solve JumpProblem

# Initially system consists of...
# 100 singlets, 0 doublets, 0 triplets
tspan = (0., 0.2);   # seconds
u0 = [100; 0; 0 ;0]   # initial

dprob = DiscreteProblem(rs, u0, tspan)
jprob = JumpProblem(rs,dprob, Direct())
jsol = solve(jprob, SSAStepper())
plot(jsol, lw = 2)

Problem i faced with @reaction_network is as follows..logic of adding reactions in for loop seems correct.but. Could not understand the error


using Catalyst, DiffEqJump
using ModelingToolkit
N = 4;   # number of clusters..
@parameters t
@variables X[1:N](t)

k(x,y) = x + y;  # aggregating rate

PBE= @reaction_network begin
  for m = 2:N
      for i = 1:Int(floor(m/2))
            k(i,m-i), X[i] + X[m-i] --> X[m]
      end
  end
end

julia>LoadError: ArgumentError: tuple must be non-empty
in expression starting at /home/chetan/.julia/packages/Catalyst/JcEoo/src/reaction_network.jl:148
first(::Tuple{}) at tuple.jl:95
_parse_vars(::Symbol, ::Type{T} where T, ::Tuple{}) at variables.jl:78
@variables(::LineNumberNode, ::Module, ::Vararg{Any,N} where N) at variables.jl:190
isaacsas commented 3 years ago

Yeah, @reaction_network is really only for providing a list of reactions in the format described in the docs. If you want to build reactions programmatically using loops and such you should build the Reactions and ReactionSystem directly as you did above. The whole reason we created the ReactionSystem was to have a nice backend the Catalyst DSL could write to, but which could still be used in a programmatic fashion if needed.

isaacsas commented 3 years ago

It may at some point be worth supporting vector reaction notations / notations for specifying structured reactions like in Bionetgen, but as your code shows it is pretty easy to just directly build the needed ReactionSystem.

yewalenikhil65 commented 3 years ago

I am thinking how would heterogeneous system (involving aggregates of multiple species(say 2 species) look like in this code ..i assume that seems more realistic example.. for 2 species(say X and Y) It might involve twice as more combinations of reactions than in single case, making the dynamic allocation of reactions using push! more cumbersome.

yewalenikhil65 commented 3 years ago

Hi @isaacsas , I had query regarding above mentioned problem. Is it expected that solutions from ODEProblem and JumProblem should agree with each other in each system ? (or this particular smoluchowski coag equation system also ?)

isaacsas commented 3 years ago

@yewalenikhil65 Sorry for taking a while to get back to your previous question -- this week was the start of classes so really busy... If you have two or more species there isn't anything we currently have to make it simpler to make really large systems than just creating all the individual Reactions. We do plan to add an interface for graph and spatial systems eventually, which might work for your problem too, but at the end of the day when using Catalyst and ModelingToolkit you will always have extra allocations due to creating the symbolic model(s) first, and then generating the computational models (ODEProblem/JumpProblem/etc) from them. If you have a large enough system it is probably better for now to just generate the ODE rhs function or the MassActionJumps you need directly and bypass the symbolic layer. Hopefully this is something that will get better optimized though as time goes on.

The coagulation reactions are non-linear, so there is no reason one would expect the mean of the stochastic model to equal the ODE models. That said, there are probably certain limits where the two would be expected to agree (like large-population limits).

isaacsas commented 3 years ago

Just to clarify what I mean; systems with reactions of the form 0 --> A or A --> B should have that the mean of the stochastic model equals the ODE models. As soon as you have reactions like A + B --> C or such this is no longer true, except in certain limits.

yewalenikhil65 commented 3 years ago

Just to clarify what I mean; systems with reactions of the form 0 --> A or A --> B should have that the mean of the stochastic model equals the ODE models. As soon as you have reactions like A + B --> C or such this is no longer true, except in certain limits.

MIchaelis-menten does follow even after being non-linear. image

I remember in gillespie like simulations, propensity function depends on volume for nonlinear interactions while not so in linear interactions. Is this the reason for the disagreement ?

isaacsas commented 3 years ago

Mathematically, they are only guaranteed to be identical for systems of only zero and first order reactions (to my knowledge). For systems with nonlinear reactions this doesn't mean they won't often agree or be close, just that it is not rigorously true in general that the mean of the stochastic models are exactly the ODE solutions (again, excepting certain types of limits where the stochastic model might converge to the mean of the ODE models).

In your case it appears the stochastic model is very close to the ODE model. Perhaps this is due to the population size being large enough, and the parameters being in an appropriate regime. One would generally expect the stochastic model to agree with the ODE model in the limit of large population sizes (which in your case might only require tens of molecules to be a "large" population).

isaacsas commented 3 years ago

One reason for the disagreement is that when calculating the ODE for the mean of the stochastic model, a reaction like A + B -> C contributes a term like

(k/V)*E[A(t)*B(t)]

where E[.] denotes the average. However, in general it is not true that

E[A(t)*B(t)] = E[A(t)]*E[B(t)]

i.e. the two species are not uncorrelated. Hence, the ODEs for the means will be coupled to ODEs for second moments, which will be coupled to ODEs for third moments and so on.

In contrast, 0 -> A at rate k0 or A -> B at rate k1, contribute terms like

E[k0] = k0
E[k1*A] = k1*E[A]

which only involve first moments, which results in the ODEs for the mean of the stochastic model being identical to the normal deterministic mass action ODE model.

isaacsas commented 3 years ago

Note, the volume scaling you mention is actually what results in the stochastic model converging to the ODE model in the large population limit.

isaacsas commented 3 years ago

A couple good references on this material are the Handbook of Stochastic Methods by Gardiner and Stochastic Processes in Physics and Chemistry by Van Kampen.

yewalenikhil65 commented 3 years ago

@isaacsas Thank you for pointing me those references I have made a little Benchmarking using DiffEqJump solvers for the simplest case of PBE. Is there any way I could contribute as a tutorial ?

isaacsas commented 3 years ago

That would be great! If you are using Catalyst you could add a tutorial to the docs here, or you could submit one to:

https://github.com/SciML/SciMLTutorials.jl

(We actually need to update the DiffEqBiological ones there...)

yewalenikhil65 commented 3 years ago

That would be great! If you are using Catalyst you could add a tutorial to the docs here, or you could submit one to:

https://github.com/SciML/SciMLTutorials.jl

(We actually need to update the DiffEqBiological ones there...)

Actually i used MTK itself for programmatically generating ReactionSystem as we discussed a bit while back. But i do think it could be done also easily by Catalyst also. (using addspecies, addparam and addreactions might help)

I see https://github.com/SciML/SciMLTutorials.jl would appreciate some brief descriptions also..Will do that today with as much description as I can

isaacsas commented 3 years ago

Using ReactionSystems is essentially using Catalyst, and a tutorial on how to programmatically generate them would be appropriate here or at SciMLTutorials. ReactionSystems may actually be moved over here from MT at some point.

yewalenikhil65 commented 3 years ago

Hey @isaacsas , would you like to take a look here before we decide to add to the Catalyst/sciMLtutorials as tutorial ? https://github.com/yewalenikhil65/Population-balance-equation Benchmarking Jump methods on solving PBE https://github.com/yewalenikhil65/Population-balance-equation/blob/main/Figs/benchmarking_PBE.png

yewalenikhil65 commented 3 years ago

Hey @isaacsas I have tried to verify result with an analytical solution that is available for simple kernel functions. I am bit unsure of the way I have created it. Especially because the way monomer species (i.e first species) doesn't match with analytical solution(expecting it to match as I consider more cluster sizes i. e N , but my laptop runs out of space very soon using all SSA methods). I have also added the references from where I have taken the analytical solution. Do take a look whenever you get free. Thanks!

isaacsas commented 3 years ago

I’ll try to take a look this weekend!

yewalenikhil65 commented 3 years ago

I’ll try to take a look this weekend!

Thank you , sir 🙂

yewalenikhil65 commented 3 years ago

If you have a large enough system it is probably better for now to just generate the ODE rhs function or the MassActionJumps you need directly and bypass the symbolic layer. Hopefully this is something that will get better optimized though as time goes on.

Little update on this , I think this particular advise of yours regarding generating MassActionJumps programmatically before-hand helped me. Thanks a lot!.. I am not sure whether what I have done bypasses symbolic layer or not as you said, but it certainly helped me verify the analytical solution available in the literature for basic kernels except for multiplicative kernel..

isaacsas commented 3 years ago

@yewalenikhil65 Sorry for taking a while to look at it. It's a nice tutorial! I'd suggest we add it to Catalyst's docs as a tutorial illustrating how to programmatically construct a network, or add it to SciMLTutorials.

If you want to make a PR to either one of those repos I'll make some comments on wording and such, and we can work to get it into final form to merge.

Very neat!

isaacsas commented 3 years ago

I'm glad the advice to use MassActionJumps helped. You are definitely going to have issues as N gets big using Catalyst right now -- we still have work to do on efficiently turning the Catalyst model into the MassActionJump model when the number of reactions gets really big (more than tens of thousands I'd guess currently). That's something we know about, and can hopefully improve in the future. One goal is to allow reactions on graphs, which could be a better structure for this type of problem in terms of system generation.

There are also issues in DiffEqJump with the dependency graph representation becoming inefficient for sufficiently large systems, so we have some work to do there too if one wants to use the fastest SSAs on really big systems.

yewalenikhil65 commented 3 years ago

@yewalenikhil65 Sorry for taking a while to look at it. It's a nice tutorial! I'd suggest we add it to Catalyst's docs as a tutorial illustrating how to programmatically construct a network, or add it to SciMLTutorials.

If you want to make a PR to either one of those repos I'll make some comments on wording and such, and we can work to get it into final form to merge.

Very neat!

@isaacsas Lets add to Catalyst docs as an example. I will create a PR of this, but I have never created a PR for any sciML repository , can you guide me through this ? update : I have made a pull-request for Catalyst docs. I hope its correct way to do

yewalenikhil65 commented 3 years ago

There are also issues in DiffEqJump with the dependency graph representation becoming inefficient for sufficiently large systems, so we have some work to do there too if one wants to use the fastest SSAs on really big systems.

Does this mean I should expect some inefficiency with RSSA/RSSACR like SSAs as they require dependency graphs ? I think I have read somewhere that using Reaction() and ReactionSystem creates a dependency graph automatically

isaacsas commented 3 years ago

If you get to really large systems (millions of reactions or more I think) the dependency graphs can start taking up a lot of memory. This will be an issue with most of our SSAs once the graphs are big enough. Fixing this requires some updates to DiffEqJump to allow for more flexible representations of the graphs.

There is also an issue where their construction using ModelingToolkit can be pretty slow for systems with hundreds of thousands of reactions (maybe even less). This is not really an issue with RSSA or such, but with ModelingToolkit's generation of the dependency graphs.

yewalenikhil65 commented 3 years ago

If you get to really large systems (millions of reactions or more I think) the dependency graphs can start taking up a lot of memory. This will be an issue with most of our SSAs once the graphs are big enough. Fixing this requires some updates to DiffEqJump to allow for more flexible representations of the graphs.

There is repo called LightGraphs ..maybe that can be useful ?

yewalenikhil65 commented 3 years ago

closed with https://github.com/SciML/Catalyst.jl/pull/307