SciML / JumpProcesses.jl

Build and simulate jump equations like Gillespie simulations and jump diffusions, with constant and state-dependent rates, and mix them with differential equations and scientific machine learning (SciML)
https://docs.sciml.ai/JumpProcesses/stable/

Jump problem violates immutable problem assumption #287

Open · gaurav-arya opened this issue 1 year ago

gaurav-arya commented 1 year ago

The integrators seem to inherit shared mutable state from the jump problem. This seems to violate the assumption that "Problem-related types in DifferentialEquations.jl are immutable." (https://docs.sciml.ai/DiffEqDocs/latest/basics/problem/). Here is a MWE of the issue:

using JumpProcesses

rate1(u,p,t) = p.λ
rate2(u,p,t) = p.λ
affect1!(integrator) = (integrator.u[1] += 1)
affect2!(integrator) = (integrator.u[1] += 2)
crj1 = ConstantRateJump(rate1, affect1!)
crj2 = ConstantRateJump(rate2, affect2!)

u₀ = [0]
p = (λ = 2.0, )
tspan = (0.0, 10.0)

dprob = DiscreteProblem(u₀, tspan, p)
jprob = JumpProblem(dprob, Direct(), crj1, crj2)

# Integrators inherit shared mutable state from the jump problem.
integrator1 = init(jprob, SSAStepper())
integrator2 = init(jprob, SSAStepper())
integrator1.cb.condition === integrator2.cb.condition

# This causes the solve of one integrator to affect the other.
solve!(integrator1)
println(integrator1.cb.condition.prev_jump)
# prints 1
solve!(integrator2)
println(integrator2.cb.condition.prev_jump)
# prints 2 (depending on random luck)
println(integrator1.cb.condition.prev_jump)
# also prints 2

Presumably this would lead to issues if one were to, e.g., manually do multithreaded parallelism over many solves of the same problem, which in my understanding should be a valid thing to do, even if an EnsembleProblem is preferred?
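As a concrete illustration of the manual multithreaded workflow mentioned above, here is a hedged sketch (the helper name `sample_final_states` and the per-task `deepcopy` workaround are mine, not from the package docs; `jprob` is the problem from the MWE):

```julia
using JumpProcesses, Base.Threads

# Hypothetical sketch: manual multithreaded sampling over the same JumpProblem.
# Without a per-task copy, the integrators would alias the jump problem's
# mutable callback/aggregator state; deepcopy gives each task its own copy.
function sample_final_states(jprob, nsims)
    results = Vector{Int}(undef, nsims)
    @threads for i in 1:nsims
        local_prob = deepcopy(jprob)          # independent caches/rng per task
        sol = solve(local_prob, SSAStepper())
        results[i] = sol.u[end][1]
    end
    return results
end
```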

There's already #184, which looks related, but from the discussion there it didn't seem like this issue was an exact duplicate.

isaacsas commented 1 year ago

I think the immutable-problem assumption docs were added long after JumpProblem, so it looks like it was forgotten, when those docs were written, that the assumption does not hold here. But we should definitely get some internals-type docs here that make clear it doesn't hold for JumpProblem. At the end of the day, JumpProblem is a weird entity since it actually stores its simulation algorithm (i.e. the aggregator) and the associated cache structure (which stores the rng too). I'm not sure any other problem type does this.

Regarding multithreading, are you experiencing a bug? @ChrisRackauckas had added some checks in SSAStepper's init to deepcopy the JumpProblem in the multi-threaded case. We intentionally don't deepcopy in the serial case as that can destroy performance when one is trying to sequentially run a bunch of simulations for sampling purposes (due to reallocating memory for the internal data structures of the aggregators).

If you have suggestions on an alternative approach please do let me know! It might be worth making a big breaking change if we can make this cleaner.

gaurav-arya commented 1 year ago

I encountered this issue because I was making my own aggregator where I tracked some stateful information within the aggregator and directly extracted integrator.cb.condition.{some state} after running solve! on the integrator. This became an issue when I did this for multiple integrators, because all the cb.conditions pointed to the result of the most recent run and additionally the state accumulated over all the runs together. So a bit of an obscure case, I suppose, and I solved it exactly by just deepcopy'ing the problem on creation of the integrator.
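The workaround described above amounts to something like the following sketch, reusing `jprob` from the MWE earlier in the thread:

```julia
# Deep-copying the problem at integrator creation gives each integrator
# its own aggregator/callback, so cb.condition is no longer aliased.
integrator1 = init(deepcopy(jprob), SSAStepper())
integrator2 = init(deepcopy(jprob), SSAStepper())
integrator1.cb.condition === integrator2.cb.condition  # now false
```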

isaacsas commented 1 year ago

In the call to init you can also set alias_jump = false, see

https://github.com/SciML/JumpProcesses.jl/blob/eeede564e52c1ae7d43e1042754d8c1f84f04e60/src/SSA_stepper.jl#L136

I don't like that we have different behavior in the multithreaded (where we also deepcopy) and the serial cases, but I'm not sure of how we should resolve this and still allow reuse of the cache structures from solve to solve within a thread.
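Concretely, the keyword linked above can be passed at init time (sketch; the keyword exists at the linked commit, behavior of other versions may differ):

```julia
# alias_jump = false tells SSAStepper's init not to alias the JumpProblem's
# internal caches, so this integrator gets an independent copy.
integrator = init(jprob, SSAStepper(); alias_jump = false)
```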

gaurav-arya commented 1 year ago

Reading around, it seems that it's expected behaviour in SciML for all the caches etc. to be created upon initialization of an integrator, and if one wanted to avoid this for repeated solves, one should create an integrator and repeatedly reinit it (https://discourse.julialang.org/t/solving-same-diffeq-repeatedly-but-with-different-initial-conditions/49402/2, https://docs.sciml.ai/DiffEqDocs/stable/basics/integrator/#SciMLBase.reinit!). So it could be that it's totally expected to have to allocate the aggregator on init? (And perhaps a higher-effort change would be a redesign such that these caches are not part of the problem at all, but only created when the integrator is formed.)

I don't know how often this special feature of non-allocating repeated solves is used: perhaps it's quite a lot, in which case removing it would be a breaking change in practice.
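The repeated-solve pattern suggested by those links looks roughly like this (a sketch, assuming `reinit!` is supported for the integrator type in question; `jprob` is from the MWE):

```julia
# Allocate caches once at init, then reuse the integrator across runs.
integrator = init(jprob, SSAStepper())
samples = map(1:1000) do _
    reinit!(integrator)   # reset state without reallocating caches
    solve!(integrator)
    integrator.u[1]       # record e.g. the final copy number
end
```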

ChrisRackauckas commented 1 year ago

Indeed Gaurav is correct here that we usually try to make sure any and all caches are in the init phase and not in the problem type, specifically to avoid any re-solve issues. We might not have made that clear enough on CommonSolve.jl's docs, but it is intended.

https://docs.sciml.ai/CommonSolve/stable/

So it could be that it's totally expected to have to allocate the aggregator on init?

That would be fine. I think something we might need is a way to choose a new seed on an inited integrator: SDEs need this too so it's just a missing feature, but seemingly a bit more important for jump processes where the init would be a lot heavier than the solve in many cases.

isaacsas commented 1 year ago

Those docs seem clear enough, they just postdate when the design here was created by quite a bit. (JumpProcesses' design goes back to before I started working on the library in 2018...)

I'd think running many repeated solves in a loop to generate statistics is the most common JumpProcesses workflow. Isn't that what serial EnsembleProblems also do (i.e. they don't use the integrator interface)?
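For reference, the sampling workflow being described is roughly the following (a sketch using the MWE's `jprob`; the trajectory count is arbitrary):

```julia
# Typical JumpProcesses workflow: many serial solves to gather statistics.
# Each solve currently reuses the aggregator caches stored inside jprob.
finals = [solve(jprob, SSAStepper()).u[end][1] for _ in 1:10_000]
mean_final = sum(finals) / length(finals)

# Or via the ensemble interface:
eprob = EnsembleProblem(jprob)
esol = solve(eprob, SSAStepper(), EnsembleSerial(); trajectories = 10_000)
```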

So we could make this change, but then JumpProcesses would have to update its docs to use the integrator interface for all workflows, as I'd think running just one simulation is an unusual case. (We'd also need to explain that users should avoid serial EnsembleProblems for the same reason, unless those are also changed to use the integrator interface.)

Internally, we'd also have to defer construction and/or initialization of the constant rate jump DiscreteCallback till the call to init (since it wraps the caches).