SciML / DifferentialEquations.jl

Multi-language suite for high-performance solvers of differential equations and scientific machine learning (SciML) components. Ordinary differential equations (ODEs), stochastic differential equations (SDEs), delay differential equations (DDEs), differential-algebraic equations (DAEs), and more in Julia.
https://docs.sciml.ai/DiffEqDocs/stable/

Callbacks being missed #258

Closed BenWhetton closed 6 years ago

BenWhetton commented 6 years ago

Hi there!

I am having an issue where callbacks are being missed and I don't really understand why. I have tweaked the abstol and interp_points parameters in my problem but I get very inconsistent results. I have simplified things as much as possible and come up with this trivial example which exhibits the problem:

using DifferentialEquations

function sim(du, u, p, t)
     du[1] = 1
end

function cb1Condition(u,t,integrator)
     return u[1] - 0.001
end

function cb1Affect!(integrator)
     print("event1")
end

function cb2Condition(u,t,integrator)
     return u[1] - 0.002
end

function cb2Affect!(integrator)
     print("event2")
end

function simulate()
     save_positions = (true,true)
     cb1 = ContinuousCallback(cb1Condition, cb1Affect!, nothing; save_positions=save_positions, rootfind=true)
     cb2 = ContinuousCallback(cb2Condition, cb2Affect!, nothing; save_positions=save_positions, rootfind=true)
     u0=[0.0]
     prob = ODEProblem(sim, u0, (0.0, 0.05))
     sol = solve(prob, Tsit5(); dtmax=1e-6, callback = CallbackSet(cb2,cb1))
     return sol
end

sol = simulate()

event1 is printed but event2 is not. Changing the comparison value in cb2Condition to 0.0020001 makes event2 trigger as well which makes me suspect it might have something to do with events landing exactly on time steps.

I am relatively new to Julia and this library (Julia's main attraction for me, if it lives up to my hopes!) so I might be missing something obvious. Any help would be much appreciated!

ChrisRackauckas commented 6 years ago

which makes me suspect it might have something to do with events landing exactly on time steps.

Yup, it's a tolerance thing. To avoid getting stuck in an infinite loop, the callback mechanism rejects roots in the first subinterval of a step if the previous point is within callback.abstol (default 1e-9) of zero. So if the previous step ends within 1e-9 of the root, this will happen.

Normally, this has a very minimal chance of happening. But if you're taking a bunch of time steps, then you're more likely to end up "in this zone" (so maybe it makes sense to make it relative to dt?). For example, if you run this without the dtmax then it works just fine:

function simulate()
     save_positions = (true,true)
     cb1 = ContinuousCallback(cb1Condition, cb1Affect!, nothing; save_positions=save_positions, rootfind=true)
     cb2 = ContinuousCallback(cb2Condition, cb2Affect!, nothing; save_positions=save_positions, rootfind=true)
     u0=[0.0]
     prob = ODEProblem(sim, u0, (0.0, 0.05))
     sol = solve(prob, Tsit5(); callback = CallbackSet(cb2,cb1))
     return sol
end

sol = simulate()

That will also be more efficient. If you wanted to save at every dt=1e-6, I would recommend using saveat instead of restricting the time step.
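For instance, a sketch of the saveat approach (reusing sim, the conditions, and the affects from the example above): the solver still takes adaptive steps, but records the solution on a fixed grid.

using DifferentialEquations

function simulate_saveat()
     cb1 = ContinuousCallback(cb1Condition, cb1Affect!, nothing)
     cb2 = ContinuousCallback(cb2Condition, cb2Affect!, nothing)
     prob = ODEProblem(sim, [0.0], (0.0, 0.05))
     # saveat records output every 1e-6 without forcing the integrator
     # to actually step at dt = 1e-6
     solve(prob, Tsit5(); saveat=1e-6, callback=CallbackSet(cb2, cb1))
end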

Additionally you can tweak the tolerance in the callback.

function simulate()
     save_positions = (true,true)
     cb1 = ContinuousCallback(cb1Condition, cb1Affect!, nothing;
                              save_positions=save_positions, rootfind=true)
     cb2 = ContinuousCallback(cb2Condition, cb2Affect!, nothing;
                              abstol = 1e-11,
                              save_positions=save_positions, rootfind=true)
     u0=[0.0]
     prob = ODEProblem(sim, u0, (0.0, 0.05))
     sol = solve(prob, Tsit5(); dtmax = 1e-6, callback = CallbackSet(cb2,cb1))
     return sol
end

sol = simulate()

That hits both events because it makes sure the rootfinder homes in closer to zero and rejects for "was previously a rootfinding zero" with a smaller radius. Hopefully that's a thorough enough explanation for you to solve your full problem.

Hopefully this helps. This is definitely a problem where there's no silver bullet for defaults: the tolerance here used to be 1e-12 by default, until a few too-common cases couldn't get the rootfinder to converge well enough, so it was raised to 1e-9, and there hadn't been any reported issues (till now :( ). We do need to find a way to make the tolerance relative to the values. If you have ideas, I'm all ears.

BenWhetton commented 6 years ago

I haven't had the chance to look at the code yet but I was wondering why the rootfinder cannot ensure that the zero has been passed by resolving to a time/value within some tolerance just after the zero? I have done a lot of work with Simulink and I believe its "zero crossing detection" algorithm works that way. Would that avoid the infinite loop scenario, albeit introducing a very small bias pushing the roots slightly forward in time?
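A standalone sketch of that idea (hypothetical code, not Simulink's algorithm or this library's implementation): bisect until the bracket is one float wide, then report the endpoint just past the crossing.

# hypothetical sketch of "land just after the zero" root handling
function root_just_after(g, a, b; iters=200)
     # assumes sign(g(a)) != sign(g(b)); plain bisection
     for _ in 1:iters
          m = a/2 + b/2
          (m == a || m == b) && break    # bracket is one float wide
          sign(g(m)) == sign(g(a)) ? (a = m) : (b = m)
     end
     return b                            # endpoint just past the crossing
end

root_just_after(t -> t - pi/4, 0.0, 1.0)  # ≈ pi/4, on the far side of the zero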

ChrisRackauckas commented 6 years ago

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/src/callbacks.jl#L118-L126

We use the floating point number before the zero. The reason is because you break domain assumptions if it's the one after the zero. For example, if the zero crossing is actually just measuring u[j] trying to keep it positive, after the zero would make u[j] negative, and codes like chemical reaction networks can be unstable when any of the values become negative.

But if the condition function is like 1000u[j], then one float back in the input is 1e-16 different while the output is 1e-12, which is where the relative scaling starts to come in.

Maybe a stateful approach is needed.

BenWhetton commented 6 years ago

Ah ok, I see how that would make sense for that kind of problem. Thanks for the quick replies! My application solves the equations in an optimisation loop, and it must run a lot of times with slight variations in parameters without ever missing an event, so the current solution won't work for me. I am surprised that not being able to guarantee that the events will be triggered isn't an issue for other applications. I can imagine applications where I'd like to guarantee that the root is passed as well.

I am starting to work my way through the code to try to understand how it works. After a quick scan, I thought that removing the condition on l43-47, leaving only l46, and replacing prevfloat with nextfloat on l118 might remove the check for landing too close to the root and ensure that it always finds the next point after the root, but it didn't solve the problem so I guess I need to dig more.

I'm thinking about solutions to satisfy all requirements but I'd also like to test my program asap so a workaround would be useful. Any pointers would be much appreciated :).

ChrisRackauckas commented 6 years ago

I am surprised that not being able to guarantee that the events will be triggered isn't an issue for other applications. I can imagine applications where I'd like to guarantee that the root is passed as well.

There were issues when it first got added, then it was tweaked, then there wasn't an issue until now. I don't think you can ever guarantee the existence or non-existence of a root though. We do a bunch of extra interpolant checks unlike things like Sundials or SciPy to make sure we have the ability to catch roots where the signs only change in the middle of the interval (quick oscillation), but even that is parameter-dependent.
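Those interpolant checks can be illustrated with a standalone sketch (a hypothetical helper, not the library's implementation): sample the step's interpolant at interp_points interior points and look for a sign change that an endpoint-only check would miss.

# hypothetical sketch: catch a root whose sign change happens strictly
# inside a step by sampling an interpolant g(t) at interior points
function sign_change_in_step(g, tprev, t; interp_points=10)
     ts = range(tprev, t; length=interp_points)
     any(sign(g(ts[i])) != sign(g(ts[i+1])) for i in 1:interp_points-1)
end

g(t) = cos(40t)                      # dips negative and comes back inside [0, 0.16]
sign(g(0.0)) == sign(g(0.16))        # endpoints agree, so an endpoint check misses it
sign_change_in_step(g, 0.0, 0.16)    # the interior samples catch the crossing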

I'm thinking about solutions to satisfy all requirements but I'd also like to test my program asap so a workaround would be useful. Any pointers would be much appreciated :).

As a workaround, you can change the tolerances in the callbacks. But also, is there a reason you're using dtmax? I am not sure of a good reason for such a low dtmax, and am curious if it's being used where saveat should be.

ChrisRackauckas commented 6 years ago

I tried out the stateful approach and it worked well. The callbacks were changed to not be reliant on the tolerance for detection, and this makes them safe for small dt now. Thanks for the issue. The integrators with this change will be released soon.

ChrisRackauckas commented 6 years ago

The updated callbacks have been released. Please let me know if you encounter any further issues with the callbacks. Thanks for the report!

ChrisRackauckas commented 6 years ago

Now the stateful approach uses a derivative approximation so that no interval is discarded, so there's no reliance on interp_points anymore (except for finding oscillatory events, of course).

BenWhetton commented 6 years ago

Thanks for following up on this. I haven't had enough time to work on julia projects in a while. I haven't looked at the new implementation yet but I can still cause it to miss callbacks (see code sample below).

The low dtmax is to ensure that the shape is correctly preserved. This is critical for my project (I am aware that this library may have better solutions, although I have a nagging feeling they didn't work for me for some reason). In any case, any chance of missing callbacks is a deal-breaker for my project.

This code should produce a sawtooth waveform, wrapping at pi/4 but it misses the 5th wrap.

import Plots.plot
import Plots.plotlyjs
import Plots.gui

using DifferentialEquations

function my_model(du, u, p, t)
     du[1] = 100*pi/3
end

function wrapCondition(u,t,integrator)
     return u[1] - pi/4
end

function wrapAffect!(integrator)
     integrator.u[1] = mod(integrator.u[1], pi/4)
     print("wrapping!")
end

function simulate()
     save_positions = (true,true)
     wrapCb = ContinuousCallback(wrapCondition, wrapAffect!, nothing; save_positions=save_positions, rootfind=true, abstol=1e-20)

     u0 = [0.0]
     prob = ODEProblem(my_model, u0, (0.0, 0.05))
     sol = solve(prob, Tsit5(); abstol=1e-8, reltol=1e-6, dtmax=1e-6, dtmin=1e-20, callback = wrapCb)
     return sol
end

sol = simulate()

plotlyjs()
plot(sol.t, [u[1] for u in sol.u])
gui()

ChrisRackauckas commented 6 years ago

I didn't backport https://github.com/JuliaDiffEq/DiffEqBase.jl/commit/62ac2975b93895886c1b842cc298918f8aef6449 , so I'll test it with this. The new callback system's re-event checks aren't sensitive to having a small `abstol`, so having it below machine epsilon is fine again and probably preferred. I'll check right now if the full fix does it. DiffEqBase.jl is v0.7 only now, but we can backport this if necessary since it's not a big change.

ChrisRackauckas commented 6 years ago

[plot of the resulting solution]

It looks fine with the epsilon abstol?

ChrisRackauckas commented 6 years ago

I backported the tolerance fix as v3.13.1. Zeros are now considered only below machine epsilon, and that should be as accurate as it gets (?). Let me know if it's possible to break it. I think the only way to break it now is if it literally cannot rootfind to zero because the condition isn't numerically stable enough to get close?
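Concretely, with v3.13.1+ a callback tolerance at machine epsilon should be safe (a sketch reusing wrapCondition and wrapAffect! from the sawtooth example above):

# with the tolerance fix, a machine-epsilon abstol on the callback is fine
wrapCb = ContinuousCallback(wrapCondition, wrapAffect!, nothing;
                            abstol = eps(Float64))   # ≈ 2.2e-16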