SciML / DiffEqParamEstim.jl

Easy scientific machine learning (SciML) parameter estimation with pre-built loss functions
https://docs.sciml.ai/DiffEqParamEstim/stable/

Interesting benchmarks #60

Closed ChrisRackauckas closed 6 years ago

ChrisRackauckas commented 6 years ago

@finmod I think you might want to see these:

https://github.com/jonathanBieler/BlackBoxOptimizationBenchmarking.jl

This probably explains our speed and success rates. Optim and BlackBoxOptim don't seem to be doing so well yet on black box functions, so if we're relying heavily on them...

finmod commented 6 years ago

Noted. The challenge has always been to throw difficult problems at the optimizer, and BBO has acquitted itself quite well so far. The novelty here is cmaes.jl, which "translates" the Python CMA-ES into Julia.

@tpapp may want to test this instead of Optim in the recent LV example. All of this goes in the right direction for the parameter estimation of nonlinear dynamical systems. DynamicHMC, IndirectLikelihood, and now this: super progress for Julia.
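For readers landing on this thread, a minimal sketch of how BlackBoxOptim is typically driven. The Rosenbrock objective, the search range, and the step budget here are illustrative choices, not taken from this thread:

```julia
using BlackBoxOptim

# Classic 2-D Rosenbrock as a stand-in "difficult" black-box objective;
# its global minimum is 0 at [1.0, 1.0].
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# bboptimize defaults to an adaptive differential-evolution method;
# other methods (including natural-evolution-strategy variants) can be
# selected with the Method keyword.
res = bboptimize(rosenbrock;
                 SearchRange = (-5.0, 5.0),
                 NumDimensions = 2,
                 MaxSteps = 20_000,
                 TraceMode = :silent)

best_candidate(res)   # best point found, should be close to [1.0, 1.0]
best_fitness(res)     # objective value there, should be close to 0
```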

ChrisRackauckas commented 6 years ago

Maybe we should ask for a parameter estimation problem to be in the benchmark to get a good comparison? I've just been running optimizer tests by hand and haven't really made them precise, since I was focused on getting the API to make it easy to call the optimizers (and letting them do the rest).

tpapp commented 6 years ago

@finmod: I am not sure I understand what you are suggesting that I try.

In theory, the tuning should not be so sensitive to the starting point, as long as it is "reasonable". The HMC sampler itself acts as a sort of optimizer once it is tuned. I will investigate this (fiddling with the parameters of the Nesterov dual-averaging method), but currently I don't have time.

finmod commented 6 years ago

Correct. Since on line 60 of the gist: https://gist.github.com/tpapp/019ccbd450c996410a0d0ab1514ab16c you are using Optim, the "menu" of optimizers could include BlackBoxOptim or this CMAES depending on how much time is spent here.

finmod commented 6 years ago

@ChrisRackauckas Concerning modifying the benchmark to include a parameter estimation problem: the benchmark has been fixed for several years now and is graded from easy to difficult functions. If it is to remain a benchmark, I guess it should not be modified too often.

The parameter estimation problem for the Lorenz system has been in the examples folder of Robert Feldt/BBO https://github.com/robertfeldt/BlackBoxOptim.jl/blob/master/examples/ode_lorentz_attractor.jl for 9 months now. This was with the Euler solver instead of Tsit5 or better. You may want to advise on which solver is best to put in this visible example. Then, run the benchmark on this problem.

@jonathanBieler I notice that NLopt has been commented out as a poor performer. My experience with the parameter estimation problem for Lorenz system is similar.

jonathanBieler commented 6 years ago

I've had some issues with NLopt (some optimisers crash), but I'll try to add some of them to the benchmark; they don't seem to be as good as CMA-ES, though.

As for the BBOB functions: those are designed to test different properties of the algorithms (the doc is quite interesting: http://coco.lri.fr/downloads/download15.01/bbobdocfunctions.pdf), but they might not be ideal for benchmarking real-world cases. I also haven't implemented all of the functions yet.

I think there's another benchmark that consists only of real-world problems, but I can't find it again.

ChrisRackauckas commented 6 years ago

The parameter estimation problem for the Lorenz system has been in the examples folder of Robert Feldt/BBO https://github.com/robertfeldt/BlackBoxOptim.jl/blob/master/examples/ode_lorentz_attractor.jl for 9 months now. This was with the Euler solver instead of Tsit5 or better. You may want to advise on which solver is best to put in this visible example. Then, run the benchmark on this problem.

Yeah, the example they show there has the problem I discussed before: with Euler (or any fixed-time-step discretization) the optimizer will tend to converge to parameters whose trajectory matches the data, but that trajectory isn't necessarily correct, due to truncation error. Especially on a chaotic problem a high-order adaptive method is necessary, since it then uses different steps for different parameter setups and thus converges to the true trajectory instead of merely a numerical one.
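The effect described above can be seen without any packages. A plain-Julia sketch (the step size, horizon, and the use of classical RK4 as the more accurate reference are illustrative choices):

```julia
# Fixed-step Euler on the Lorenz system drifts away from a higher-order
# reference at the SAME parameters, so fitting parameters against the
# Euler trajectory fits the discretization, not the ODE.

lorenz(u, p) = [p[1]*(u[2]-u[1]),
                u[1]*(p[2]-u[3]) - u[2],
                u[1]*u[2] - p[3]*u[3]]

function euler(f, u0, p, dt, n)
    u = copy(u0)
    for _ in 1:n
        u = u .+ dt .* f(u, p)              # first-order step
    end
    u
end

function rk4(f, u0, p, dt, n)               # classical 4th-order reference
    u = copy(u0)
    for _ in 1:n
        k1 = f(u, p)
        k2 = f(u .+ dt/2 .* k1, p)
        k3 = f(u .+ dt/2 .* k2, p)
        k4 = f(u .+ dt .* k3, p)
        u = u .+ dt/6 .* (k1 .+ 2 .* k2 .+ 2 .* k3 .+ k4)
    end
    u
end

p  = (10.0, 28.0, 8/3)
u0 = [1.0, 0.0, 0.0]
dt = 0.01
n  = 1000                                   # integrate to t = 10

e = euler(lorenz, u0, p, dt, n)
r = rk4(lorenz, u0, p, dt, n)
# On this chaotic problem the two states are O(1) apart well before
# t = 10, even though both runs used identical parameters.
```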

finmod commented 6 years ago

So, which is the best solver for this Lorenz system?

ChrisRackauckas commented 6 years ago

Probably Vern9. If you do the uncertainty quantification

http://docs.juliadiffeq.org/latest/analysis/uncertainty_quantification.html#Example-3:-Adaptive-ProbInts-on-the-Lorenz-Attractor-1

then the plots will show you the timespan that you can trust that the integration is tracking the true solution. Since it is a chaotic solution you need to be very careful about that since the errors grow exponentially. For example, with BS3() at default tolerances:

using ParameterizedFunctions, OrdinaryDiffEq, DiffEqUncertainty, DiffEqMonteCarlo
g = @ode_def_bare LorenzExample begin
  dx = σ*(y-x)
  dy = x*(ρ-z) - y
  dz = x*y - β*z
end σ=>10.0 ρ=>28.0 β=(8/3)
u0 = [1.0;0.0;0.0]
tspan = (0.0,30.0)
prob = ODEProblem(g,u0,tspan)
cb = AdaptiveProbIntsUncertainty(3)
sol = solve(prob,BS3())

monte_prob = MonteCarloProblem(prob)
sim = solve(monte_prob,BS3(),num_monte=100,callback=cb)
using Plots; plotly(); plot(sim,vars=(0,1),linealpha=0.4)

we get

[plot: Monte Carlo x(t) trajectories with BS3; the spread blows up near t = 25]

which shows the error explodes by t=25. Since BS3 is three times the order of Euler and adaptive, I wouldn't be surprised if Euler was giving junk at a decent dt by around t=10. And that's the issue with that test in BBO: you can make parameters converge to the numerical solution of that Euler discretization, but that Euler discretization is not necessarily the true solution with those parameters, so you still get junk out if tested against data generated from (approximately) the true solution.

With the Vern9 method we can push it all the way past t=50

using ParameterizedFunctions, OrdinaryDiffEq, DiffEqUncertainty, DiffEqMonteCarlo
g = @ode_def_bare LorenzExample begin
  dx = σ*(y-x)
  dy = x*(ρ-z) - y
  dz = x*y - β*z
end σ=>10.0 ρ=>28.0 β=(8/3)
u0 = [1.0;0.0;0.0]
tspan = (0.0,50.0)
prob = ODEProblem(g,u0,tspan)
cb = AdaptiveProbIntsUncertainty(9)
sol = solve(prob,Vern9())

monte_prob = MonteCarloProblem(prob)
sim = solve(monte_prob,Vern9(),num_monte=40,callback=cb,reltol=1e-10,abstol=1e-10)
using Plots; gr(); plot(sim,vars=(0,1),linealpha=0.4)
savefig("vern9_10.png")

prob = ODEProblem(g,u0,tspan)
sim = solve(monte_prob,Vern9(),num_monte=40,callback=cb,reltol=1e-14,abstol=1e-14)
using Plots; gr(); plot(sim,vars=(0,1),linealpha=0.4)
savefig("vern9_14.png")

[plot vern9_10: Vern9 at reltol=abstol=1e-10]

[plot vern9_14: Vern9 at reltol=abstol=1e-14]

That shows that Vern9 is at least safe for parameter estimation up to t=50 on this equation (decreasing the tolerances a bit). To get up to t=100 you probably need BigFloats, and this is all because of the chaos in the equation.

So the point I am trying to make is that an integrator with sufficiently strict tolerances is a necessary condition for parameter estimation to work. Any method that is sufficiently accurate is fine. Vern9 is both very accurate and efficient, so it's a great choice for chaotic equations. Tolerances should be chosen to get the accuracy needed to match the trajectory, but not stricter than that, because otherwise the objective function takes longer to evaluate than necessary.

But this makes it pretty clear why any Euler method is doomed to converge to nonsense.
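One way to act on the tolerance advice above is to solve at the candidate tolerance and at a much stricter one, and see for how long the two trajectories agree. A sketch using the current OrdinaryDiffEq API (the thread's snippets use the older `@ode_def` parameter syntax; the agreement threshold of 1e-2 is an illustrative choice):

```julia
using OrdinaryDiffEq

# Lorenz system in the in-place f(du, u, p, t) form.
function lorenz!(du, u, p, t)
    σ, ρ, β = p
    du[1] = σ*(u[2]-u[1])
    du[2] = u[1]*(ρ-u[3]) - u[2]
    du[3] = u[1]*u[2] - β*u[3]
end

prob = ODEProblem(lorenz!, [1.0, 0.0, 0.0], (0.0, 50.0), (10.0, 28.0, 8/3))
ts   = 0.0:0.1:50.0

# Candidate tolerance vs. a much stricter control run.
loose  = solve(prob, Vern9(), reltol=1e-10, abstol=1e-10)
strict = solve(prob, Vern9(), reltol=1e-13, abstol=1e-13)

# Pointwise disagreement over time, using the dense interpolants.
drift = [maximum(abs.(loose(t) .- strict(t))) for t in ts]

# Trust the candidate tolerance only up to the first large disagreement.
i = findfirst(d -> d > 1e-2, drift)
t_trust = i === nothing ? last(ts) : ts[i]
```

This is the same idea as the ProbInts plots above, just done crudely with two deterministic runs instead of a Monte Carlo ensemble.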

finmod commented 6 years ago

So Vern9 with reltol=1e-14 and abstol=1e-14 is the best solver for tspan = (0.0,50.0). This is an important point for @Datseris, working on embedding and reconstruct(), for dispelling built-in errors in the data series when trying to reconstruct any continuous system using JuliaDiffEq.

You should also indicate the BigFloat solution for tspan = (0.0, 100.0) for completeness and alignment of Julia with C and others.

For now, I have a small bug: β=(8/3) should be β=>(8/3). The latter gives the following error:

MethodError: Cannot convert an object of type Float64 to an object of type Expr This may have arisen from a call to the constructor Expr(...), since type constructors fall back to convert methods.

We have dealt with that in the past.

On the benchmarking, remember the exercise at: https://github.com/finmod/DiffEqParamEstim.jl/blob/master/test/lorenz_true_test.jl . Will see if the conclusions change after updating the notebook.

Datseris commented 6 years ago

Thanks for the mention @finmod. It is really good to know about this topic. I will make sure to add this link http://docs.juliadiffeq.org/latest/analysis/uncertainty_quantification.html#Example-3:-Adaptive-ProbInts-on-the-Lorenz-Attractor-1 to my documentation page. I am also thinking of changing the default solver from Tsit5() to Vern9() with appropriate default tolerances (maybe 1e-9 is fine). What do you think @ChrisRackauckas? In general, also judging from Sebastian's benchmarks on the Hénon system, Vern9 seems to be the "go-to" method. It also conserves energy extremely well and falls only slightly below the speed of DPRKN12, which is a much more restrictive method.

A comment: the methods I use in DynamicalSystems.jl are not "too" worried about solution divergence, provided that the system is either bounded or has an attracting set (which covers almost all systems in nonlinear dynamics). That is because of the shadowing theorem.

A problem would be a hyper-fine structure with superimposed, co-existing attractors with a fractal boundary set, but after some finite t you will have divergence anyway... So people can just use a high-accuracy solver and deduce the trustworthy t using the method from the DiffEq uncertainty-quantification page linked above.

@ChrisRackauckas I noticed that the graphs presented here are different from the tutorial page http://docs.juliadiffeq.org/latest/analysis/uncertainty_quantification.html#Example-3:-Adaptive-ProbInts-on-the-Lorenz-Attractor-1. Will you be kind enough to notify me when you update this page, and then I will link it to my documentation as well.

ChrisRackauckas commented 6 years ago

For now, I have a small bug that β=(8/3) should be β=>(8/3). The latter gives the following error

Uhh... I don't have this bug. Are you on v0.6.2? Copy-pasting that code should work. It doesn't need to be =>; they are just different: = fixes the value, => makes it variable for estimation.

You should also indicate the BigFloat solution for tspan = (0.0, 100.0) for completeness and alignment of Julia with C and others.

What do you mean by this? BTW, t=100 is small mostly because this problem is chaotic. For most problems it's much much higher, and Float64 gets a lot more mileage.

So Vern9 with reltol=1e-14 and abstol=1e-14 is the best solver for tspan = (0.0,50.0).

On this problem. It depends on the problem: that is very much overkill for something which isn't chaotic, and it will slow the computation when it's not necessary. There are >200 methods for a reason haha.

I am also thinking of changing the default solver from Tsit5() to Vern9() with appropriate default tolerances (maybe 1e-9 is fine). What do you think @ChrisRackauckas? In general, also judging from Sebastian's benchmarks on the Hénon system, Vern9 seems to be the "go-to" method. It also conserves energy extremely well and falls only slightly below the speed of DPRKN12, which is a much more restrictive method.

Yes, for your purposes, you're estimating a lot of chaotic systems and probably need to do so at high accuracy, and most of these systems are non-stiff. Since that's the case, Vern9 with a lower default tolerance is a great choice. DPRKN12 is also a great choice, but yes, this method requires that the system can be partitioned, which really only occurs for second-order ODEs, so it's not a great default but a great secondary choice to speed things up when applicable.

A comment: The methods I use in DynamicalSystems.jl are not "too" worried about solution divergence, provided that the system is either bounded or has an attracting set. (which is almost all systems in nonlinear dynamics). That is because of the shadowing theorem.

Huh, never heard of that. At least with parameter estimation, though, you have to hit the correct trajectory, otherwise your L2 distance from the data gets screwy. There are probably algorithms specifically for chaotic systems (which I imagine you already have?) that don't fall prey to this. But that's because the standard methods were made for standard (i.e. non-chaotic) problems, and the "standard" fix is just to solve with more accuracy.
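The L2 objective mentioned above can be sketched in a few lines of plain Julia. Here `sol_at` is a hypothetical stand-in for an ODE-solution interpolant, and the sinusoidal toy data is illustrative:

```julia
# Minimal trajectory-fitting objective: sample the candidate solution at
# the data times and accumulate the squared pointwise distance.
function l2loss(sol_at, data, ts)
    s = 0.0
    for (i, t) in enumerate(ts)
        s += sum(abs2, sol_at(t) .- data[:, i])
    end
    s
end

# Toy 2-variable "measurements" on a small time grid.
ts   = 0.0:0.5:2.0
data = [sin.(ts)'; cos.(ts)']

exact(t) = [sin(t), cos(t)]
l2loss(exact, data, ts)   # → 0.0 for a trajectory that matches the data

# If the solver trajectory itself is wrong (e.g. a coarse fixed-step Euler
# run on a chaotic system), this objective is minimized by parameters that
# reproduce the *numerical* trajectory, not the true dynamics.
```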

@ChrisRackauckas I noticed that the graphs presented here are different from the tutorial page http://docs.juliadiffeq.org/latest/analysis/uncertainty_quantification.html#Example-3:-Adaptive-ProbInts-on-the-Lorenz-Attractor-1. Will you be kind enough to notify me when you update this page, and then I will link it to my documentation as well.

Are these ones needed on that page? That page was just showing how this works. Is this analysis not clear from the example that's already given?

finmod commented 6 years ago

For now, I have a small bug that β=(8/3) should be β=>(8/3). The latter gives the following error

Uhh... I don't have this bug. Are you on v0.6.2? Copy-pasting that code should work. It doesn't need to be =>, they are just different. = fixes the value, => makes it variable for estimation.

Yes, I am on Julia 0.6.2, and this is in the estimation notebook to benchmark BBO and NLopt. I have also tried ParameterizedFunctions on master. Let me give you the full printout.

g1 = @ode_def_bare LorenzExample begin
  dx = σ*(y-x)
  dy = x*(ρ-z) - y
  dz = x*y - β*z
end σ=>10.0 ρ=>28.0 β=>(8/3) # Parameters used to construct the dataset

r0 = [1.0; 0.0; 0.0] #[-11.8,-5.1,37.5] PODES Initial values of the system in space # [0.1, 0.0, 0.0]
tspan = (0.0, 50.0) # PODES sample of 3000 observations over the (0,30) timespan
prob = ODEProblem(g1, r0, tspan)
tspan2 = (0.0, 3.0) # Xiang test sample of 300 observations with a timestep of 0.01
prob_short = ODEProblem(g1, r0, tspan2)

MethodError: Cannot convert an object of type Float64 to an object of type Expr This may have arisen from a call to the constructor Expr(...), since type constructors fall back to convert methods.

Stacktrace:
[1] LorenzExample(::Expr, ::Array{Expr,1}, ::Array{SymEngine.Basic,1}, ::Array{Expr,1}, ::Array{Expr,1}, ::Array{Symbol,1}, ::Array{SymEngine.Basic,1}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Array{Symbol,1}, ::Float64, ::Float64, ::Float64) at C:\Users\Denis.julia\v0.6\ParameterizedFunctions\src\maketype.jl:25
[2] #LorenzExample#1(::Expr, ::Array{Expr,1}, ::Array{SymEngine.Basic,1}, ::Array{Expr,1}, ::Array{Expr,1}, ::Array{Symbol,1}, ::Array{SymEngine.Basic,1}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Array{SymEngine.Basic,2}, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Expr, ::Array{Symbol,1}, ::Float64, ::Float64, ::Float64, ::Type{T} where T) at C:\Users\Denis.julia\v0.6\ParameterizedFunctions\src\maketype.jl:101
[3] LorenzExample() at C:\Users\Denis.julia\v0.6\ParameterizedFunctions\src\maketype.jl:101
[4] include_string(::String, ::String) at .\loading.jl:522

ChrisRackauckas commented 6 years ago

I used =, not =>. This issue is one of the reasons why we are developing a different DSL, so it's classified under "won't fix, because ParameterizedFunctions is going away". Just use the literal value (2.66...), =, or a standard parameterized function for now.

Datseris commented 6 years ago

Surprised that you do not know the shadowing theorem; it's like a light in the darkness of numerical precision :D

Well, you are right, new examples are not really required, but I would be glad to have a Vern9 version since I will use it as the default. I will make a PR for it myself, don't worry!

Datseris commented 6 years ago

Nah, no reason to do a PR. It is super clear after reading it carefully. I will add a comment in my docs that Vern9 is better at this (as expected).

tpapp commented 6 years ago

@finmod: I did test BlackBoxOptim.jl on a couple of difficult problems (not ODE ones; some other multimodal and weird likelihoods) and it has proven to be quite amazing. Finding a "representative" peak, even if it is not the mode, really helps the tuning/adaptation of NUTS. Thanks for the suggestion, I really appreciate it.