I haven't read through the thread properly, but this might be related to https://github.com/SciML/DifferentialEquations.jl/issues/610? I.e. try not using `concrete_solve`.

Other than that, when using reverse-mode AD, e.g. ReverseDiff, you should probably tell DifferentialEquations.jl to use the same for computing the sensitivity ("gradient through the ODE"): https://diffeq.sciml.ai/stable/analysis/sensitivity/#High-Level-Interface:-sensealg. Though I would have thought `solve` figures that out automatically, so it might just be the usage of `concrete_solve` that causes the issue :confused:
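In code, the high-level interface linked above boils down to passing a `sensealg` keyword to `solve`. A minimal sketch (the problem `prob`, solver choice, and `saveat` value here are placeholders, and at the time the adjoint types lived in DiffEqSensitivity.jl):

```julia
using DifferentialEquations, DiffEqSensitivity

# Explicitly select a reverse-mode-compatible sensitivity algorithm, so the
# "gradient through the ODE" matches the outer reverse-mode AD (ReverseDiff):
sol = solve(prob, Tsit5();
            saveat=1.0,
            sensealg=InterpolatingAdjoint(autojacvec=ReverseDiffVJP()))
```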
Tusen takk ("thanks a lot") @torfjelde, I have changed `concrete_solve` to `solve`, but that didn't solve the issue. I think mutation of objects within the Turing model is not allowed for backwards AD. So I changed a few things and removed the if statements, and now it seems to work (see below). However, it is extremely slow, as is ForwardDiff AD: running 100 iterations (which is far from convergence) can take 30 minutes or so. This strikes me as a bit odd given that Julia is so fast.

Has anyone ever successfully fitted an ODE model larger than the Lotka-Volterra one (say 10 compartments) with Turing? I'd be very interested to see that code and learn from it.
```julia
...
theta = [ψ, ρ, β, η, ω, φ]
...
incidence2 = max.(0.0, incidence)
increp = incidence2 .* theta[2]
data ~ arraydist(NegativeBinomial2.(theta[1], increp))
```
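(For context: `NegativeBinomial2` is not defined in this excerpt. A common mean/overdispersion reparameterization of `NegativeBinomial` looks like the sketch below; the argument order is inferred from the call above, so treat it as an assumption:)

```julia
using Distributions

# Assumed helper: NegativeBinomial parameterized by overdispersion ϕ and mean μ,
# so that mean = μ and variance = μ + μ²/ϕ. Argument order (ϕ, μ) matches the
# call NegativeBinomial2.(theta[1], increp) above.
function NegativeBinomial2(ϕ, μ)
    p = 1 / (1 + μ / ϕ)
    return NegativeBinomial(ϕ, p)
end
```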
> Tusen takk

Haha, that caught me off guard! Null problem ("no problem") :)
> I think the mutation of objects within the Turing model is not allowed for backwards AD.

This is indeed the case for Zygote.jl, but ReverseDiff.jl should be able to handle it. And even if there is mutation within your ODE, I'm pretty certain that the adjoints/gradients defined for `solve` will handle it, e.g. https://turing.ml/dev/tutorials/10-bayesian-differential-equations/ involves mutation.

I recently helped someone with speeding up DiffEq + Turing, so I'll have a crack at this and get back to you. I'm curious myself :)
Okay, so after trying different optimizations I think I realized why it's taking you up to 30 minutes to get only 100 samples: you're actually running 5000 + 100 iterations, haha :sweat_smile: It took me way longer than I'd like to admit to realize this. `NUTS(5000, ...)` means that we'll use 5000 iterations for adaptation/burn-in and *then* draw the 100 samples. I was way too focused on the model itself and didn't think about this.
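To spell that out (illustrative calls only; `model` stands in for any Turing model):

```julia
# The first positional argument of NUTS is the number of adaptation/burn-in
# steps, which run *in addition to* (and are discarded before) the requested samples:
chain = sample(model, NUTS(5000, 0.65), 100)   # 5000 warm-up + 100 samples
chain = sample(model, NUTS(500, 0.65), 1_000)  # 500 warm-up + 1000 samples
```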
Anyways, the following uses the model from the "Using `Tsit5()`" version with the out-of-domain handling added back in, as it seems to be the version that provides the biggest performance improvement without too much hassle, e.g. we don't have to reach for static arrays (though in fairness the static arrays did seem to help a bit; see the sketch after the results below).

As you can see, 1000 samples + 500 adaptation = 1500 iterations in total takes about 100s, which ain't too shabby if I might say so myself. Notice that even the ESS and R̂ values are looking pretty decent with just 1500 iterations.
```julia
@model function turingmodel(data, theta_fix, u0, problem, solvsettings)
    # Priors
    ψ ~ Beta(1, 5)
    ρ ~ Uniform(0.0, 1.0)
    β ~ Uniform(0.0, 1.0)
    η ~ Uniform(0.0, 1.0)
    ω ~ Uniform(1.0, 3.0 * 365.0)
    φ ~ Uniform(0.0, 364.0)

    theta_est = [β, η, ω, φ]
    p_new = @LArray vcat(theta_est, theta_fix) (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0))
    sol_new = Array(solve(
        problem_new,
        solvsettings.solver,
        abstol=solvsettings.abstol,
        reltol=solvsettings.reltol,
        isoutofdomain=(u, p, t) -> any(<(0), u),
        save_idxs=9,
        saveat=solvsettings.saveat,
        maxiters=solvsettings.maxiters
    ))

    # Early return if we terminated early due to out-of-domain.
    if length(sol_new) - 1 != length(data)
        Turing.@addlogprob! -Inf
        return nothing
    end

    incidence = sol_new[2:end] - sol_new[1:(end - 1)]
    # Avoid numerical instability issues.
    incidence = max.(zero(eltype(incidence)), incidence)

    data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ))
end
```
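(A side note on `eltype(p_new).(u0)` in the model above: it promotes the initial state to the parameters' element type, so that when the parameters are ForwardDiff dual numbers during gradient evaluation, the solver state can hold duals too. A tiny illustration with made-up numbers:)

```julia
using ForwardDiff: Dual

u0 = [0.99, 0.01]                       # Float64 initial state
p  = [Dual(0.5, 1.0), Dual(0.1, 0.0)]   # parameters as duals, as during AD

# Promote u0 to the duals' type; each entry becomes a Dual with zero partials.
u0_promoted = eltype(p).(u0)
eltype(u0_promoted) == eltype(p)        # true
```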
```julia
model = turingmodel(data, theta_fix, u0, problem,
                    merge(solvsettings, (solver = Tsit5(),)));
# Execute once to ensure that it's working correctly.
results = model();
chain = sample(model, NUTS(), 1_000);
```
```
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /home/tor/.julia/packages/AdvancedHMC/bv9VV/src/hamiltonian.jl:47
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /home/tor/.julia/packages/AdvancedHMC/bv9VV/src/hamiltonian.jl:47
┌ Info: Found initial step size
│   ϵ = 0.0044342041015625
└ @ Turing.Inference /home/tor/.julia/packages/Turing/YGtAo/src/inference/hmc.jl:188
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /home/tor/.julia/packages/AdvancedHMC/bv9VV/src/hamiltonian.jl:47

Sampling: 100%|█████████████████████████████████████████| Time: 0:01:15
```
```
chain

Chains MCMC chain (1000×18×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 103.05 seconds
Compute duration  = 103.05 seconds
parameters        = φ, ρ, ω, β, ψ, η
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se     mcse         ess     rhat  ⋯
      Symbol    Float64   Float64    Float64  Float64     Float64  Float64  ⋯
           ψ     0.1517    0.0103     0.0003   0.0003   1074.7326   0.9994  ⋯
           ρ     0.0010    0.0000     0.0000   0.0000    674.1901   0.9990  ⋯
           β     0.2890    0.0046     0.0001   0.0002    494.4507   1.0062  ⋯
           η     0.0705    0.0035     0.0001   0.0002    576.2473   1.0093  ⋯
           ω   379.0073    7.7127     0.2439   0.3320    537.2055   0.9995  ⋯
           φ   178.7325    2.7117     0.0858   0.1418    475.8661   1.0055  ⋯
                                                              1 column omitted

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64
           ψ     0.1319     0.1447     0.1516     0.1584     0.1733
           ρ     0.0010     0.0010     0.0010     0.0010     0.0010
           β     0.2802     0.2857     0.2888     0.2921     0.2974
           η     0.0639     0.0680     0.0705     0.0729     0.0775
           ω   364.5389   373.7467   378.6863   384.1705   394.3270
           φ   173.4356   176.8529   178.7410   180.3889   183.9574
```
```
theta_est

6-element Vector{Float64}:
   0.15
   0.001
   0.28
   0.07
 365.0
 180.0
```
Recovered the true parameters nicely.
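(As an aside, "reaching for static arrays", mentioned above, would look roughly like the sketch below. It assumes a 9-state ODE, matching `save_idxs=9`, and an out-of-place right-hand side, since `SVector`s can't be mutated:)

```julia
using StaticArrays

# Fixed-size, stack-allocated state; the ODE right-hand side must then be
# out-of-place, i.e. return a new SVector rather than mutating `du`.
u0_static = SVector{9}(u0)
problem_static = remake(problem; u0=u0_static)
```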
And just for reference, you can see an example run below + a bunch of versions of the model that I tried out in sequence.
This is absolutely amazing, thanks so much @torfjelde, you saved my research project. This was a toy model; my actual model is a bit larger (17 compartments), but the code is basically the same. The speedup from changing the out-of-domain statement plus making the incidence computation type stable is incredible.

PS: The 5000 burn-in was a typo 🤪 sorry.
Great, really glad to hear! :)

Regarding dropping the `isoutofdomain` check: `max` does not always guarantee that you're sampling from the true posterior, but in most cases it does. It's fine to just use `max` to ensure that you're not seeing any zeros, provided you never actually see any zeros during sampling. Essentially, if the data is informative enough to stop us from choosing parameters that lead to invalid values, then we'll never reach the boundary of the space where we go from valid to invalid values (in this case, values below 0), and thus the `max` will never actually do anything. In that case, sampling with or without the `max` is equivalent, and we're still sampling from the true posterior. But `max` can be useful to avoid issues in the adaptation/burn-in phase, where we can indeed end up in bad regions. So just keep that in mind :+1:

What you could do is add the following line to the end of your model:
```julia
return sol_new
```
and then after you're done sampling, just to check that nothing strange occurred, you can run `generated_quantities(model, chain)` to get all the solutions used to generate `chain`. Then you can just check that none of the solutions that occurred in `chain` were invalid, e.g. as in the sketch below.
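(A minimal sketch of that check, assuming the model now ends with `return sol_new`:)

```julia
# Re-run the model deterministically at every posterior draw to recover the
# ODE solutions that produced `chain`.
sols = generated_quantities(model, chain)

# None of the saved solution values should be negative (i.e. out of domain).
@assert all(sol -> all(>=(0), sol), sols)
```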
But if you do end up dropping the `isoutofdomain` check, then you can actually use `ReverseDiff` with `setrdcache(true)` + `InterpolatingAdjoint(autojacvec=ReverseDiffVJP())`, so that might be useful once you go to higher dimensions (though I think 17 compartments is still something ForwardDiff can handle easily).
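(Roughly, that setup would look like the sketch below; package names are an assumption on my part, as the adjoint types lived in DiffEqSensitivity.jl at the time:)

```julia
using Turing, DiffEqSensitivity

Turing.setadbackend(:reversediff)
Turing.setrdcache(true)  # compiled-tape cache; only safe without value-dependent branches

# ...and inside the model, pass a matching adjoint to `solve`:
sol = solve(problem_new, solvsettings.solver;
            saveat=solvsettings.saveat,
            sensealg=InterpolatingAdjoint(autojacvec=ReverseDiffVJP()))
```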
> PS: The 5000 burnin was a typo :zany_face: sorry.

Haha, no worries at all. I should have thought of that immediately. I was really confused as to why it was taking so long on your end, and I had to actually run a complete sample and notice that the iterations in the chain said `5001:5500`, haha. Oh well, I don't think either of us will make that mistake again :sweat_smile:
Hi everyone,

I have a small ODE model which runs fine (albeit slowly) with Turing's ForwardDiff AD. However, I cannot get it to run with ReverseDiff (and neither with Zygote). It throws a MethodError (see below). Is this some type problem, maybe related to the parameters being an array? What would I need to change? TIA

PS: I also posted this on the Julia forum.

Error:

Code: