TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Turing MethodError with reversediff but not with forwarddiff #1673

Closed · fkrauer closed this issue 3 years ago

fkrauer commented 3 years ago

Hi everyone

I have a small ODE model which runs fine (albeit slowly) with Turing's forwarddiff AD. However, I cannot get it to run with reversediff (nor with Zygote): it throws a MethodError (see below). Is this a type problem, maybe related to the parameters being an array? What would I need to change? TIA

PS: I also posted this in the Julia Forum.

Error:

ERROR: MethodError: convert(::Type{ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1}}, ::ReverseDiff.TrackedReal{Float64, Float64, Nothing}) is ambiguous. Candidates:
  convert(::Type{R}, t::T) where {R<:Real, T<:ReverseDiff.TrackedReal} in ReverseDiff at C:\Users\micky\.julia\packages\ReverseDiff\E4Tzn\src\tracked.jl:260
  convert(::Type{ForwardDiff.Dual{T, V, N}}, x::Number) where {T, V, N} in ForwardDiff at C:\Users\micky\.julia\packages\ForwardDiff\5gUap\src\dual.jl:380
  convert(::Type{T}, x::Number) where T<:Number in Base at number.jl:7
  convert(::Type{ForwardDiff.Dual{T, V, N}}, x) where {T, V, N} in ForwardDiff at C:\Users\micky\.julia\packages\ForwardDiff\5gUap\src\dual.jl:379
Possible fix, define
  convert(::Type{ForwardDiff.Dual{T, V, N}}, ::T) where {T, V, N, T<:ReverseDiff.TrackedReal}
Stacktrace:

setindex!(A::Vector{ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1}}, x::ReverseDiff.TrackedReal{Float64, Float64, Nothing}, i1::Int64) at .\array.jl

SEIRS2!(du::Vector{ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1}}, u::Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, p::LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}, t::ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, Float64, 1}) at h:\My Documents\SEIRS2_NUTS_toy.jl

(::ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing})(::Vector{ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1}}, ::Vararg{Any, N} where N) at C:\Users\micky.julia\packages\SciMLBase\cU5k7\src\scimlfunctions.jl

(::SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}})(du2::Vector{ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1}}, t::ForwardDiff.Dual{ForwardDiff.Tag{SciMLBase.TimeGradientWrapper{ODEFunction{true, typeof(SEIRS2!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, LArray{ReverseDiff.TrackedReal{Float64, Float64, Nothing}, 1, Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)}}, Float64}, Float64, 1}) at C:\Users\micky.julia\packages\SciMLBase\cU5k7\src\function_wrappers.jl

[TRUNCATED]

Code:

# Household stuff
cd(@__DIR__)
using Pkg 
Pkg.activate(".") 

using DifferentialEquations
using StatsPlots
using StatsBase
using Turing
using Distributions
using Random
using LabelledArrays
using Serialization
using LazyArrays
using ReverseDiff
using DiffEqSensitivity
using Zygote
using Memoization

Random.seed!(422)

# ODE model
function SEIRS2!(du,u,p,t)

    # states
    (S1, E1, I1, R1, S2, E2, I2, R2) = u[1:8]
    N1 = S1 + E1 + I1 + R1
    N2 = S2 + E2 + I2 + R2
    N = N1 + N2

    # params
    β = p.β
    η = p.η
    φ = p.φ
    ω = 1.0/p.ω
    μ = p.μ
    σ = p.σ
    γ1 = p.γ1
    γ2 = γ1 / p.g2
    δ2 = p.δ2

    # FOI
    βeff = β * (1.0+η*cos(2.0*π*(t-φ)/365.0))
    λ1 = βeff*(I1/N1 + I2/N2)
    λ2 = λ1 * δ2

    # change in states
    du[1] = μ*N - λ1*S1 - μ*S1
    du[2] = λ1*S1 - σ*E1 - μ*E1
    du[3] = σ*E1 - γ1*I1 - μ*I1
    du[4] = γ1*I1 - ω*R1 - μ*R1

    du[5] = ω*(R1 + R2) - λ2*S2 - μ*S2
    du[6] = λ2*S2 - σ*E2 - μ*E2
    du[7] = σ*E2 - γ2*I2 - μ*I2
    du[8] = γ2*I2 - ω*R2 - μ*R2

    du[9] = (σ*(E1 + E2)) # cumulative incidence

end

# observation model
function NegativeBinomial2(ψ, incidence)
    p = 1.0/(1.0 + ψ*incidence)
    r = 1.0/ψ
    return NegativeBinomial(r, p)
end
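# NegativeBinomial2 parameterizes the negative binomial by its mean and an
# overdispersion parameter ψ: mean(NegativeBinomial2(ψ, m)) == m and
# var(NegativeBinomial2(ψ, m)) == m*(1 + ψ*m), so larger ψ means more overdispersion.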

# Solver settings
tmin = 0.0
tmax = 20.0*365.0
tspan = (tmin, tmax)
solvsettings = (abstol = 1.0e-3, 
                reltol = 1.0e-3, 
                saveat = 7.0,
                solver = AutoTsit5(Rosenbrock23()))
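# AutoTsit5(Rosenbrock23()) starts with the non-stiff Tsit5 solver and switches
# to the stiff Rosenbrock23 solver if stiffness is detected.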

# Initiate ODE problem
theta_fix =  [1.0/4.98, 1.0/(80*365), 0.89, 1/6.16, 0.87]
theta_est =  [0.15, 0.001, 0.28, 0.07, 365.0, 180.0]
parnames = (:ψ, :ρ, :β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)
p = @LArray [theta_est; theta_fix] parnames
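# @LArray attaches the names in `parnames` to the plain vector, so parameters can be
# read both by index and by name (e.g. p.β inside SEIRS2!).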
u0 = [200_000.0,1000.0,1000.0,300_000.0, 500_000.0, 1000.0,1000.0, 296_000, 2000.0]

# Initiate ODE problem
problem = ODEProblem(SEIRS2!,u0,tspan,p)

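# isoutofdomain rejects any step that would make a state negative, and
# save_idxs=9 keeps only the cumulative-incidence state at each saveat time.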
sol = solve(problem, 
            solvsettings.solver, 
            abstol=solvsettings.abstol, 
            reltol=solvsettings.reltol, 
            isoutofdomain=(u,p,t)->any(x->x<0.0,u),
            save_idxs=9, 
            saveat=solvsettings.saveat)

# Fake some data from model
foo = (sol[2:end] - sol[1:(end-1)]) .* p.ρ
data = rand.(NegativeBinomial2.(p.ψ, foo))
plot(foo, legend = false); scatter!(data,legend = false)

# Fit model to fake data

# Set up as Turing model
#Turing.setadbackend(:forwarddiff)
Turing.setadbackend(:reversediff)
Turing.setrdcache(true)
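# Note: the compiled/cached ReverseDiff tape assumes the same code path on every call,
# so it is only safe for models without data-dependent branches; the if/else at the
# end of the model below means the cached tape cannot actually be exploited here.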

@model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings) 

    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = [β,η,ω,φ]
    p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end]

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = concrete_solve(problem_new, 
                    solvsettings.solver, 
                    abstol=solvsettings.abstol, 
                    reltol=solvsettings.reltol, 
                    isoutofdomain=(u,p,t)->any(x->x<0.0,u), 
                    save_idxs=9,
                    saveat=solvsettings.saveat)

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    incidence = max.(0.0, incidence) # avoid numerical instability issue
    increp = incidence .* ρ
    if size(increp,1)==size(data,1) 
        data ~ arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, increp)))
    else 
        Turing.@addlogprob! -Inf
        return
    end 

end

model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings)

@time trace = mapreduce(c -> sample(model, NUTS(5000, 0.65), 1000, progress=true), chainscat, 1:3)   
torfjelde commented 3 years ago

I haven't read through the thread properly, but might be related to https://github.com/SciML/DifferentialEquations.jl/issues/610 ? I.e. try not using concrete_solve?

And other than that, when using reverse-mode AD, e.g. ReverseDiff, you should probably tell DifferentialEquations.jl to use the same for computing the sensitivity ("gradient through the ODE"): https://diffeq.sciml.ai/stable/analysis/sensitivity/#High-Level-Interface:-sensealg. Though I would think solve would figure that out automatically, so it might just be the usage of concrete_solve that causes the issue :confused:
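For example, a minimal sketch of what passing a reverse-mode sensitivity algorithm to solve could look like (the particular sensealg here is only illustrative):

using DiffEqSensitivity: InterpolatingAdjoint, ReverseDiffVJP

sol_new = solve(problem_new, solvsettings.solver;
                abstol=solvsettings.abstol, reltol=solvsettings.reltol,
                saveat=solvsettings.saveat,
                sensealg=InterpolatingAdjoint(autojacvec=ReverseDiffVJP(true)))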

fkrauer commented 3 years ago

Tusen takk (thanks a lot) @torfjelde, I have changed concrete_solve to solve, but that didn't solve the issue. I think mutation of objects within the Turing model is not allowed for reverse-mode AD. So I changed a few things and removed the if statements, and now it seems to work (see below). However, it is extremely slow, as is forwarddiff AD: running 100 iters (which is far from convergence) can take 30 minutes or so. This strikes me as a bit odd given that Julia is so fast.

Has anyone ever successfully fitted an ODE model larger than the Lotka-Volterra (say 10 compartments) with Turing? I'd be very interested to see this code and learn from it.

...
theta = [ψ, ρ, β,η,ω,φ]
...
incidence2 = max.(0.0, incidence)
increp = incidence2 .* theta[2]
data ~ arraydist(NegativeBinomial2.(theta[1], increp))
torfjelde commented 3 years ago

Tusen takk

Haha, that caught me off guard! Null problem (no problem) :)

I think the mutation of objects within the Turing model is not allowed for backwards AD.

This is indeed the case for Zygote.jl, but ReverseDiff.jl should be able to handle it. But even if there is mutation within your ODE, I'm pretty certain that the adjoints/gradients defined for solve will handle this, e.g. https://turing.ml/dev/tutorials/10-bayesian-differential-equations/ involves mutation.
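As a concrete illustration (not from the model above), this is the kind of in-place update in the model body that Zygote cannot differentiate, together with an out-of-place alternative that works with any backend:

incidence[incidence .< 0] .= 0.0                       # in-place mutation: breaks Zygote
incidence = max.(zero(eltype(incidence)), incidence)   # out-of-place: fine for all backends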

I recently helped someone with speeding up DiffEq + Turing, so I'll have a crack at this and get back to you. I'm curious myself:)

torfjelde commented 3 years ago

Okay, so after trying different optimizations I think I realized why it's taking you up to 30 mins to only get 100 samples: you're actually running 5000 + 100 iterations, haha :sweat_smile: It took me way longer than I'd like to admit to realize this. NUTS(5000, ...) means that we'll use 5000 iterations for adaptation/burn-in and only then draw the 100 samples. I was way too focused on the model itself and didn't think about this.
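In other words (illustrative calls only):

# NUTS(n_adapts, target_accept): the adaptation iterations are run on top of the
# number of samples requested in `sample`.
sample(model, NUTS(5000, 0.65), 100)   # 5000 adaptation + 100 samples  = 5100 iterations
sample(model, NUTS(500, 0.65), 1000)   # 500 adaptation  + 1000 samples = 1500 iterations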

Anyways, the following uses the model from the "Using Tsit5()" section below, with the out-of-domain handling added back in, as it seems to be the version that provides the most perf improvement without too much hassle, e.g. we don't have to reach for static arrays (though in fairness the static arrays did seem to help a bit).

As you can see, 1000 samples + 500 adaptation = 1500 iterations in total take about 100s, which ain't too shabby if I might say so myself. Notice that the ESS and R̂ values are looking pretty decent even with just 1500 iterations.

@model function turingmodel(data, theta_fix, u0, problem, solvsettings)

    # Priors
    ψ ~ Beta(1,5)
    ρ ~ Uniform(0.0,1.0) 
    β ~ Uniform(0.0,1.0) 
    η ~ Uniform(0.0,1.0) 
    ω ~ Uniform(1.0, 3.0*365.0) 
    φ ~ Uniform(0.0,364.0)

    theta_est = [β,η,ω,φ]
    p_new = @LArray vcat(theta_est, theta_fix) (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2)

    # Update problem and solve ODEs
    problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) 

    sol_new = Array(solve(
        problem_new,
        solvsettings.solver,
        abstol=solvsettings.abstol,
        reltol=solvsettings.reltol,
        isoutofdomain=(u,p,t)->any(<(0),u),
        save_idxs=9,
        saveat=solvsettings.saveat,
        maxiters=solvsettings.maxiters
    ))

    # Early return if we terminated early due to out-of-domain.
    if length(sol_new) - 1 != length(data)
        Turing.@addlogprob! -Inf
        return nothing
    end

    incidence = sol_new[2:end] - sol_new[1:(end-1)]
    # avoid numerical instability issue
    incidence = max.(zero(eltype(incidence)), incidence)
    data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ))
end

# The solve call above uses solvsettings.maxiters, so add it to the settings here.
model = turingmodel(data, theta_fix, u0, problem, merge(solvsettings, (maxiters = 1e6,)));

# Execute once to ensure that it's working correctly.
results = model();
model = turingmodel(data, theta_fix, u0, problem, merge(solvsettings, (solver = Tsit5(), maxiters = 1e6)));
chain = sample(model, NUTS(), 1_000);
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /home/tor/.julia/packages/AdvancedHMC/bv9VV/src/hamiltonian.jl:47
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /home/tor/.julia/packages/AdvancedHMC/bv9VV/src/hamiltonian.jl:47
┌ Info: Found initial step size
│   ϵ = 0.0044342041015625
└ @ Turing.Inference /home/tor/.julia/packages/Turing/YGtAo/src/inference/hmc.jl:188
┌ Warning: The current proposal will be rejected due to numerical error(s).
│   isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)
└ @ AdvancedHMC /home/tor/.julia/packages/AdvancedHMC/bv9VV/src/hamiltonian.jl:47
Sampling: 100%|█████████████████████████████████████████| Time: 0:01:15
chain
Chains MCMC chain (1000×18×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 103.05 seconds
Compute duration  = 103.05 seconds
parameters        = φ, ρ, ω, β, ψ, η
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ⋯
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64   ⋯

           ψ     0.1517    0.0103     0.0003    0.0003   1074.7326    0.9994   ⋯
           ρ     0.0010    0.0000     0.0000    0.0000    674.1901    0.9990   ⋯
           β     0.2890    0.0046     0.0001    0.0002    494.4507    1.0062   ⋯
           η     0.0705    0.0035     0.0001    0.0002    576.2473    1.0093   ⋯
           ω   379.0073    7.7127     0.2439    0.3320    537.2055    0.9995   ⋯
           φ   178.7325    2.7117     0.0858    0.1418    475.8661    1.0055   ⋯
                                                                1 column omitted

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5% 
      Symbol    Float64    Float64    Float64    Float64    Float64 

           ψ     0.1319     0.1447     0.1516     0.1584     0.1733
           ρ     0.0010     0.0010     0.0010     0.0010     0.0010
           β     0.2802     0.2857     0.2888     0.2921     0.2974
           η     0.0639     0.0680     0.0705     0.0729     0.0775
           ω   364.5389   373.7467   378.6863   384.1705   394.3270
           φ   173.4356   176.8529   178.7410   180.3889   183.9574
theta_est
6-element Vector{Float64}:
   0.15
   0.001
   0.28
   0.07
 365.0
 180.0

Recovered the true parameters nicely.

And just for reference, you can see an example run below + a bunch of versions of the model that I tried out in sequence.

Several sections showing different iterations of the model - [Setup](#orgede5845) - [Initial run](#org01e8b90) - [Fix type-instability in `max`](#org2e8f0f8) - [Embed `parnames` in model](#org5810b74) - [Dropping usage of LazyArrays.jl](#org8c71290) - [Using `Tsit5()`](#org9ca7977) - [`ModelingToolkit.jl`](#org41fec28) - [Immutable version](#orgc87f133) - [Static problem](#orgfe813aa) # Setup ```julia using Pkg Pkg.activate("/tmp/jl_RuWW8I/") ``` Activating environment at `/tmp/jl_RuWW8I/Project.toml` ```julia pkgs = string.([ :DifferentialEquations, :Turing, :Distributions, :Random, :LabelledArrays, :Serialization, :LazyArrays, :DiffEqSensitivity, :ModelingToolkit, :RecursiveArrayTools, :BenchmarkTools, :Zygote, :ReverseDiff, :ForwardDiff, :Tracker, :Memoization, :StaticArrays ]) Pkg.add(pkgs) ``` Updating registry at `~/.julia/registries/General` Updating git-repo `https://github.com/JuliaRegistries/General.git` Resolving package versions... No Changes to `/tmp/jl_RuWW8I/Project.toml` No Changes to `/tmp/jl_RuWW8I/Manifest.toml` ```julia using DifferentialEquations using Turing using Distributions using Random using LabelledArrays using Serialization using LazyArrays using DiffEqSensitivity using ModelingToolkit using RecursiveArrayTools using StaticArrays # Benchmarking related stuff. using BenchmarkTools # Use packages to ensure that we trigger Requires.jl. using Zygote: Zygote using ReverseDiff: ReverseDiff using ForwardDiff: ForwardDiff using Tracker: Tracker using Memoization: Memoization # used for ReverseDiff.jl cache. using Turing.Core: ForwardDiffAD, ReverseDiffAD, TrackerAD, ZygoteAD, CHUNKSIZE const DEFAULT_ADBACKENDS = [ ForwardDiffAD{40}(), # chunksize=40 ForwardDiffAD{100}(), # chunksize=100 TrackerAD(), ZygoteAD(), ReverseDiffAD{false}(), # rdcache=false ReverseDiffAD{true}() # rdcache=false ] # This is a piece of code I often use to benchmark models. It'll likely make it's # way into Turing.jl soonish. # https://gist.github.com/torfjelde/7794c384d82d03c36625cd25b702b8d7 """ make_turing_suite(model; kwargs...) Create default benchmark suite for `model`. # Keyword arguments - `adbackends`: a collection of adbackends to use. Defaults to `$(DEFAULT_ADBACKENDS)`. - `run_once=true`: if `true`, the body of each benchmark will be run once to avoid compilation to be included in the timings (this may occur if compilation runs longer than the allowed time limit). - `save_grads=false`: if `true` and `run_once` is `true`, the gradients from the initial execution will be saved and returned as the second return-value. This is useful if you want to check correctness of the gradients for different backends. # Notes - A separate "parameter" instance (`DynamicPPL.VarInfo`) will be created for _each test_. Hence if you have a particularly large model, you might want to only pass one `adbackend` at the time. 
""" function make_turing_suite( model, vi_orig=DynamicPPL.VarInfo(model); adbackends=DEFAULT_ADBACKENDS, run_once=true, save_grads=false ) suite = BenchmarkGroup() suite["not_linked"] = BenchmarkGroup() suite["linked"] = BenchmarkGroup() grads = Dict(:not_linked => Dict(), :linked => Dict()) vi_orig = DynamicPPL.VarInfo(model) spl = DynamicPPL.SampleFromPrior() for adbackend in adbackends vi = DynamicPPL.VarInfo(model) vi[spl] = deepcopy(vi_orig[spl]) if run_once ℓ, ∇ℓ = Turing.Core.gradient_logp( adbackend, vi[spl], vi, model, spl ) if save_grads grads[:not_linked][adbackend] = (ℓ, ∇ℓ) end end suite["not_linked"]["$(adbackend)"] = @benchmarkable $(Turing.Core.gradient_logp)( $adbackend, $(vi[spl]), $vi, $model, $spl ) # Need a separate `VarInfo` for the linked version since otherwise we risk the # `vi` from above being mutated. vi_linked = deepcopy(vi) DynamicPPL.link!(vi_linked, spl) if run_once ℓ, ∇ℓ = Turing.Core.gradient_logp( adbackend, vi_linked[spl], vi_linked, model, spl ) if save_grads grads[:linked][adbackend] = (ℓ, ∇ℓ) end end suite["linked"]["$(adbackend)"] = @benchmarkable $(Turing.Core.gradient_logp)( $adbackend, $(vi_linked[spl]), $vi_linked, $model, $spl ) end return save_grads ? (suite, grads) : suite end function test_gradient(model, adbackend, vi=DynamicPPL.VarInfo(model)) spl = DynamicPPL.SampleFromPrior() return Turing.Core.gradient_logp( adbackend, vi[spl], vi, model, spl ) end ``` test_gradient (generic function with 2 methods) ```julia # ODE model function SEIRS2!(du,u,p,t) # states (S1, E1, I1, R1, S2, E2, I2, R2) = u[1:8] N1 = S1 + E1 + I1 + R1 N2 = S2 + E2 + I2 + R2 N = N1 + N2 # params β = p.β η = p.η φ = p.φ ω = 1.0/p.ω μ = p.μ σ = p.σ γ1 = p.γ1 γ2 = γ1 / p.g2 δ2 = p.δ2 # FOI βeff = β * (1.0+η*cos(2.0*π*(t-φ)/365.0)) λ1 = βeff*(I1/N1 + I2/N2) λ2 = λ1 * δ2 # change in states du[1] = μ*N - λ1*S1 - μ*S1 du[2] = λ1*S1 - σ*E1 - μ*E1 du[3] = σ*E1 - γ1*I1 - μ*I1 du[4] = γ1*I1 - ω*R1 - μ*R1 du[5] = ω*(R1 + R2) - λ2*S2 - μ*S2 du[6] = λ2*S2 - σ*E2 - μ*E2 du[7] = σ*E2 - γ2*I2 - μ*I2 du[8] = γ2*I2 - ω*R2 - μ*R2 du[9] = (σ*(E1 + E2)) # cumulative incidence end # observation model function NegativeBinomial2(ψ, incidence; check_args=true) p = 1.0/(1.0 + ψ*incidence) r = 1.0/ψ return NegativeBinomial(r, p; check_args=check_args) end # Solver settings tmin = 0.0; tmax = 20.0*365.0; tspan = (tmin, tmax) solvsettings = ( abstol = 1.0e-3, reltol = 1.0e-3, saveat = 7.0, solver = AutoTsit5(Rosenbrock23()), maxiters = 1e6 ) # Initiate ODE problem theta_fix = [1.0/4.98, 1.0/(80*365), 0.89, 1/6.16, 0.87] theta_est = [0.15, 0.001, 0.28, 0.07, 365.0, 180.0] parnames = (:ψ, :ρ, :β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2) p = @LArray [theta_est; theta_fix] parnames u0 = [200_000.0,1000.0,1000.0,300_000.0, 500_000.0, 1000.0,1000.0, 296_000, 2000.0] # Initiate ODE problem problem = ODEProblem(SEIRS2!,u0,tspan,p) # Solve sol = solve( problem, solvsettings.solver, abstol=solvsettings.abstol, reltol=solvsettings.reltol, isoutofdomain=(u,p,t)->any(x->x<0.0,u), save_idxs=9, saveat=solvsettings.saveat ); foo = (sol[2:end] - sol[1:(end-1)]) .* p.ρ; data = rand.(NegativeBinomial2.(p.ψ, foo)); ``` # Initial run Same model as in the original issue (with the exception of the if-statement at the end). 
```julia @model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings) # Priors ψ ~ Beta(1,5) ρ ~ Uniform(0.0,1.0) β ~ Uniform(0.0,1.0) η ~ Uniform(0.0,1.0) ω ~ Uniform(1.0, 3.0*365.0) φ ~ Uniform(0.0,364.0) theta_est = [β,η,ω,φ] p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end] # Update problem and solve ODEs problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) sol_new = solve( problem_new, solvsettings.solver, abstol=solvsettings.abstol, reltol=solvsettings.reltol, isoutofdomain=(u,p,t)->any(<(0),u), save_idxs=9, saveat=solvsettings.saveat, maxiters=solvsettings.maxiters ) incidence = sol_new[2:end] - sol_new[1:(end-1)] # avoid numerical instability issue incidence = max.(0.0, incidence) data ~ arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, incidence * ρ))) end model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings); # Execute once to ensure that it's working correctly. results = model(); ``` Since the execution is non-deterministic, we need to use the same set of parameters in the benchmarking of the different models. ```julia var_info = DynamicPPL.VarInfo(model); ``` ```julia test_gradient(model, ForwardDiffAD{40}(), var_info) ``` | | | |------------------ |------------------------------------------------------------------------------------------------------------------ | | -51513.54325993552 | (70168.75780995484 12976.945884848457 33573.366670560834 -131047.33181480742 -40.79691974039409 75.67135749544357) | ```julia # ReverseDiff.jl without tape-compilation (i.e. `rdcache=false`). test_gradient(model, ReverseDiffAD{false}(), var_info) ``` | | | |------------------- |---------------------------------------------------------------------------------------------------------------- | | -51520.918599805795 | (70170.19934497547 12977.209992820593 33633.23016064762 -131178.3723725407 -40.82876406906327 76.09277036009247) | ```julia test_gradient(model, ZygoteAD(), var_info) ``` MethodError: no method matching LazyArray(::Vector{NegativeBinomial{Float64}}) Closest candidates are: (::Type{S})(::Complex{Num}) where S<:Union{AbstractFloat, Integer, Complex{var"#s19"} where var"#s19"<:AbstractFloat, Complex{var"#s18"} where var"#s18"<:Integer, AbstractArray} at /home/tor/.julia/packages/Symbolics/H8dtg/src/Symbolics.jl:43 (::Type{S})(::Num) where S<:Union{AbstractFloat, Integer, Complex{var"#s19"} where var"#s19"<:AbstractFloat, Complex{var"#s18"} where var"#s18"<:Integer, AbstractArray} at /home/tor/.julia/packages/Symbolics/H8dtg/src/Symbolics.jl:43 LazyArray(::Base.Broadcast.Broadcasted) at /home/tor/.julia/packages/LazyArrays/zliW5/src/lazybroadcasting.jl:35 ... Stacktrace: [1] macro expansion @ ~/.julia/packages/Zygote/TaBlo/src/compiler/interface2.jl:0 [inlined] [2] _pullback(ctx::Zygote.Context, f::Type{LazyArray}, args::Vector{NegativeBinomial{Float64}}) @ Zygote ~/.julia/packages/Zygote/TaBlo/src/compiler/interface2.jl:9 [3] _pullback @ ./In[6]:31 [inlined] .... All right, seems like Zygote isn't too happy about the usage of LazyArrays.jl. This could maybe be addressed by defining an adjoint for \`LazyArray\`, but let's leave that for now. Even though ReverseDiff.jl *works* we're not going to bother benchmarking it since we can't use the compiled tape due to the conditional statements in the model, i.e. we're only going to benchmark ForwardDiff. 
```julia suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); ``` ```julia benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(3.467 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(3.491 ms) The `linked` part is what we really care about when sampling using an AD-based sampler, e.g. `NUTS`, as it refers to the variables in the transformed/unconstrained space. ```julia minimum(benchmarks["linked"]["ForwardDiffAD{40}()"]) ``` BenchmarkTools.TrialEstimate: time: 3.467 ms gctime: 0.000 ns (0.00%) memory: 5.65 MiB allocs: 19833 # Fix type-instability in `max` The current line ```julia incidence = max.(0.0, incidence) ``` can lead to type-instability, since if `incidence < 0`, then we return `0.0` which is a `Float64`, and otherwise we get `eltype(indicidence)` which can be `ForwardDiff.Dual`, `ReverseDiff.TrackedReal`, etc. We can eliminate this instability by simply doing ```julia incidence = max.(zero(eltype(incidence)), incidence) ``` instead. ```julia @model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings) # Priors ψ ~ Beta(1,5) ρ ~ Uniform(0.0,1.0) β ~ Uniform(0.0,1.0) η ~ Uniform(0.0,1.0) ω ~ Uniform(1.0, 3.0*365.0) φ ~ Uniform(0.0,364.0) theta_est = [β,η,ω,φ] p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end] # Update problem and solve ODEs problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) sol_new = solve( problem_new, solvsettings.solver, abstol=solvsettings.abstol, reltol=solvsettings.reltol, isoutofdomain=(u,p,t)->any(<(0),u), save_idxs=9, saveat=solvsettings.saveat, maxiters=solvsettings.maxiters ) incidence = sol_new[2:end] - sol_new[1:(end-1)] # avoid numerical instability issue incidence = max.(zero(eltype(incidence)), incidence) data ~ arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, incidence * ρ))) end model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings); # Execute once to ensure that it's working correctly. results = model(); ``` ```julia suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.481 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.484 ms) ```julia minimum(benchmarks["linked"]["ForwardDiffAD{40}()"]) ``` BenchmarkTools.TrialEstimate: time: 2.481 ms gctime: 0.000 ns (0.00%) memory: 5.86 MiB allocs: 10292 That's almost 1s / 30% improvement! That's pretty decent. Notice in particular how many allocations that are now gone because the Julia compiler can properly infer the type used in the broadcasted expression `arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, incidence * ρ)))`. # Embed `parnames` in model The line ```julia p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end] ``` leads to the type of `p_new` not being inferrable, e.g. 
```julia g(theta, parnames) = @LArray theta parnames[3:end] @code_warntype g(vcat(theta_est, theta_fix), parnames) ``` Variables #self#::Core.Const(g) theta::Vector{Float64} parnames::NTuple{11, Symbol} Body::LArray{Float64, 1, Vector{Float64}, _A} where _A 1 ─ %1 = Base.lastindex(parnames)::Core.Const(11) │ %2 = (3:%1)::Core.Const(3:11) │ %3 = Base.getindex(parnames, %2)::NTuple{9, Symbol} │ %4 = Core.apply_type(LabelledArrays.LArray, %3)::Type{LArray{_A, N, D, Syms} where {N, D<:AbstractArray{_A, N}, Syms}} where _A │ %5 = (%4)(theta)::LArray{Float64, 1, Vector{Float64}, _A} where _A └── return %5 Notice the `LArray{Float64, 1, Vector{Float64}, _A} where A_`. In all likelihood this doesn't matter too much because the only place where we use this array is when it's being passed to `remake` + the `eltype` of the array is correctly inferred. The only downside of it not being inferred is that things like property-access, e.g. `p_new.a` within the body of the model, would have to be checked at run-time rather than at compile-time. But here Julia will then perform a single dynamic lookup to figure out which `remake` to call and once that's done, there's no real cost to the type of `p_new` not being inferrable. But let's just remove it and see if it makes a difference. ```julia @model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings) # Priors ψ ~ Beta(1,5) ρ ~ Uniform(0.0,1.0) β ~ Uniform(0.0,1.0) η ~ Uniform(0.0,1.0) ω ~ Uniform(1.0, 3.0*365.0) φ ~ Uniform(0.0,364.0) theta_est = [β,η,ω,φ] # NOTE: This shoud be done in a nicer way, e.g. define a method which # takes in `problem.p` (an example `LArray` with the correct lables) # and the new values, and returns a `LArray` wit the already known labels from `problem.p`. p_new = @LArray vcat(theta_est, theta_fix) (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2) # Update problem and solve ODEs problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) sol_new = solve( problem_new, solvsettings.solver, abstol=solvsettings.abstol, reltol=solvsettings.reltol, isoutofdomain=(u,p,t)->any(<(0),u), save_idxs=9, saveat=solvsettings.saveat, maxiters=solvsettings.maxiters ) incidence = sol_new[2:end] - sol_new[1:(end-1)] # avoid numerical instability issue incidence = max.(zero(eltype(incidence)), incidence) data ~ arraydist(LazyArray(@~ @. NegativeBinomial2(ψ, incidence * ρ))) end model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings); # Execute once to ensure that it's working correctly. results = model(); ``` ```julia suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.699 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.646 ms) ```julia minimum(benchmarks["linked"]["ForwardDiffAD{40}()"]) ``` BenchmarkTools.TrialEstimate: time: 2.699 ms gctime: 0.000 ns (0.00%) memory: 6.24 MiB allocs: 11015 Nah, it didn't, as expected (it actually made it a tiny bit slower with more allocations; a bit uncertain why that is). But it's a point worth noting because if we instead were using this `p_new` throughout the model, e.g. accessing fields, it would be nice to ensure that accessing its properties was compiled away. Anyhew, we move on. # Dropping usage of LazyArrays.jl I'm curious whether LazyArrays.jl actually helps or not here, e.g. 
maybe it's better to materialize the broadcasted expression immediately? `arraydist` won't be too happy with `VectorOfArrays` that `solve` returns so we have to wrap `solve` in a call to `Array`. And since LazyArrays.jl was causing issues for Zygote.jl earlier, we can check if it works now that we're no longer using LazyArrays.jl. I'll add a `sensealg` keyword argument to the model to allow us to specify that we want to use `InterpolatingAdjoint(autojacvec`ZygoteVJP)= to obtain the sensitivities (i.e. "gradient" of `solve` wrt. parameters). ```julia @model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings; sensealg=InterpolatingAdjoint()) # Priors ψ ~ Beta(1,5) ρ ~ Uniform(0.0,1.0) β ~ Uniform(0.0,1.0) η ~ Uniform(0.0,1.0) ω ~ Uniform(1.0, 3.0*365.0) φ ~ Uniform(0.0,364.0) theta_est = [β,η,ω,φ] p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end] # Update problem and solve ODEs problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) sol_new = Array(solve( problem_new, solvsettings.solver, abstol=solvsettings.abstol, reltol=solvsettings.reltol, isoutofdomain=(u,p,t)->any(<(0),u), save_idxs=9, saveat=solvsettings.saveat, maxiters=solvsettings.maxiters, sensealg=sensealg )) incidence = sol_new[2:end] - sol_new[1:(end-1)] # avoid numerical instability issue incidence = max.(zero(eltype(incidence)), incidence) data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ)) end model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings); # Execute once to ensure that it's working correctly. results = model(); ``` ```julia suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.699 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.611 ms) ```julia minimum(benchmarks["linked"]["ForwardDiffAD{40}()"]) ``` BenchmarkTools.TrialEstimate: time: 2.699 ms gctime: 0.000 ns (0.00%) memory: 6.30 MiB allocs: 10941 Not a big difference tbh, but maybe a *tiny* improvement. 
Let's try Zygote: ```julia model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings, sensealg=InterpolatingAdjoint(autojacvec=ZygoteVJP())); test_gradient(model, ZygoteAD(), var_info) ``` DimensionMismatch("Inconsistent array dimensions.") Stacktrace: [1] macro expansion @ ~/.julia/packages/Zygote/TaBlo/src/compiler/interface2.jl:0 [inlined] [2] _pullback(ctx::Zygote.Context, f::typeof(throw), args::DimensionMismatch) @ Zygote ~/.julia/packages/Zygote/TaBlo/src/compiler/interface2.jl:9 [3] _pullback @ ~/.julia/packages/Distributions/fXTVC/src/multivariates.jl:199 [inlined] [4] _pullback @ ~/.julia/packages/Distributions/fXTVC/src/multivariates.jl:264 [inlined] [5] _pullback @ ~/.julia/packages/DynamicPPL/F7F1M/src/context_implementations.jl:269 [inlined] [6] _pullback(::Zygote.Context, ::typeof(DynamicPPL.observe), ::Product{Discrete, NegativeBinomial{Float64}, Vector{NegativeBinomial{Float64}}}, ::VectorOfArray{Int64, 1, Vector{Int64}}, ::DynamicPPL.TypedVarInfo{NamedTuple{(:ψ, :ρ, :β, :η, :ω, :φ), Tuple{DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:ψ, Tuple{}}, Int64}, Vector{Beta{Float64}}, Vector{AbstractPPL.VarName{:ψ, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:ρ, Tuple{}}, Int64}, Vector{Uniform{Float64}}, Vector{AbstractPPL.VarName{:ρ, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:β, Tuple{}}, Int64}, Vector{Uniform{Float64}}, Vector{AbstractPPL.VarName{:β, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:η, Tuple{}}, Int64}, Vector{Uniform{Float64}}, Vector{AbstractPPL.VarName{:η, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:ω, Tuple{}}, Int64}, Vector{Uniform{Float64}}, Vector{AbstractPPL.VarName{:ω, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:φ, Tuple{}}, Int64}, Vector{Uniform{Float64}}, Vector{AbstractPPL.VarName{:φ, Tuple{}}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}}, Float64}) @ Zygote ~/.julia/packages/Zygote/TaBlo/src/compiler/interface2.jl:0 ... Now this is a weird error! We're using the same `var_info` for every call so we should not have any issues with early termination in `solve` or anything of the sort. I'll have to do some more digging to answer that properly. 
But before we check that, let's see if what sort of performance we get when using reverse-mode for the *sensitivity* calculation but ForwardDiff.jl for the rest of the gradient computation: ```julia model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings, sensealg=ForwardDiffSensitivity()); suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.322 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.317 ms) ```julia model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings, sensealg=InterpolatingAdjoint(autojacvec=ReverseDiffVJP(true))); suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.378 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.365 ms) ```julia minimum(benchmarks["linked"]["ForwardDiffAD{40}()"]) ``` BenchmarkTools.TrialEstimate: time: 2.378 ms gctime: 0.000 ns (0.00%) memory: 5.82 MiB allocs: 10011 ```julia model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings, sensealg=InterpolatingAdjoint(autojacvec=ZygoteVJP())); suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.519 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.544 ms) ```julia minimum(benchmarks["linked"]["ForwardDiffAD{40}()"]) ``` BenchmarkTools.TrialEstimate: time: 2.519 ms gctime: 0.000 ns (0.00%) memory: 6.10 MiB allocs: 10557 ```julia using Enzyme model = turingmodel(data, theta_fix, u0, problem, parnames, solvsettings, sensealg=InterpolatingAdjoint(autojacvec=EnzymeVJP())); suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.340 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(2.315 ms) ```julia minimum(benchmarks["linked"]["ForwardDiffAD{40}()"]) ``` BenchmarkTools.TrialEstimate: time: 2.340 ms gctime: 0.000 ns (0.00%) memory: 5.73 MiB allocs: 9843 So it's not making things much faster than just using ForwardDiff everywhere, *but* IIUC this could be a beneficial approach if the number of ODEs we're solving increases but the parameters does not. Then we could use the adjoint methods based on reverse-mode differentiation to scale nicely with the number of ODEs while using the speed of ForwardDiff.jl for the rest of the model. # Using `Tsit5()` I'm not too familiar with the idea of "stiff ODEs", but I'm guessing that if the problem works nicely without a stiff solver then we probably don't need a stiff solver? 
So let's try `Tsit5`: ```julia model = turingmodel(data, theta_fix, u0, problem, parnames, merge(solvsettings, (solver = Tsit5(), ))); ``` ```julia suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(1.763 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(1.782 ms) ```julia minimum(benchmarks["linked"]["ForwardDiffAD{40}()"]) ``` BenchmarkTools.TrialEstimate: time: 1.763 ms gctime: 0.000 ns (0.00%) memory: 3.03 MiB allocs: 5044 Aye, much faster indeed! # `ModelingToolkit.jl` We're not done yet though. What about computing the jacobian symbolically? ```julia @model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings) # Priors ψ ~ Beta(1,5) ρ ~ Uniform(0.0,1.0) β ~ Uniform(0.0,1.0) η ~ Uniform(0.0,1.0) ω ~ Uniform(1.0, 3.0*365.0) φ ~ Uniform(0.0,364.0) theta_est = [β,η,ω,φ] p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end] # Update problem and solve ODEs problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) sol_new = Array(solve( problem_new, solvsettings.solver, abstol=solvsettings.abstol, reltol=solvsettings.reltol, save_idxs=9, saveat=solvsettings.saveat, maxiters=solvsettings.maxiters )) incidence = sol_new[2:end] - sol_new[1:(end-1)] # avoid numerical instability issue incidence = max.(zero(eltype(incidence)), incidence) data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ)) end # Symbolic problem de = modelingtoolkitize(problem) jac = eval(ModelingToolkit.generate_jacobian(de)[2]) f = ODEFunction(SEIRS2!, jac=jac) problem_sym = ODEProblem(f,u0,tspan,p) # Model with symbolic problem. model = turingmodel(data, theta_fix, u0, problem_sym, parnames, merge(solvsettings, (solver = Tsit5(), ))); # Execute once to ensure that it's working correctly. results = model(); ``` ```julia suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(1.975 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(1.971 ms) Seems like this is such a simple ODE that it didn't help much at all :shrug: # Immutable version Curious about an immutable version: ```julia function SEIRS2(u,p,t) # states (S1, E1, I1, R1, S2, E2, I2, R2) = u[1:8] N1 = S1 + E1 + I1 + R1 N2 = S2 + E2 + I2 + R2 N = N1 + N2 # params β = p.β η = p.η φ = p.φ ω = 1.0/p.ω μ = p.μ σ = p.σ γ1 = p.γ1 γ2 = γ1 / p.g2 δ2 = p.δ2 # FOI βeff = β * (1.0+η*cos(2.0*π*(t-φ)/365.0)) λ1 = βeff*(I1/N1 + I2/N2) λ2 = λ1 * δ2 # change in states du1 = μ*N - λ1*S1 - μ*S1 du2 = λ1*S1 - σ*E1 - μ*E1 du3 = σ*E1 - γ1*I1 - μ*I1 du4 = γ1*I1 - ω*R1 - μ*R1 du5 = ω*(R1 + R2) - λ2*S2 - μ*S2 du6 = λ2*S2 - σ*E2 - μ*E2 du7 = σ*E2 - γ2*I2 - μ*I2 du8 = γ2*I2 - ω*R2 - μ*R2 du9 = (σ*(E1 + E2)) # cumulative incidence return [du1, du2, du3, du4, du5, du6, du7, du8, du9] end # Immutable problem problem_immutable = ODEProblem{false}(SEIRS2,u0,tspan,p) ``` ODEProblem with uType Vector{Float64} and tType Float64. 
In-place: false timespan: (0.0, 7300.0) u0: 9-element Vector{Float64}: 200000.0 1000.0 1000.0 300000.0 500000.0 1000.0 1000.0 296000.0 2000.0 ```julia @model function turingmodel(data, theta_fix, u0, problem, parnames, solvsettings) # Priors ψ ~ Beta(1,5) ρ ~ Uniform(0.0,1.0) β ~ Uniform(0.0,1.0) η ~ Uniform(0.0,1.0) ω ~ Uniform(1.0, 3.0*365.0) φ ~ Uniform(0.0,364.0) theta_est = [β,η,ω,φ] p_new = @LArray vcat(theta_est, theta_fix) parnames[3:end] # Update problem and solve ODEs problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) sol_new = Array(solve( problem_new, solvsettings.solver, abstol=solvsettings.abstol, reltol=solvsettings.reltol, save_idxs=9, saveat=solvsettings.saveat, maxiters=solvsettings.maxiters )) incidence = sol_new[2:end] - sol_new[1:(end-1)] # avoid numerical instability issue incidence = max.(zero(eltype(incidence)), incidence) data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ, check_args=false)) end # Model with symbolic problem. model = turingmodel(data, theta_fix, u0, problem_immutable, parnames, merge(solvsettings, (solver = Tsit5(), ))); # Execute once to ensure that it's working correctly. results = model(); ``` ```julia suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(5.576 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(5.538 ms) Much slower, though not unexpected. # Static problem With an immutable implementation we can use static arrays to see if that squeezes out any more perf: ```julia function SEIRS2_static(u,p,t) # states (S1, E1, I1, R1, S2, E2, I2, R2) = u[1:8] N1 = S1 + E1 + I1 + R1 N2 = S2 + E2 + I2 + R2 N = N1 + N2 # params β = p.β η = p.η φ = p.φ ω = 1.0/p.ω μ = p.μ σ = p.σ γ1 = p.γ1 γ2 = γ1 / p.g2 δ2 = p.δ2 # FOI βeff = β * (1.0+η*cos(2.0*π*(t-φ)/365.0)) λ1 = βeff*(I1/N1 + I2/N2) λ2 = λ1 * δ2 # change in states du1 = μ*N - λ1*S1 - μ*S1 du2 = λ1*S1 - σ*E1 - μ*E1 du3 = σ*E1 - γ1*I1 - μ*I1 du4 = γ1*I1 - ω*R1 - μ*R1 du5 = ω*(R1 + R2) - λ2*S2 - μ*S2 du6 = λ2*S2 - σ*E2 - μ*E2 du7 = σ*E2 - γ2*I2 - μ*I2 du8 = γ2*I2 - ω*R2 - μ*R2 du9 = (σ*(E1 + E2)) # cumulative incidence return @SVector [du1, du2, du3, du4, du5, du6, du7, du8, du9] end # Immutable problem p_static = (@SLVector parnames)([theta_est; theta_fix]) u0_static = SVector(u0...) 
problem_static = ODEProblem(SEIRS2_static,u0_static,tspan,p_static) sol = solve( problem_static, solvsettings.solver, abstol=solvsettings.abstol, reltol=solvsettings.reltol, isoutofdomain=(u,p,t)->any(x->x<0.0,u), save_idxs=9, saveat=solvsettings.saveat ); ``` ```julia @model function turingmodel_static(data, theta_fix, u0, problem, parnames, solvsettings) # Priors ψ ~ Beta(1,5) ρ ~ Uniform(0.0,1.0) β ~ Uniform(0.0,1.0) η ~ Uniform(0.0,1.0) ω ~ Uniform(1.0, 3.0*365.0) φ ~ Uniform(0.0,364.0) theta_est = SVector(β,η,ω,φ) p_new = (@SLVector (:β, :η, :ω, :φ, :σ, :μ, :δ2, :γ1, :g2))(vcat(theta_est, theta_fix)) # Update problem and solve ODEs problem_new = remake(problem, p=p_new, u0=eltype(p_new).(u0)) sol_new = Array(solve( problem_new, solvsettings.solver, abstol=solvsettings.abstol, reltol=solvsettings.reltol, save_idxs=9, saveat=solvsettings.saveat, maxiters=solvsettings.maxiters )) incidence = sol_new[2:end] - sol_new[1:(end-1)] # avoid numerical instability issue incidence = max.(zero(eltype(incidence)), incidence) data ~ arraydist(@. NegativeBinomial2(ψ, incidence * ρ, check_args=false)) end # Model with symbolic problem. model = turingmodel_static(data, SVector(theta_fix...), u0_static, problem_static, parnames, merge(solvsettings, (solver = Tsit5(), ))); # Execute once to ensure that it's working correctly. results = model(); ``` ```julia suite = make_turing_suite( model, var_info, adbackends=(ForwardDiffAD{40}(), ) ); benchmarks = run(suite, seconds=10); benchmarks ``` 2-element BenchmarkTools.BenchmarkGroup: tags: [] "linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(1.745 ms) "not_linked" => 1-element BenchmarkTools.BenchmarkGroup: tags: [] "ForwardDiffAD{40}()" => Trial(1.764 ms) ```julia minimum(benchmarks["linked"]["ForwardDiffAD{40}()"]) ``` BenchmarkTools.TrialEstimate: time: 1.745 ms gctime: 0.000 ns (0.00%) memory: 3.12 MiB allocs: 5007 Decent, but doesn't seem to be *strictly* better than non-static.
fkrauer commented 3 years ago

This is absolutely amazing, thanks so much @torfjelde, you saved my research project. This was a toy model, my actual model is a bit larger (17 compartments), but the code was basically the same. The time saved by changing the outofdomain statement plus type stability for the incidence is incredible.

PS: The 5000 burnin was a typo 🤪 sorry.

torfjelde commented 3 years ago

Great, really glad to hear!:)

Regarding dropping the outofdomain check:

  1. It's nice to include since it guarantees that you're sampling only from the space of parameters which are indeed valid for your model.
  2. Only using max does not always guarantee that you're sampling from the true posterior, but in most cases it does. It's fine to use the max to avoid negative values if you never actually hit any during sampling: if the data is informative enough to keep us away from parameters that lead to invalid values, then we never reach the boundary where values drop below 0, the max never actually does anything, and sampling with or without it is equivalent, so we're still sampling from the true posterior. But max can be useful to avoid issues in the adaptation/burn-in phase, where we can indeed end up in bad regions.

So just keep that in mind :+1: What you could do is add the following line to the end of your model

return sol_new

and then after you're done sampling, just to check that nothing strange occurred, you can run generated_quantities(model, chain) to get all the solutions used to generate chain. Then you can check that none of the solutions that occurred in chain were invalid.
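A rough sketch of that check (assuming the model now ends with return sol_new):

sols = generated_quantities(model, chain)        # re-runs the model for every sample in `chain`
all(s -> s !== nothing && all(>=(0), s), sols)   # `true` if no accepted sample went out of domain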

But if you do end up dropping the outofdomain check, then you can actually use ReverseDiff with setrdcache(true) + InterpolatingAdjoint(autojacvec=ReverseDiffVJP()), so that might be useful once you go to higher dims (though I think 17 compartments is still something ForwardDiff can handle easily).
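Roughly, that combination could look like this (a sketch only; it assumes the model body has no data-dependent branches, e.g. the out-of-domain early return has been dropped):

Turing.setadbackend(:reversediff)
Turing.setrdcache(true)   # compile the ReverseDiff tape once and reuse it

# ...and inside the model, pass a matching sensitivity algorithm to solve:
# sol_new = Array(solve(problem_new, solvsettings.solver;
#                       sensealg=InterpolatingAdjoint(autojacvec=ReverseDiffVJP(true)),
#                       saveat=solvsettings.saveat))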

PS: The 5000 burnin was a typo :zany_face: sorry.

Haha, no worries at all. I should have thought of that immediately. I was really confused as to why it was taking so long on your end, and I had to actually run a complete sample and noticed the iterations in the chain said 5001:5500, haha. Oh well, I don't think either of us will make that mistake again :sweat_smile: