Open burtonjosh opened 1 year ago
This is really strange. Any hints as to why the performance is degraded?
TuringGLM only creates the model and the data for you. Everything else is delegated to Turing itself.
The only difference I could think of was that TuringGLM uses the `CustomPrior` struct, so I tried to emulate this by defining my own and using it in a Turing model:
```julia
using Turing, Statistics, LinearAlgebra

abstract type TuringPrior end

# Fields left untyped, emulating TuringGLM's CustomPrior
struct CustomTuringPrior <: TuringPrior
    predictors
    intercept
    auxiliary
end

@model function regression_custom_prior(X, y, priors; residual=std(y))
    α ~ priors.intercept
    β ~ filldist(priors.predictors, size(X, 2))
    σ ~ Exponential(residual)
    y ~ MvNormal(α .+ X * β, σ^2 * I)
end
```
```julia
turing_prior = CustomTuringPrior(Normal(0, 10), Normal(52, 14), nothing)
m_turing_prior = regression_custom_prior(X, y, turing_prior; residual=std(y))

suite_turing_prior = TuringBenchmarking.make_turing_suite(
    m_turing_prior,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing_prior)
```
The results from this are:

| Model | ForwardDiff, linked (time, μs) | ReverseDiff, linked (time, μs) | ForwardDiff, not linked (time, μs) | ReverseDiff, not linked (time, μs) |
|---|---|---|---|---|
| Turing model 3 (custom prior struct) | 4.203 | 1.942 | 4.173 | 1.710 |
which shows the same slowdown as the TuringGLM model benchmarks. So it looks like the slowdown is related to the prior struct, but I don't know how.
Turing model 3 (custom prior struct)

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.953 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.942 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(4.203 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.975 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.710 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(4.173 μs)
```
Yeah, that might add a little bit of overhead.
I've noticed that there's a performance dip when using ForwardDiff with a model defined in TuringGLM, compared to defining the model directly in Turing. I've set up an MWE to show this.
First I set up 4 models, two in TuringGLM (with and without custom priors), and two in Turing, with the default and custom priors given to the TuringGLM models.
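For context, the model setup looks roughly like the following sketch. The formula, the data frame `df`, and the design matrix `X` are placeholders; `turing_model` and `CustomPrior` are TuringGLM's API, and the hand-written Turing model mirrors the custom-prior version used later in this thread.

```julia
using TuringGLM, Turing, Statistics, LinearAlgebra

# --- TuringGLM models (formula and data frame `df` are placeholders) ---
fm = @formula(y ~ x1 + x2)
m_glm = turing_model(fm, df)                               # default priors
custom_prior = CustomPrior(Normal(0, 10), Normal(52, 14), nothing)
m_glm_custom = turing_model(fm, df; priors=custom_prior)   # custom priors

# --- Equivalent hand-written Turing model, shown with the custom priors ---
@model function regression(X, y; residual=std(y))
    α ~ Normal(52, 14)                        # intercept prior
    β ~ filldist(Normal(0, 10), size(X, 2))   # one prior per predictor column
    σ ~ Exponential(residual)
    y ~ MvNormal(α .+ X * β, σ^2 * I)
end
m_turing_custom = regression(X, y)            # X is the design matrix for `df`
```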
Then, using TuringBenchmarking.jl, I benchmark each of the four models with both the ForwardDiff and ReverseDiff backends. The results are shown in the table below. You can see that for ReverseDiff the benchmarks are essentially the same, but with ForwardDiff the TuringGLM models are ~20-30% slower than the equivalent Turing models (I've included the full results below).

| Model | ForwardDiff, linked (time, μs) | ReverseDiff, linked (time, μs) | ForwardDiff, not linked (time, μs) | ReverseDiff, not linked (time, μs) |
|---|---|---|---|---|
| TuringGLM model 1 (default priors) | 3.967 | 2.772 | 3.976 | 1.990 |
| Turing model 1 (default priors) | 3.046 | 2.676 | 3.059 | 1.931 |
| TuringGLM model 2 (custom priors) | 4.013 | 2.102 | 3.905 | 1.868 |
| Turing model 2 (custom priors) | 2.776 | 1.986 | 2.827 | 1.829 |
Click here for detailed output
TuringGLM model 1 (default priors)

```julia
suite_glm = TuringBenchmarking.make_turing_suite(
    m_glm,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_glm)
```

Output:

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.882 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.772 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.967 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.836 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.990 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.976 μs)
```

Turing model 1 (default priors)

```julia
suite_turing = TuringBenchmarking.make_turing_suite(
    m_turing,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing)
```

Output:

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(1.256 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.676 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.046 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(1.207 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.931 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.059 μs)
```

TuringGLM model 2 (custom priors)

```julia
suite_glm_custom = TuringBenchmarking.make_turing_suite(
    m_glm_custom,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_glm_custom)
```

Output:

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.724 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.102 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(4.013 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.737 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.868 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.905 μs)
```

Turing model 2 (custom priors)

```julia
suite_turing_custom = TuringBenchmarking.make_turing_suite(
    m_turing_custom,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing_custom)
```

Output:

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(1.176 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.986 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(2.776 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(1.160 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.829 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(2.827 μs)
```