Performance dip when using ForwardDiff compared to Turing

burtonjosh commented 1 year ago

I've noticed that there's a performance dip when using ForwardDiff with a model defined in TuringGLM, compared to defining the same model directly in Turing. I've set up an MWE (minimal working example) to show this.

First I set up four models: two in TuringGLM (with default and with custom priors), and two in Turing that reproduce the default and custom priors of the TuringGLM models.

```julia
using Turing, TuringGLM, TuringBenchmarking, BenchmarkTools
using ReverseDiff: ReverseDiff
using CSV, DataFrames, LinearAlgebra

hibbs_df = CSV.read(
    download("https://raw.githubusercontent.com/avehtari/ROS-Examples/master/ElectionsEconomy/data/hibbs.dat"),
    DataFrame
);

# TuringGLM model
f = @formula(vote ~ growth)
m_glm = turing_model(f, hibbs_df)

# TuringGLM model with custom priors
priors = CustomPrior(Normal(0, 10), Normal(52, 14), nothing)
m_glm_custom = turing_model(f, hibbs_df; priors=priors)

# extract data for Turing models
y = TuringGLM.data_response(f, hibbs_df)
X = TuringGLM.data_fixed_effects(f, hibbs_df)

# model with default priors
@model function regression_default(X, y; residual=std(y))
    α ~ 50.755 + TDist(3.0)*6.071256084780443
    β ~ filldist(TDist(3.0), size(X,2))
    σ ~ Exponential(residual)

    y ~ MvNormal(α .+ X*β, σ^2*I)
end

m_turing = regression_default(X, y; residual=std(y))

# model with custom priors
@model function regression_custom(X, y; residual=std(y))
    α ~ Normal(52, 14)
    β ~ filldist(Normal(0, 10), size(X,2))
    σ ~ Exponential(residual)

    y ~ MvNormal(α .+ X*β, σ^2*I)
end

m_turing_custom = regression_custom(X, y; residual=std(y))
```

Then, using TuringBenchmarking.jl, I benchmarked each of the four models with both the ForwardDiff and ReverseDiff backends (the exact calls are in the detailed output below).

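The results of the benchmark are shown in the table below (times in μs; the full output is further down). With ReverseDiff the timings are roughly the same for both packages (within about 6%), but with ForwardDiff the TuringGLM models take roughly 30-45% longer than the equivalent Turing models (equivalently, the Turing models are about 23-31% faster).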

| Model | ForwardDiff, linked (μs) | ReverseDiff, linked (μs) | ForwardDiff, not linked (μs) | ReverseDiff, not linked (μs) |
|---|---|---|---|---|
| TuringGLM (default prior) | 3.967 | 2.772 | 3.976 | 1.990 |
| Turing (default prior) | 3.046 | 2.676 | 3.059 | 1.931 |
| TuringGLM (custom prior) | 4.013 | 2.102 | 3.905 | 1.868 |
| Turing (custom prior) | 2.776 | 1.986 | 2.827 | 1.829 |
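The relative ForwardDiff slowdown can be computed directly from the table. This is a small illustrative Julia sketch: the times are copied from the table, and `fd_times` and `slowdown` are hypothetical names introduced only for this calculation.

```julia
# ForwardDiff gradient times in μs, copied from the table:
# (label, TuringGLM time, Turing time)
fd_times = [
    ("default prior, linked",     3.967, 3.046),
    ("default prior, not linked", 3.976, 3.059),
    ("custom prior, linked",      4.013, 2.776),
    ("custom prior, not linked",  3.905, 2.827),
]

# How many times longer the TuringGLM gradient takes than the Turing one
slowdown(glm, turing) = glm / turing

for (label, glm, turing) in fd_times
    # Extra time as a percentage of the Turing time, rounded to one decimal
    pct = round(100 * (slowdown(glm, turing) - 1); digits = 1)
    println(rpad(label, 26), "TuringGLM takes ", pct, "% longer")
end
```

This prints 30.2, 30.0, 44.6 and 38.1 percent for the four rows, i.e. about 30% for the default-prior models and 38-45% for the custom-prior models.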

Detailed output:

TuringGLM model 1 (default priors)

```julia
suite_glm = TuringBenchmarking.make_turing_suite(
    m_glm,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_glm)
```

Output:

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.882 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.772 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.967 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.836 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.990 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.976 μs)
```

Turing model 1 (default priors)

```julia
suite_turing = TuringBenchmarking.make_turing_suite(
    m_turing,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing)
```

Output:

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(1.256 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.676 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.046 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(1.207 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.931 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.059 μs)
```

TuringGLM model 2 (custom priors)

```julia
suite_glm_custom = TuringBenchmarking.make_turing_suite(
    m_glm_custom,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_glm_custom)
```

Output:

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.724 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(2.102 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(4.013 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.737 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.868 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(3.905 μs)
```

Turing model 2 (custom priors)

```julia
suite_turing_custom = TuringBenchmarking.make_turing_suite(
    m_turing_custom,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing_custom)
```

Output:

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(1.176 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.986 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(2.776 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(1.160 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.829 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(2.827 μs)
```

storopoli commented 1 year ago

This is really strange. Any hints as to why performance degrades? TuringGLM only creates the model and prepares the data for you. Everything else is delegated to Turing itself.

burtonjosh commented 1 year ago

The only difference that I could think of was that TuringGLM uses the CustomPrior struct, so I tried to emulate this by defining my own and using that in a Turing model:

```julia
abstract type TuringPrior end

struct CustomTuringPrior <: TuringPrior
    predictors
    intercept
    auxiliary
end

@model function regression_custom_prior(X, y, priors; residual=std(y))
    α ~ priors.intercept
    β ~ filldist(priors.predictors, size(X,2))
    σ ~ Exponential(residual)

    y ~ MvNormal(α .+ X*β, σ^2*I)
end

turing_prior = CustomTuringPrior(Normal(0, 10), Normal(52, 14), nothing)

m_turing_prior = regression_custom_prior(X, y, turing_prior; residual=std(y))

suite_turing_prior = TuringBenchmarking.make_turing_suite(
    m_turing_prior,
    adbackends = [TuringBenchmarking.ForwardDiffAD{40}(), TuringBenchmarking.ReverseDiffAD{true}()]
)
run(suite_turing_prior)
```

The results from this are:

| Model | ForwardDiff, linked (μs) | ReverseDiff, linked (μs) | ForwardDiff, not linked (μs) | ReverseDiff, not linked (μs) |
|---|---|---|---|---|
| Turing model 3 (custom prior struct) | 4.203 | 1.942 | 4.173 | 1.710 |

This shows the same ForwardDiff slowdown as the TuringGLM benchmarks, so the prior struct seems to be responsible, but I don't know why.

Detailed output:

Turing model 3 (custom prior struct)

```
2-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.953 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.942 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(4.203 μs)
  "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
      tags: []
      "evaluation" => Trial(2.975 μs)
      "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.710 μs)
      "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(4.173 μs)
```
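One possible explanation, not confirmed anywhere in this thread, is type instability. A struct declared without field types (as `CustomTuringPrior` is above, and as TuringGLM's `CustomPrior` presumably is) stores every field as `Any`, so each `priors.intercept` access inside the model has to be resolved at run time. The detailed output is consistent with this: the plain `evaluation` benchmark takes more than twice as long for the struct-based models (2.882 vs 1.256 μs for TuringGLM vs Turing with default priors, 2.953 vs 1.176 μs for the custom prior struct vs plain custom priors), while their ReverseDiff gradients are even faster than their plain evaluation. With a cached tape (`ReverseDiffAD{true}`), ReverseDiff replays recorded operations instead of re-running the model code, whereas ForwardDiff re-runs the model code on every gradient call, which could explain why only ForwardDiff is affected. The sketch below uses only Base Julia to show the inference difference; `UntypedPrior`, `TypedPrior`, `get_intercept` and `inferred` are hypothetical names, and plain numbers stand in for the prior distributions.

```julia
# Mirrors CustomTuringPrior above: fields without type annotations are
# stored as `Any`, so the compiler cannot infer what `p.intercept` holds.
struct UntypedPrior
    predictors
    intercept
    auxiliary
end

# Hypothetical parametric variant: each field's concrete type becomes a
# type parameter, so field access is type-stable.
struct TypedPrior{P,I,A}
    predictors::P
    intercept::I
    auxiliary::A
end

# Plain numbers stand in for distributions to keep this dependency-free;
# in the model these would be e.g. Normal(0, 10) and Normal(52, 14).
untyped = UntypedPrior(0.0, 52.0, nothing)
typed = TypedPrior(0.0, 52.0, nothing)

# A field access like the one in the model body (`α ~ priors.intercept`)
get_intercept(p) = p.intercept

# Ask the compiler which return type it can infer for this argument type
inferred(f, x) = only(Base.return_types(f, (typeof(x),)))

println(fieldtypes(UntypedPrior))          # (Any, Any, Any)
println(fieldtypes(typeof(typed)))         # (Float64, Float64, Nothing)
println(inferred(get_intercept, untyped))  # Any
println(inferred(get_intercept, typed))    # Float64
```

If this hypothesis holds, passing a parametric struct such as `TypedPrior(Normal(0, 10), Normal(52, 14), nothing)` as `priors` should bring the ForwardDiff timings back in line with the plain Turing models; rerunning the same `make_turing_suite` benchmark would confirm or refute it. `@code_warntype` from InteractiveUtils shows the same `Any` interactively.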

storopoli commented 1 year ago

Yeah, that might add a little bit of overhead.

TuringLang / TuringGLM.jl, issue #81