TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Performance regression for BernoulliLogit #1934

Open · torfjelde opened this issue 1 year ago

torfjelde commented 1 year ago

I was just playing around a bit with https://github.com/torfjelde/TuringBenchmarking.jl and noticed a sudden change in the runtimes described in the README (the example model is suddenly 16x slower for gradient evaluation with ReverseDiff in compiled mode).

I eventually narrowed it down to #1892 being the cause, i.e. the performance of the following model:

@model function irt(y, i, p; I = maximum(i), P = maximum(p))
    theta ~ filldist(Normal(), P)
    beta ~ filldist(Normal(), I)
    Turing.@addlogprob! sum(logpdf.(BernoulliLogit.(theta[p] - beta[i]), y))

    return (; theta, beta)
end

absolutely tanks for ReverseDiff when we use the implementation of BernoulliLogit from Distributions.jl :confused:
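
For reference, a minimal sketch of how such a suite can be set up with TuringBenchmarking.jl; the synthetic data, the names nitems/npersons, and the adbackends keyword are assumptions for illustration rather than the exact setup used for the numbers below:

using Turing, TuringBenchmarking, BenchmarkTools
using ReverseDiff
using Random

# Hypothetical synthetic IRT data: every person answers every item once.
Random.seed!(42)
nitems, npersons = 20, 100
i = repeat(1:nitems, inner = npersons)   # item index per observation
p = repeat(1:npersons, outer = nitems)   # person index per observation
y = rand(0:1, length(i))                 # binary responses

model = irt(y, i, p)

# Benchmark plain evaluation plus the two AD backends of interest, in both the
# linked (transformed) and not-linked parameter spaces.
suite = TuringBenchmarking.make_turing_suite(
    model;
    adbackends = [
        Turing.Essential.ReverseDiffAD{true}(),      # ReverseDiff with a compiled tape
        Turing.Essential.ForwardDiffAD{40, true}(),  # ForwardDiff, chunk size 40
    ],
)
run(suite)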

On Turing@0.21.12:

┌ Info: Turing.jl
│   run(suite) =
│    2-element BenchmarkTools.BenchmarkGroup:
│      tags: []
│      "linked" => 3-element BenchmarkTools.BenchmarkGroup:
│         tags: []
│         "evaluation" => Trial(1.333 ms)
│         "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.752 ms)
│         "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(174.759 ms)
│      "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
│         tags: []
│         "evaluation" => Trial(1.339 ms)
│         "Turing.Essential.ReverseDiffAD{true}()" => Trial(1.796 ms)
└         "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(169.376 ms)

while on Turing@0.21.13:

┌ Info: Turing.jl
│   run(suite) =
│    2-element BenchmarkTools.BenchmarkGroup:
│      tags: []
│      "linked" => 3-element BenchmarkTools.BenchmarkGroup:
│         tags: []
│         "evaluation" => Trial(554.568 μs)
│         "Turing.Essential.ReverseDiffAD{true}()" => Trial(16.418 ms)
│         "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(140.508 ms)
│      "not_linked" => 3-element BenchmarkTools.BenchmarkGroup:
│         tags: []
│         "evaluation" => Trial(554.415 μs)
│         "Turing.Essential.ReverseDiffAD{true}()" => Trial(16.445 ms)
└         "Turing.Essential.ForwardDiffAD{40, true}()" => Trial(139.849 ms)

Given that evaluation and ForwardDiff are faster in the latter case, it's clearly an "issue" with ReverseDiff, but at the same time this is such a significant performance hit that it makes me a bit uncomfortable to just "leave it in" there :confused:
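
The slowdown can presumably also be reproduced outside of Turing; a rough sketch (the data sizes and the loglik helper are made up for illustration):

using Distributions, ReverseDiff, BenchmarkTools

# Stand-in for the model's likelihood term: vectorized BernoulliLogit logpdf.
loglik(x, y) = sum(logpdf.(BernoulliLogit.(x), y))

y = rand(0:1, 10_000)
x0 = randn(length(y))

# Record and compile a ReverseDiff tape for the likelihood as a function of x only.
tape = ReverseDiff.GradientTape(x -> loglik(x, y), x0)
ctape = ReverseDiff.compile(tape)

grad = similar(x0)
@btime ReverseDiff.gradient!($grad, $ctape, $x0)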

Thoughts? @devmotion

tansongchen commented 1 year ago

Thanks for providing this PR and the suggestions. It seems that handling generic inner types for forward-mode AD (and similarly for reverse mode) more or less involves some SCT (at least some tweaks with Cassette). I will probably first do something with arrays before getting more general...