Open torfjelde opened 1 year ago
Thanks for providing this PR and the suggestions. It seems that handling a generic inner type for forward-mode AD (and similarly for reverse mode) more or less involves some SCT (or at least some tweaks with Cassette). I will probably start with arrays before getting more general...
I was just playing around a bit with https://github.com/torfjelde/TuringBenchmarking.jl and noticed a sudden change in the runtime described in the README (the example model is suddenly 16x slower for gradient evaluation for ReverseDiff with compiled mode). I eventually narrowed it down to #1892 being the cause, i.e. the performance of the following model:
absolutely tanks for ReverseDiff when we use the implementation of `BernoulliLogit` from Distributions.jl :confused:

On Turing@0.21.12:

while on Turing@0.21.13:
Given that evaluation and ForwardDiff are both faster in the latter case, it's clearly an "issue" with ReverseDiff, but at the same time this is such a significant perf hit that it makes me a bit uncomfortable to just "leave it in" :confused:
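For anyone who wants to poke at this outside of Turing, here is a rough sketch of how the two backends could be compared on a hand-written Bernoulli-logit log-likelihood. Everything in it (the data `x` and `y`, the function `loglik`, the starting point `θ0`) is a hypothetical stand-in and not the model from the README or the implementation in Distributions.jl; it only assumes the public `ForwardDiff.gradient`, `ReverseDiff.GradientTape`, `ReverseDiff.compile`, `ReverseDiff.gradient!` and BenchmarkTools' `@btime`.

```julia
using BenchmarkTools, ForwardDiff, ReverseDiff

# Hypothetical data; NOT the data or model from the issue.
const x = randn(100)
const y = rand(Bool, 100)

# Hand-written Bernoulli log-likelihood with logit θ[1] + θ[2] * x[i].
# Written with scalar operations only, so it contains no value-dependent
# branches and can be recorded on a compiled ReverseDiff tape.
function loglik(θ)
    acc = zero(eltype(θ))
    for i in eachindex(x)
        logit = θ[1] + θ[2] * x[i]
        acc += y[i] * logit - log1p(exp(logit))
    end
    return acc
end

θ0 = [0.1, -0.2]

# Forward mode.
g_fwd = ForwardDiff.gradient(loglik, θ0)

# Reverse mode with a compiled tape (record once, replay many times).
tape = ReverseDiff.compile(ReverseDiff.GradientTape(loglik, θ0))
g_rev = similar(θ0)
ReverseDiff.gradient!(g_rev, tape, θ0)

# Both backends should agree before any timings are compared.
@assert g_fwd ≈ g_rev

@btime ForwardDiff.gradient($loglik, $θ0)
@btime ReverseDiff.gradient!($g_rev, $tape, $θ0)
```

Swapping the body of `loglik` for a call that goes through `logpdf` of the distribution in question would, presumably, show whether the slowdown is reproducible without Turing in the loop.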
Thoughts? @devmotion