That would be great! ReverseDiff is the only package I have gotten to work with my seemingly exotic nested-differentiation use case (mixed mode: ReverseDiff over ForwardDiff). For reference, almost no packages support differentiation through the following (simplified):
```julia
f(w, x) = NeuralNetwork(w, x)

# Return a closure over x
function loss(w, x)
    ∇xf(x) = jacobian(x -> f(w, x), x)  # this closes over the latest w
    w -> cost(w) + norm(∇xf(x))  # the loss contains nested differentiation of the model w.r.t. the input
end

l = loss(w, x)
gradient(l, w)  # good luck with this
```
I solve it by taking the Jacobian with ForwardDiff and the outer gradient with ReverseDiff. It is very slow (creating and compiling the reverse tape can fill my 16 GB of RAM), but it works.
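For concreteness, performance aside, that ForwardDiff-inside-ReverseDiff combination looks roughly like the sketch below, where `predict` and `cost` are hypothetical stand-ins for the real network and regularizer:

```julia
using ForwardDiff, ReverseDiff, LinearAlgebra

# Hypothetical stand-ins for the actual model and parameter cost
predict(W, x) = tanh.(W * x)
cost(W) = sum(abs2, W)

# The loss penalizes the Jacobian of the model output w.r.t. the input x
function loss(W, x)
    Jx = ForwardDiff.jacobian(x -> predict(W, x), x)  # inner derivative: forward mode
    return cost(W) + norm(Jx)
end

W, x = randn(2, 2), randn(2)
g = ReverseDiff.gradient(W -> loss(W, x), W)  # outer derivative: reverse mode
```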
It also breaks Klara.jl, cf. https://github.com/JuliaStats/Klara.jl/issues/174.
I have a branch that's passing tests on v0.7 now with only a few deprecation warnings left to tackle, but they'll require some work (the `A_mul_B` deprecations). I also have not yet updated to the new broadcast machinery, and expect it to be nontrivial.
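Not specific to that branch, but for reference, the v0.7 deprecation of the `A_mul_B` family points to `LinearAlgebra.mul!` with the lazy `transpose`/`adjoint` wrappers, roughly:

```julia
using LinearAlgebra

A, B = randn(3, 3), randn(3, 3)
C = similar(A)

mul!(C, A, B)             # v0.7/1.0 replacement for A_mul_B!(C, A, B)
mul!(C, transpose(A), B)  # v0.7/1.0 replacement for At_mul_B!(C, A, B)
```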
Won't have a lot of time to work on it in the next few days, but hopefully will be able to get a PR in at the end of the week.
ReverseDiff isn't actively developed anymore, now that there are so many similar AD packages and I'm trying to work on some next-gen stuff, but I can probably at least update it for v0.7/1.0 so people can keep using it until Capstan is ready. I'll try to get to it sometime next week.
In the meantime, one could try Nabla.jl, AutoGrad.jl, Flux.jl, or XGrad.jl.