Let me know if you need help here. Is it possible to calculate the gradient and get the value simultaneously with your setup? That would help performance.
I think in this setting the gradient is `Turing.gradient_logp`. Therefore, you can wrap the value and the gradient together. @mohamed82008 also pointed out https://github.com/JuliaNLSolvers/NLSolversBase.jl/blob/9e89526817d5932489d8fdfead5f6e537c1291a1/src/objective_types/oncedifferentiable.jl#L172 in the Slack discussion thread.
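As a reference point for the value-and-gradient-in-one-pass idea, here is a minimal, self-contained sketch of the `fg!` convention that NLSolversBase/Optim expose; the quadratic objective is just an illustration, not anything from Turing:

```julia
using Optim

# fg!(F, G, x): Optim passes `nothing` for whichever of F/G it does not need,
# so the shared computation is done only once per call.
function fg!(F, G, x)
    d = x .- 3.0                 # shared intermediate
    if G !== nothing
        G .= 2 .* d              # gradient of sum(abs2, x .- 3)
    end
    if F !== nothing
        return sum(abs2, d)      # objective value
    end
    return nothing
end

Optim.optimize(Optim.only_fg!(fg!), zeros(2), Optim.LBFGS())
```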
I guess the branch is not needed? It seems one could just define
```julia
function (f::OptimLogDensity)(F, G, H, x)
    if G !== nothing
        # fill G with the gradient at x
        ...
    end
    if H !== nothing
        # fill H with the Hessian at x
        ...
    end
    if F !== nothing
        # return the objective value at x
        return ...
    end
    return nothing
end
```
and then call `Optim.optimize(Optim.only_fgh!(f), ...)`, similar to https://github.com/JuliaNLSolvers/Optim.jl/blob/7b660484724755ee2b306dd3ceed13d6633067ae/test/multivariate/optimize/interface.jl#L73-L78.
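For concreteness, here is a self-contained toy version of that calling convention; the quadratic objective and its hand-written derivatives are just stand-ins, not anything from Turing:

```julia
using Optim

# fgh!(F, G, H, x): fill G and/or H in place when they are provided,
# and return the objective value only when F !== nothing.
function fgh!(F, G, H, x)
    d = x .- [1.0, -2.0]         # shared work between value, gradient, Hessian
    if G !== nothing
        G .= 2 .* d              # gradient of sum(abs2, d)
    end
    if H !== nothing
        H .= [2.0 0.0; 0.0 2.0]  # constant Hessian of the quadratic
    end
    if F !== nothing
        return sum(abs2, d)
    end
    return nothing
end

Optim.optimize(Optim.only_fgh!(fgh!), zeros(2), Optim.Newton())
```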
Yeah, probably. Thanks! I'm swamped with more urgent stuff so I might not be able to make a PR myself for this one... Maybe wait for Cameron after he finishes his qualifier, or maybe I'll revisit this in a couple of days...
@mohamed82008 what would be a quick way to get the Hessian as well? I'm switching things up a little to use `gradient_logp` to get the gradient too, but I'm wondering if I should add a method like `hessian_logp`. Anyone have thoughts on that?
Hessians are usually computationally expensive, and that's why the original Newton method is not preferred. I don't think it is that necessary to add Hessians for optimization per se, but maybe other people have different ideas.
That said, I think it would be great if the information matrix could be provided...
Well, if we're going to use the `Optim.only_fgh!` method (or the non-Hessian variant `Optim.only_fg!`), we should probably consider extending support for Hessians in an efficient way, in case someone wants to use `Newton` or the other Hessian-based methods.
We've got support already for the information matrix, but I'm not sure if it's finite-difference based or not (I think it is) -- you can do it with

```julia
using StatsBase

m = optimize(model, MLE())
StatsBase.informationmatrix(m)
```
For small scale problems yeah. I don't know, maybe other people have better ideas.
> I'm wondering if I should add a method like `hessian_logp`.

I think that's reasonable, but perhaps as a separate PR. For now, we can say that second-order methods are not supported if `H !== nothing`.
> We've got support already for the information matrix, but I'm not sure if it's finite-difference based or not (I think it is) -- you can do it with
>
> ```julia
> using StatsBase
> m = optimize(model, MLE())
> StatsBase.informationmatrix(m)
> ```
The information matrix actually throws an error. I think it calls ForwardDiff automatically. @cpfiffer
Welp, all the more reason to get the actual Hessian stuff built in too.
> @mohamed82008 what would be a quick way to get the Hessian as well? I'm switching things up a little to use `gradient_logp` to get the gradient too, but I'm wondering if I should add a method like `hessian_logp`. Anyone have thoughts on that?
Just do forward mode over whatever reverse mode is chosen.
> Just do forward mode over whatever reverse mode is chosen.
It's possible that with custom adjoints the user cannot provide second-order information for some of the intermediate steps, so forward mode might not be able to proceed. I think it's not trivial when custom adjoints are provided; however, what you said should be feasible if everything is handled by an AD backend...
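For what it's worth, here is a small sketch of the forward-over-reverse idea with Zygote as the reverse-mode backend. `Zygote.hessian` packages this composition (ForwardDiff over Zygote's gradient); the explicit version assumes the reverse-mode pullbacks are generic enough to run on ForwardDiff dual numbers, which is exactly where custom adjoints can get in the way:

```julia
using ForwardDiff, Zygote

# Toy target; in the Turing setting this would be the (negative) log density.
f(x) = sum(abs2, x) + exp(x[1] * x[2])

# Forward mode over reverse mode: differentiate the reverse-mode gradient
# with ForwardDiff. This works only when the reverse-mode code accepts
# ForwardDiff.Dual inputs (custom adjoints may break this, as noted above).
hess_explicit(x) = ForwardDiff.jacobian(y -> first(Zygote.gradient(f, y)), x)

x0 = [0.3, -0.7]
H1 = hess_explicit(x0)

# Zygote ships a convenience that implements the same forward-over-reverse idea.
H2 = Zygote.hessian(f, x0)
```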
Thanks for implementing this feature request! I just experimented with the new code and it works perfectly.
Did you get a noticeable speedup of any kind? I'd expect the AD gradient to be a lot faster and more precise in general.
@cpfiffer It's more of a feasibility issue for me, as previously I couldn't run gradient-based methods at all. It will certainly be faster than the previous simplex method.
In estimating `MLE` and `MAP`, the routines from `Optim.jl` are called. However, even for gradient-based methods, Optim.jl only supports `ForwardDiff` as an AD backend and applies finite differences otherwise. Therefore, it makes sense to use the AD backend in Turing (`Turing.setadbackend`) to define the gradient function and feed it to the optimizer.

Basically, we replace https://github.com/TuringLang/Turing.jl/blob/master/src/modes/ModeEstimation.jl#L383 with a structure like the one sketched below. In the first branch, you will need to pass the original function and another that computes the function value and the gradient based on `Turing.gradient_logp`. An example is https://github.com/JuliaNLSolvers/Optim.jl/blob/7b660484724755ee2b306dd3ceed13d6633067ae/test/multivariate/optimize/interface.jl#L54.
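Purely as an illustration of the kind of branching structure described above, here is a hedged sketch; the helper name `make_objective`, the `needs_gradient` flag, and the stand-in `logdensity`/`ad_gradient` functions are assumptions for this example, not Turing's actual implementation:

```julia
using Optim

# Hypothetical sketch of building the objective handed to Optim.optimize.
# `logdensity(x)` stands in for the model's negative log density and
# `ad_gradient(x)` for a gradient computed with Turing's AD backend
# (e.g. via Turing.gradient_logp); both names are illustrative only.
function make_objective(logdensity, ad_gradient; needs_gradient::Bool)
    if !needs_gradient
        # Gradient-free methods (e.g. NelderMead) only need the objective value.
        return logdensity
    end
    # Gradient-based methods: value and gradient live in one callable, so
    # common work can be shared and the chosen AD backend supplies the gradient.
    fg! = function (F, G, x)
        if G !== nothing
            G .= ad_gradient(x)
        end
        if F !== nothing
            return logdensity(x)
        end
        return nothing
    end
    return Optim.only_fg!(fg!)
end

# Toy usage with a quadratic standing in for the model's negative log density:
logdensity(x) = sum(abs2, x .- 1.0)
ad_gradient(x) = 2 .* (x .- 1.0)

obj = make_objective(logdensity, ad_gradient; needs_gradient = true)
Optim.optimize(obj, zeros(3), Optim.LBFGS())
```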