JuliaNLSolvers / Optim.jl

Optimization functions for Julia

Hessian-vector products #356

Closed: jeff-regier closed this issue 6 years ago

jeff-regier commented 7 years ago

Have you all considered adding optimization methods that make use of Hessian-vector products, but that don't explicitly form Hessians? I've been thinking about writing a version of newton_trust_region that does that, essentially using conjugate gradient iterations to apply the inverse Hessian to the gradient. Is that something you'd be interested in including in Optim.jl?
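(For concreteness, a minimal sketch of that inner loop, illustrative rather than Optim.jl code: CG solves H p = -g while touching H only through a user-supplied hv callback. A real trust-region variant would also handle the region boundary and negative curvature, as in Steihaug-Toint; `cg_newton_step` is a hypothetical name.)

```julia
using LinearAlgebra

# Solve the Newton system H * p = -g with conjugate gradients, accessing the
# Hessian only through the Hessian-vector product callback `hv`.
function cg_newton_step(hv, g; tol = 1e-8, maxiter = length(g))
    p = zero(g)
    r = -g                    # residual b - H*p with b = -g and p = 0
    d = copy(r)
    rs = dot(r, r)
    for _ in 1:maxiter
        Hd = hv(d)
        α = rs / dot(d, Hd)   # assumes positive curvature along d
        p .+= α .* d
        r .-= α .* Hd
        rs_new = dot(r, r)
        sqrt(rs_new) < tol && break
        d .= r .+ (rs_new / rs) .* d
        rs = rs_new
    end
    return p
end
```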

jeff-regier/Celeste.jl#380

mlubin commented 7 years ago

@dpo and @abelsiqueira have been working on this, not sure if the code is public.

pkofod commented 7 years ago

Well, my initial reaction would be: sure. I'm still very happy about the "old" trust region (I've been using it in my personal projects), and you're doing a good job keeping it up to date. That last part is relevant, because even if we have more active people here than before, new solvers are kind of a... tricky issue. On one hand we might be tempted to say "let's implement everything"; on the other hand we need someone to maintain all that code. So if you're willing to write a prototype, test it, and maintain it, I'm all for it.

jeff-regier commented 7 years ago

Sounds good, I think I'm up for it. What kind of interface do you suggest? Would we add a field named something like hv to TwiceDifferentiableFunction, so that a user may specify a function that returns the product of the Hessian and a vector?

If the user doesn't specify an hv field, it'd be nice if it were automatically populated by ForwardDiff.jl: calculating the gradient with dual numbers whose perturbations are set to v (the user-specified vector) would, I think, be a pretty efficient way to compute the product. Is that something you've implemented, perhaps in code that isn't public yet, or that you have experience with?
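(A minimal sketch of that dual-number trick; `hv_forwarddiff` is a hypothetical name, not an existing Optim or ForwardDiff API.)

```julia
using ForwardDiff

# H(x) * v is the directional derivative of the gradient along v, so one
# forward-mode pass with perturbations seeded by v computes the product
# without ever forming the Hessian.
hv_forwarddiff(f, x, v) =
    ForwardDiff.derivative(t -> ForwardDiff.gradient(f, x .+ t .* v), 0.0)

# Toy usage: compare against the full Hessian.
f(x) = x[1]^2 * x[2] + exp(x[2])
x, v = [1.0, 2.0], [1.0, 0.0]
hv_forwarddiff(f, x, v) ≈ ForwardDiff.hessian(f, x) * v   # true
```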

Also, any plans to use @jrevels's ReverseDiff.jl library to automatically populate the gradient function, if a user doesn't specify one? That, in combination with ForwardDiff.jl for the Hessian-vector product, would be really useful for us at Celeste.jl, and probably for any number of other projects too.

abelsiqueira commented 7 years ago

Hello, thanks for the mention, Miles.

We have implemented a matrix-free trust-region Newton method for unconstrained minimization, i.e., one that uses Hessian-vector products. We haven't made a release yet, but it is usable: https://github.com/JuliaSmoothOptimizers/Optimize.jl/blob/master/src/solver/trunk.jl. The package uses NLPModels.jl, LinearOperators.jl and Krylov.jl to implement this. If you'd rather implement this here, you should consider at least LinearOperators and Krylov.
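(A sketch of that matrix-free pattern, not code from trunk.jl; note that the function-based LinearOperator constructor shown here has changed across LinearOperators.jl versions.)

```julia
using LinearOperators, Krylov, ForwardDiff

# Wrap a Hessian-vector product in a LinearOperator and hand it to a Krylov
# solver, so the Hessian is never formed explicitly.
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2   # Rosenbrock
x = [-1.2, 1.0]
g = ForwardDiff.gradient(f, x)
hv(v) = ForwardDiff.derivative(t -> ForwardDiff.gradient(f, x .+ t .* v), 0.0)

H = LinearOperator(Float64, 2, 2, true, true, hv)   # symmetric, Hermitian flags
p, stats = Krylov.cg(H, -g)                         # inexact Newton step: H p ≈ -g
```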

Dominique and I are both a little swamped at the moment, but implementing competitive matrix-free methods for large-scale problems is one of our goals.

anriseth commented 7 years ago

> Also, any plans to use @jrevels's ReverseDiff.jl library to automatically populate the gradient function, if a user doesn't specify one?

I was hoping to take a look at ReverseDiff AD at some point.

pkofod commented 7 years ago

> Also, any plans to use @jrevels's ReverseDiff.jl library to automatically populate the gradient function, if a user doesn't specify one? That, in combination with ForwardDiff.jl for the Hessian-vector product, would be really useful for us at Celeste.jl, and probably for any number of other projects too.

@jeff-regier we've got basic ReverseDiff support now if you dare try master. Do note that there are quite a few breaking changes, so...

jeff-regier commented 7 years ago

That's good. I've got an implementation of Hessian-free trust-region optimization now over at Celeste.jl, on the jr/cg2 branch:

https://github.com/jeff-regier/Celeste.jl/blob/jr/cg2/src/cg_trust_region.jl

We're still testing it. The code is solid, but I'd still like to build in support for preconditioning.

pkofod commented 7 years ago

Are you aware of our preconditioning code in Optim?

jeff-regier commented 7 years ago

I wasn't, but now I see precon.jl. Thanks for pointing it out. I'll try to stick with that preconditioner interface so it's easy to merge.
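(For reference, a minimal sketch of how that interface is typically used: Optim's first-order methods take a preconditioner through the `P` keyword. The diagonal scaling below is purely illustrative.)

```julia
using Optim, LinearAlgebra

# Hedged sketch of Optim's preconditioner hook (see precon.jl): first-order
# methods such as ConjugateGradient accept `P`; anything supporting `ldiv!`/`\`
# and `dot` can serve as a preconditioner.
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2   # Rosenbrock test function
P = Diagonal([2.0, 1.0])                            # illustrative fixed scaling
res = optimize(f, [-1.2, 1.0], ConjugateGradient(P = P))
```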

pkofod commented 7 years ago

Great, it's the work of @cortner, who has also been using it in his own research, but of course suggestions and improvements are welcome.

cortner commented 7 years ago

Maybe this is obvious to you: in the TR context, the preconditioner doesn't just give you a PCG method, it also specifies the TR topology. Then, when you hit the TR boundary, you start solving a generalised eigenvalue problem instead of the standard eigenvalue problem.
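(Spelling that out, a sketch of the statement above with P symmetric positive definite:)

```latex
% Trust-region subproblem measured in the P-norm:
\min_{s}\; g^\top s + \tfrac{1}{2}\, s^\top H s
\quad \text{s.t.} \quad \|s\|_P := \sqrt{s^\top P s} \le \Delta .
% On the boundary the optimality conditions become
(H + \lambda P)\, s = -g, \qquad \|s\|_P = \Delta, \qquad \lambda \ge 0,
% so the hard case involves the generalised eigenvalue problem
H u = \mu P u ,
% which reduces to the standard one when P = I.
```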