JuliaNLSolvers / Optim.jl

Optimization functions for Julia

Maximum Step Size #34

Closed SamChill closed 8 years ago

SamChill commented 11 years ago

I'm not sure where the best place to discuss Optim is, so I'll post this here.

In my field, computational chemistry, it is common to enforce a maximum step size at each optimization step when working with atomic systems. This matters most when the gradient is large: without the constraint, a very large step can be taken, which often carries the system out of the current basin of attraction into a different one. Finding the minimum associated with the current geometry is often the whole point of the optimization. Worse, if the initial force is large enough, all of the atoms can be pushed so far apart that the energy (the function value) goes to zero. That is certainly a minimum, but a trivial one.

I would like to know if anyone has any thoughts on adding a maximum step size to Optim. The way it is typically implemented is to calculate the length of the proposed step and, if it exceeds the maximum, rescale the step so its length equals the maximum.
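Concretely, something like the following minimal sketch (the name `clamp_step` and its interface are just for illustration, not part of Optim):

```julia
using LinearAlgebra: norm

# Minimal sketch of the rule described above: if the proposed step is
# longer than `max_step`, rescale it so its length equals `max_step`.
function clamp_step(step::AbstractVector, max_step::Real)
    len = norm(step)
    return len > max_step ? step .* (max_step / len) : step
end
```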

One reason that a maximum step size makes sense in atomic systems is that the second derivatives vary very rapidly with changes in bond lengths. This means that curvature information learned at the current position should only be trusted locally. For example, a typical maximum step size might be between 0.1 and 0.3 Angstroms (several percent of an equilibrium bond length).

I am currently writing a paper that benchmarks various optimization algorithms and implementations on atomic systems, and I would like to include Optim. As it currently stands, however, Optim immediately pushes all of the atoms extremely far apart in the first step because of the large initial gradient.

SamChill commented 11 years ago

As a further question: are there any line searches that attempt to find the same minimum that would be reached by taking infinitesimally small steps along the negative gradient? This is normally how a basin of attraction is defined. I imagine that a backtracking line search whose maximum step size is smaller than the width of a typical basin might work reasonably well.

I typically use optimization algorithms without a sophisticated line search: either the L-BFGS search direction and length used directly, or conjugate gradients with a single Newton's-method step along the descent direction, both combined with a maximum step size.
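To make the second scheme concrete, here is a sketch (the function name and interface are illustrative, not from any library). With the quadratic model phi(alpha) = f(x + alpha*d), we have phi'(0) = g'd and phi''(0) = d'H d, so the single Newton step along d is alpha = -(g'd)/(d'H d):

```julia
using LinearAlgebra: dot, norm

# One Newton step along a descent direction d, then capped at `max_step`
# as in the clamping sketch above. `H` is the full Hessian here; in
# practice a Hessian-vector product would suffice.
function newton_step_along(g::AbstractVector, H::AbstractMatrix,
                           d::AbstractVector, max_step::Real)
    alpha = -dot(g, d) / dot(d, H * d)
    step = alpha .* d
    len = norm(step)
    return len > max_step ? step .* (max_step / len) : step
end
```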

I'd like to see whether a fancier line search can improve performance on the problems I am interested in, but as it stands I can't get any of the available line searches in Optim to work. Each line search seems to have options that ought to be configurable, but I don't see any way to change them, and the line searches appear to be undocumented.

johnmyleswhite commented 11 years ago

I'd certainly like to include max step sizes as an option. This will take some time to do, though, as the line search code is fairly complex.

I'm not sure what infinitesimally small steps along the gradient means in floating point. You could use a constant-step gradient descent algorithm with step sizes equal to the floating-point epsilon, but this seems like a bad idea for both convergence and numerical stability. I suspect I'm missing something.

The line search is definitely undocumented. Some of the options are also not exposed because we haven't done enough to generalize our line-search framework to handle all of the different options in a uniform way.

If you'd like to experiment with alternative line searches, you can copy the code for backtracking line search and then modify it to your taste. As long as you maintain the existing interface, you can pass in your new line search function as an argument to optimize.
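As a self-contained sketch of the idea (this deliberately does not follow Optim's internal line-search interface, only the modification being discussed), a backtracking search with a built-in step cap might look like:

```julia
using LinearAlgebra: dot, norm

# Backtracking (Armijo) line search with a step cap: start from the
# largest alpha allowed by `max_step`, then shrink until the sufficient
# decrease condition holds.
function backtracking(f, x, g, d; max_step = 0.3, c = 1e-4, rho = 0.5)
    alpha = min(1.0, max_step / norm(d))  # cap so the trial step has length <= max_step
    fx = f(x)
    gd = dot(g, d)  # directional derivative; negative for a descent direction
    for _ in 1:50   # shrink until sufficient decrease (Armijo) holds
        f(x .+ alpha .* d) <= fx + c * alpha * gd && return alpha
        alpha *= rho
    end
    return alpha    # give up after 50 halvings; the caller should check progress
end
```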

timholy commented 11 years ago

linesearch_hz already has the ability to specify a maximum step size (it's needed for fminbox).

SamChill commented 11 years ago

I think I see a way forward at least for my own personal testing.

I'll clarify what I meant by "infinitesimally small steps along the negative gradient". My view is that the goal of a local optimization routine should be to find the same minimum that gradient descent would reach in the limit of the step size going to zero. This is of course only a theoretical definition, not a practical approach to optimization.
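As a sketch of what I mean (purely illustrative, far too slow to use as an optimizer): follow the gradient flow dx/dt = -g(x) with very small forward-Euler steps and see which minimum it converges to; that minimum defines the basin of attraction of the starting point.

```julia
using LinearAlgebra: norm

# Integrate dx/dt = -g(x) with tiny Euler steps until the gradient
# (nearly) vanishes. The limit point defines the basin containing x0.
function gradient_flow_minimum(g, x0; h = 1e-4, tol = 1e-8, maxiter = 10^7)
    x = copy(x0)
    for _ in 1:maxiter
        gx = g(x)
        norm(gx) < tol && break
        x .-= h .* gx
    end
    return x
end
```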

Here is a recent paper that maps out which minima are reached from different starting coordinates by different optimization algorithms in atomistic systems. This sort of study is interesting because some methods try to associate each point in configuration space with a particular minimum; however, which minimum is actually reached from a given starting point depends on many different parameters.

timholy commented 10 years ago

Since hz_linesearch is the default line search algorithm, and it includes alphamax as an option, can this be closed?

pkofod commented 8 years ago

As mentioned, the functionality is already here, so I am closing this.
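For anyone landing here later: assuming the current Optim/LineSearches API (the line searches were later split out into the LineSearches.jl package), setting a maximum step length for the default Hager-Zhang line search looks roughly like this, with a toy objective of my own:

```julia
using Optim, LineSearches

# The Hager-Zhang line search accepts an `alphamax` keyword that bounds
# the step-length parameter along the search direction.
f(x) = sum(abs2, x)                       # toy objective
x0 = fill(10.0, 3)
ls = LineSearches.HagerZhang(alphamax = 0.1)
res = optimize(f, x0, LBFGS(linesearch = ls))
```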