I am getting an error saying that the search direction is not a direction of descent. The message suggests that "this error may indicate that user-provided derivatives are inaccurate", but that cannot be the case here, as I did not provide any derivatives. (I was lazy.)

Full MWE: http://nbviewer.ipython.org/gist/oxinabox/4f51dba39154fbb2edbd

The error occurs in Cell 7; executing the notebook requires the RDatasets package. I can't run this in Julia 0.4 or 0.3 at the moment, as neither is compiling for me.

I suspect this might be linked to #121, since both involve batch learning over constant training data. In this case, however, it is (more) certain that the problem can't be a user error in the definition of the derivative, since the derivative is being computed automatically.
A new, better MWE, this time without needing a batch-learning scenario: the same error occurs on the Rosenbrock function, with and without a gradient provided, if the optimization is started from [0.046497946064945506, 0.6103779534361964].

MWE (also edited into http://nbviewer.ipython.org/gist/oxinabox/4f51dba39154fbb2edbd):
using Optim

function rosenbrock(x::Vector)
    return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end

function rosenbrock_gradient!(x::Vector, storage::Vector)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
end

fr = rosenbrock
gr! = rosenbrock_gradient!
t = [0.046497946064945506, 0.6103779534361964]

optimize(fr, t, method = :l_bfgs)      # errors out -- LoadError: Search direction is not a direction of descent
optimize(fr, gr!, t, method = :l_bfgs) # also errors out -- LoadError: Search direction is not a direction of descent
Using the minimal working example from oxinabox above (the new, better MWE), I get the following error from optimize(fr, t, method = :l_bfgs):

ERROR: Search direction is not a direction of descent; this error may indicate that user-provided derivatives are inaccurate. (dphia = 0.000014; dphib = 0.000015)
 in error at error.jl:21
 in secant2! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:143
 in hz_linesearch! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:321
 in hz_linesearch! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:176 (repeats 10 times)
 in l_bfgs at /Users/christian/.julia/v0.3/Optim/src/l_bfgs.jl:169
 in optimize at /Users/christian/.julia/v0.3/Optim/src/optimize.jl:521
 in include at /usr/local/Cellar/julia/0.3.11/lib/julia/sys.dylib
 in include_from_node1 at /usr/local/Cellar/julia/0.3.11/lib/julia/sys.dylib
while loading /Users/christian/tmp/julia-mwe.jl, in expression starting on line 16

If I try the second command, optimize(fr, gr!, t, method = :l_bfgs), I get:

ERROR: Search direction is not a direction of descent; this error may indicate that user-provided derivatives are inaccurate. (dphia = 0.000014; dphib = 0.000015)
 in error at error.jl:21
 in secant2! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:143
 in hz_linesearch! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:321
 in hz_linesearch! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:176 (repeats 10 times)
 in l_bfgs at /Users/christian/.julia/v0.3/Optim/src/l_bfgs.jl:169
 in optimize at /Users/christian/.julia/v0.3/Optim/src/optimize.jl:442
 in include at /usr/local/Cellar/julia/0.3.11/lib/julia/sys.dylib
 in include_from_node1 at /usr/local/Cellar/julia/0.3.11/lib/julia/sys.dylib
while loading /Users/christian/tmp/julia-mwe.jl, in expression starting on line 17

I'm on Julia 0.3.11 on OS X.
Thanks for finding, to my knowledge, the very first reproducible case where the problem is in Optim. You probably deserve some sort of reward for this, since it gives us a realistic chance to improve the hz_linesearch! code.
Here are some more details from a little digging I did, based on your test case:
julia> using Optim

julia> function rosenbrock(x::Vector)
           return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
       end
rosenbrock (generic function with 1 method)

julia> function rosenbrock_gradient!(x::Vector, storage::Vector)
           storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
           storage[2] = 200.0 * (x[2] - x[1]^2)
       end
rosenbrock_gradient! (generic function with 1 method)

julia> fr = rosenbrock
rosenbrock (generic function with 1 method)

julia> gr! = rosenbrock_gradient!
rosenbrock_gradient! (generic function with 1 method)

julia> t = [0.046497946064945506, 0.6103779534361964]
2-element Array{Float64,1}:
 0.0464979
 0.610378
julia> optimize(fr, t, method = :l_bfgs, show_trace = true, extended_trace = true)
Iter Function value Gradient norm
0 3.790182e+01 1.216432e+02
* Current step size: 1.0
* g(x): [-13.21932004832231,121.64317888993415]
* x: [0.046497946064945506,0.6103779534361964]
1 3.034185e+01 1.085285e+02
* Current step size: 0.0005332665541292081
* g(x): [-13.515733902722369,108.52847881068213]
* x: [0.0535473673150455,0.5455097145962384]
2 6.585835e-01 2.019665e+01
* Current step size: 0.0373121785128398
* g(x): [-20.19665188770014,14.752143697001937]
* x: [0.6615932867760497,0.5114663955917655]
3 9.177974e-02 2.798717e+00
* Current step size: 0.00010847223404751265
* g(x): [2.798717471476102,-2.330776560657305]
* x: [0.7203601582898177,0.5072648748481026]
4 8.223303e-02 3.686765e-01
* Current step size: 0.9053217218744186
* g(x): [-0.04599511124587593,-0.3686764961556951]
* x: [0.7138300700625371,0.5077099864447171]
ERROR: Search direction is not a direction of descent; this error may indicate that user-provided derivatives are inaccurate. (dphia = 0.000014; dphib = 0.000015)
in error at ./error.jl:21
[inlined code] from printf.jl:145
in secant2! at /Users/johnmyleswhite/.julia/v0.4/Optim/src/linesearch/hz_linesearch.jl:145
in hz_linesearch! at /Users/johnmyleswhite/.julia/v0.4/Optim/src/linesearch/hz_linesearch.jl:321
in hz_linesearch! at /Users/johnmyleswhite/.julia/v0.4/Optim/src/linesearch/hz_linesearch.jl:176 (repeats 10 times)
in l_bfgs at /Users/johnmyleswhite/.julia/v0.4/Optim/src/l_bfgs.jl:169
in optimize at /Users/johnmyleswhite/.julia/v0.4/Optim/src/optimize.jl:508
julia> x = [0.7138300700625371,0.5077099864447171]
2-element Array{Float64,1}:
0.71383
0.50771
julia> g = [-0.04599511124587593,-0.3686764961556951]
2-element Array{Float64,1}:
-0.0459951
-0.368676
julia> rosenbrock(x)
0.0822330346974532
julia> rosenbrock(x - 0.1 * g)
0.16014576878251033
julia> rosenbrock(x - 0.01 * g)
0.08177097427262547
julia> optimize(fr, t, method = :l_bfgs, show_trace = true, extended_trace = true, linesearch! = Optim.mt_linesearch!)
Iter Function value Gradient norm
0 3.790182e+01 1.216432e+02
* Current step size: 1.0
* g(x): [-13.21932004832231,121.64317888993415]
* x: [0.046497946064945506,0.6103779534361964]
1 9.116925e-01 6.805628e+00
* Current step size: 0.004642385216210363
* g(x): [-3.25247285644434,6.805628418120389]
* x: [0.10786712202563026,0.0456634581047336]
2 6.087523e-01 1.999803e+00
* Current step size: 0.042883281352515644
* g(x): [-1.9998034007591687,0.9995559691498415]
* x: [0.22137649455174352,0.054005332185795674]
3 5.504648e-01 5.361268e+00
* Current step size: 0.12114673063410149
* g(x): [1.9209050791536908,-5.361267635652343]
* x: [0.3081858286124246,0.06817216677927206]
4 5.223893e-01 2.581103e+00
* Current step size: 1.0
* g(x): [0.06880402689466916,-2.581102670998708]
* x: [0.28885014096426803,0.07052889058011375]
5 4.916924e-01 1.817491e+00
* Current step size: 1.0
* g(x): [-0.2829891706267437,-1.817490899121915]
* x: [0.30470565591213417,0.08375808224923581]
6 3.393769e-01 4.927059e+00
* Current step size: 1.0
* g(x): [3.596231115305139,-4.927059012924018]
* x: [0.4720917318517486,0.19823530821817847]
7 2.733593e-01 1.759482e+00
* Current step size: 1.0
* g(x): [0.6745822033549768,-1.7594816487326084]
* x: [0.484616775308535,0.2260560106667956]
8 1.930228e-01 5.152367e+00
* Current step size: 1.0
* g(x): [5.152366982141452,-4.69021520328036]
* x: [0.6284794790135224,0.371535379524615]
9 1.363312e-01 5.020847e-01
* Current step size: 1.0
* g(x): [-0.10249496316464486,-0.502084665860697]
* x: [0.6316238593922736,0.3964382764242786]
10 8.458441e-02 1.802161e+00
* Current step size: 1.0
* g(x): [1.8021611218079159,-1.6370149990873493]
* x: [0.7209214040404006,0.5115425958081843]
11 7.045248e-02 5.833360e+00
* Current step size: 1.0
* g(x): [5.8333604145771085,-3.8059662190478303]
* x: [0.8149620748050577,0.6451333522754301]
12 3.648692e-02 3.479014e-01
* Current step size: 1.0
* g(x): [0.183002268816698,-0.34790135254194715]
* x: [0.8097781980944143,0.6540012233463378]
13 2.115711e-02 4.112740e-01
* Current step size: 1.0
* g(x): [0.41127403876484353,-0.40847715366200443]
* x: [0.8559862046536438,0.7306699967890502]
14 1.210687e-02 1.974007e+00
* Current step size: 0.5266691291302079
* g(x): [1.9740071820063751,-1.1896944844651578]
* x: [0.9074341641922908,0.8174882899210666]
15 5.326343e-03 1.224351e+00
* Current step size: 1.0
* g(x): [1.2243512512959578,-0.7213901087176319]
* x: [0.9365544803675627,0.873527344152986]
16 1.612733e-03 6.954378e-02
* Current step size: 1.0
* g(x): [0.05350690755545637,-0.06954378450301275]
* x: [0.9599919719180943,0.9212368672246779]
17 7.082179e-04 6.769619e-01
* Current step size: 0.47481589703906557
* g(x): [0.6769618891013601,-0.3649229837363147]
* x: [0.9806274444118896,0.9598055698151217]
18 9.069323e-05 2.096519e-01
* Current step size: 1.0
* g(x): [0.2096519120636138,-0.11334683308667262]
* x: [0.9923466038802332,0.9841850480672021]
19 4.942180e-06 3.852335e-02
* Current step size: 1.0
* g(x): [0.0385233450626055,-0.021255772871464587]
* x: [0.9980473964382077,0.9959923266727282]
20 7.254026e-08 9.880541e-03
* Current step size: 1.0
* g(x): [0.009880540776753701,-0.005036293392661403]
* x: [0.9999044508761377,0.9997837294149472]
21 1.134254e-08 3.909967e-03
* Current step size: 1.0
* g(x): [-0.003909967353315911,0.0019077088139518238]
* x: [0.9999526274308662,0.9999147956499624]
22 6.902223e-11 6.465515e-05
* Current step size: 1.0
* g(x): [6.465514535666687e-5,-2.409993049577603e-5]
* x: [1.0000082201101734,1.0000163197882646]
23 3.962968e-13 1.106611e-05
* Current step size: 1.0
* g(x): [1.1066111695510297e-5,-6.077060377116956e-6]
* x: [0.9999994486651582,0.9999988669453186]
24 1.714637e-16 4.604080e-07
* Current step size: 1.0
* g(x): [-4.604079914281444e-7,2.314065272725728e-7]
* x: [0.9999999938688299,0.9999999888946924]
25 4.926459e-17 2.413791e-08
* Current step size: 1.0
* g(x): [2.4137911625127053e-8,-1.1729572868579233e-8]
* x: [0.9999999930056717,0.9999999859526955]
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [0.046497946064945506,0.6103779534361964]
* Minimum: [0.9999999930056717,0.9999999859526955]
* Value of Function at Minimum: 0.000000
* Iterations: 26
* Convergence: false
* |x - x'| < 1.0e-32: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: false
* |g(x)| < 1.0e-08: false
* Exceeded Maximum Number of Iterations: false
* Objective Function Calls: 154
* Gradient Call: 154
Very quick take-aways: at the point where the line search fails, stepping by 0.1 times the gradient increases the function value very rapidly (see the rosenbrock(x - 0.1 * g) computation above), while a step of 0.01 times the gradient barely decreases it.

This week is going to be really bad for me, so I doubt I'll manage to solve this. But it's fun to think we could make meaningful progress on improving Optim.
One possibility is to reset the search direction at this line: https://github.com/JuliaOpt/Optim.jl/blob/8ad3d9b830a3a3246ea056a26366e471d89c51d1/src/l_bfgs.jl#L164
This line in the BFGS code should provide a good example of how to do this: https://github.com/JuliaOpt/Optim.jl/blob/8ad3d9b830a3a3246ea056a26366e471d89c51d1/src/bfgs.jl#L112
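For illustration, here is a minimal sketch of the kind of guard that reset could take. This is not Optim's actual code; s and gr are hypothetical names for the proposed search direction and the current gradient:

if dot(s, gr) >= 0.0
    # A non-negative inner product means `s` points uphill (or is
    # orthogonal to the gradient), so it cannot be a descent direction.
    copy!(s, -gr)  # fall back to steepest descent
    # One might also clear the stored curvature history here, analogous to
    # how the BFGS code linked above resets its Hessian approximation.
end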
Here's a plan for moving forward:
One exciting possibility is that resetting the search direction in these cases will speed up convergence in problems where we already do converge.
I implemented a demonstration of how one might fix this bug. The approach I've taken is a rushed fix: we can't merge it until something more principled is put in place, but it should be sufficient for checking that this approach works in many cases.
I guess this is a reminder to go over the tests once more. The Rosenbrock test problem is already in unconstrained.jl (and I am pretty sure the derivatives are correct, because I went through all of them at some point), so the L-BFGS tests should really pass it as well, with the standard choices of tolerances and(!) line search. Thanks for finding this, @oxinabox.
I'm pretty sure the Rosenbrock tests have always passed. The problem with this example is path-dependent and requires a specific initial point because you need to start in a location such that the accumulated gradient history gives you an inferior search direction.
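For background on how an accumulated history can go bad: the L-BFGS direction comes from the standard two-loop recursion over the stored (step, gradient-change) pairs. The generic sketch below is the textbook algorithm (Nocedal & Wright, Alg. 7.4), not Optim's code; if the stored pairs fail the curvature condition dot(s, y) > 0, the implicit Hessian approximation can lose positive definiteness and the returned direction can fail to be a descent direction.

# Textbook L-BFGS two-loop recursion. S[i], Y[i] are the stored steps and
# gradient differences, with S[end], Y[end] the most recent pair; assumes
# at least one stored pair.
function lbfgs_direction(g, S, Y)
    m = length(S)
    rho = [1.0 / dot(Y[i], S[i]) for i in 1:m]
    alpha = zeros(m)
    q = copy(g)
    for i in m:-1:1                 # newest pair first
        alpha[i] = rho[i] * dot(S[i], q)
        q -= alpha[i] * Y[i]
    end
    gamma = dot(S[m], Y[m]) / dot(Y[m], Y[m])  # initial scaling H0 = gamma * I
    r = gamma * q
    for i in 1:m                    # oldest pair first
        beta = rho[i] * dot(Y[i], r)
        r += (alpha[i] - beta) * S[i]
    end
    return -r  # a descent direction only while the implicit Hessian
               # approximation stays positive definite
end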
Okay, I get it. Eventually, the Hessian approximation attains an unfortunate value in this example. That is hard to test for, then, but it's still good that it was found.
(It does seem as if L-BFGS is tested on a Rosenbrock function, though with a different parametrization than the one in unconstrained.jl.)
For reference, the way I found this was by looping over 1_000_000 random possible values for t, running optimize, and waiting until an exception was thrown. It happened almost immediately (<< 1 second), so it probably did not get through very many of those 1_000_000 trials.

Would it be worth adding some of these "fuzzing"-style tests to the test suite?
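The search loop was presumably something like the following sketch (my reconstruction, reusing rosenbrock from the MWE above; the exact script wasn't posted):

using Optim

for trial in 1:1_000_000
    t = rand(2)  # random starting point in [0, 1)^2
    try
        optimize(rosenbrock, t, method = :l_bfgs)
    catch err
        println("Failed from starting point $t: $err")
        break  # stop at the first failing starting point
    end
end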
A subset of fuzz tests could be useful. At some point we need to think of a better strategy for doing randomized testing. What makes it tricky is that you need to know what the invariants are before you can error out when the code doesn't do something you expect on some inputs.
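As one concrete example of such an invariant, a sketch of a check one could run (again reusing rosenbrock; the f_minimum field name is my assumption about this era's results type): the value reported at the optimum should never exceed the value at the starting point, whether or not the run converged.

for trial in 1:1_000
    t = rand(2)
    res = optimize(rosenbrock, t, method = :l_bfgs)
    # Whatever else happens, the optimizer should never report a point
    # whose objective value is worse than where it started.
    @assert res.f_minimum <= rosenbrock(t)
end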
> You probably deserve some sort of reward for this since it gives us a realistic chance to improve the hz_linesearch! code.
Agreed with the good detective work. However, as @johnmyleswhite noted in #144, the fix should be made in l_bfgs; if you specify method = :cg, you don't get an error. So it doesn't appear to be a problem with hz_linesearch! per se.
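For anyone reproducing that check, it is just the MWE call with the method switched (reusing fr and t from above):

optimize(fr, t, method = :cg)  # runs without the descent-direction error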
Closed by #144