I am getting an error saying that the search direction is not a direction of descent. The message suggests that "this error may indicate that user-provided derivatives are inaccurate", but that cannot be the case here, as I did not provide any derivatives. (I was lazy.)

Full MWE: http://nbviewer.ipython.org/gist/oxinabox/4f51dba39154fbb2edbd

The error occurs in Cell 7; executing the notebook requires the RDatasets package. I can't run this in Julia 0.4 or 0.3 at the moment, as neither is compiling for me.

I suspect this might be linked to #121, since both involve batch learning over constant training data. In this case, however, it is (more) certain that the problem can't be a user error in the definition of the derivative, since the derivative is being computed automatically.
A new, better MWE, this time without needing a batch-learning scenario: the same error occurs on the Rosenbrock function, with and without a gradient provided, if the optimization is started from [0.046497946064945506, 0.6103779534361964].

MWE (also edited into http://nbviewer.ipython.org/gist/oxinabox/4f51dba39154fbb2edbd):
using Optim

function rosenbrock(x::Vector)
    return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end

function rosenbrock_gradient!(x::Vector, storage::Vector)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
end

fr = rosenbrock
gr! = rosenbrock_gradient!
t = [0.046497946064945506, 0.6103779534361964]

optimize(fr, t, method = :l_bfgs)      # errors out -- LoadError: Search direction is not a direction of descent
optimize(fr, gr!, t, method = :l_bfgs) # also errors out -- LoadError: Search direction is not a direction of descent
Using the minimal working example from oxinabox above (the new, better MWE), I get the following error from optimize(fr, t, method = :l_bfgs):

ERROR: Search direction is not a direction of descent; this error may indicate that user-provided derivatives are inaccurate. (dphia = 0.000014; dphib = 0.000015)
 in error at error.jl:21
 in secant2! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:143
 in hz_linesearch! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:321
 in hz_linesearch! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:176 (repeats 10 times)
 in l_bfgs at /Users/christian/.julia/v0.3/Optim/src/l_bfgs.jl:169
 in optimize at /Users/christian/.julia/v0.3/Optim/src/optimize.jl:521
 in include at /usr/local/Cellar/julia/0.3.11/lib/julia/sys.dylib
 in include_from_node1 at /usr/local/Cellar/julia/0.3.11/lib/julia/sys.dylib
while loading /Users/christian/tmp/julia-mwe.jl, in expression starting on line 16

If I try the second command, optimize(fr, gr!, t, method = :l_bfgs), I get:

ERROR: Search direction is not a direction of descent; this error may indicate that user-provided derivatives are inaccurate. (dphia = 0.000014; dphib = 0.000015)
 in error at error.jl:21
 in secant2! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:143
 in hz_linesearch! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:321
 in hz_linesearch! at /Users/christian/.julia/v0.3/Optim/src/linesearch/hz_linesearch.jl:176 (repeats 10 times)
 in l_bfgs at /Users/christian/.julia/v0.3/Optim/src/l_bfgs.jl:169
 in optimize at /Users/christian/.julia/v0.3/Optim/src/optimize.jl:442
 in include at /usr/local/Cellar/julia/0.3.11/lib/julia/sys.dylib
 in include_from_node1 at /usr/local/Cellar/julia/0.3.11/lib/julia/sys.dylib
while loading /Users/christian/tmp/julia-mwe.jl, in expression starting on line 17

I'm on Julia 0.3.11 on OS X.
Thanks for finding, to my knowledge, the very first reproducible case where the problem is in Optim. You probably deserve some sort of reward for this, since it gives us a realistic chance to improve the hz_linesearch! code.
Here are some more details from a little digging I did, based on your test case:
julia> using Optim

julia> function rosenbrock(x::Vector)
           return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
       end
rosenbrock (generic function with 1 method)

julia> function rosenbrock_gradient!(x::Vector, storage::Vector)
           storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
           storage[2] = 200.0 * (x[2] - x[1]^2)
       end
rosenbrock_gradient! (generic function with 1 method)

julia> fr = rosenbrock
rosenbrock (generic function with 1 method)

julia> gr! = rosenbrock_gradient!
rosenbrock_gradient! (generic function with 1 method)

julia> t = [0.046497946064945506, 0.6103779534361964]
2-element Array{Float64,1}:
 0.0464979
 0.610378
julia> optimize(fr, t, method = :l_bfgs, show_trace = true, extended_trace = true)
Iter Function value Gradient norm
0 3.790182e+01 1.216432e+02
* Current step size: 1.0
* g(x): [-13.21932004832231,121.64317888993415]
* x: [0.046497946064945506,0.6103779534361964]
1 3.034185e+01 1.085285e+02
* Current step size: 0.0005332665541292081
* g(x): [-13.515733902722369,108.52847881068213]
* x: [0.0535473673150455,0.5455097145962384]
2 6.585835e-01 2.019665e+01
* Current step size: 0.0373121785128398
* g(x): [-20.19665188770014,14.752143697001937]
* x: [0.6615932867760497,0.5114663955917655]
3 9.177974e-02 2.798717e+00
* Current step size: 0.00010847223404751265
* g(x): [2.798717471476102,-2.330776560657305]
* x: [0.7203601582898177,0.5072648748481026]
4 8.223303e-02 3.686765e-01
* Current step size: 0.9053217218744186
* g(x): [-0.04599511124587593,-0.3686764961556951]
* x: [0.7138300700625371,0.5077099864447171]
ERROR: Search direction is not a direction of descent; this error may indicate that user-provided derivatives are inaccurate. (dphia = 0.000014; dphib = 0.000015)
in error at ./error.jl:21
[inlined code] from printf.jl:145
in secant2! at /Users/johnmyleswhite/.julia/v0.4/Optim/src/linesearch/hz_linesearch.jl:145
in hz_linesearch! at /Users/johnmyleswhite/.julia/v0.4/Optim/src/linesearch/hz_linesearch.jl:321
in hz_linesearch! at /Users/johnmyleswhite/.julia/v0.4/Optim/src/linesearch/hz_linesearch.jl:176 (repeats 10 times)
in l_bfgs at /Users/johnmyleswhite/.julia/v0.4/Optim/src/l_bfgs.jl:169
in optimize at /Users/johnmyleswhite/.julia/v0.4/Optim/src/optimize.jl:508
julia> x = [0.7138300700625371,0.5077099864447171]
2-element Array{Float64,1}:
0.71383
0.50771
julia> g = [-0.04599511124587593,-0.3686764961556951]
2-element Array{Float64,1}:
-0.0459951
-0.368676
julia> rosenbrock(x)
0.0822330346974532
julia> rosenbrock(x - 0.1 * g)
0.16014576878251033
julia> rosenbrock(x - 0.01 * g)
0.08177097427262547
julia> optimize(fr, t, method = :l_bfgs, show_trace = true, extended_trace = true, linesearch! = Optim.mt_linesearch!)
Iter Function value Gradient norm
0 3.790182e+01 1.216432e+02
* Current step size: 1.0
* g(x): [-13.21932004832231,121.64317888993415]
* x: [0.046497946064945506,0.6103779534361964]
1 9.116925e-01 6.805628e+00
* Current step size: 0.004642385216210363
* g(x): [-3.25247285644434,6.805628418120389]
* x: [0.10786712202563026,0.0456634581047336]
2 6.087523e-01 1.999803e+00
* Current step size: 0.042883281352515644
* g(x): [-1.9998034007591687,0.9995559691498415]
* x: [0.22137649455174352,0.054005332185795674]
3 5.504648e-01 5.361268e+00
* Current step size: 0.12114673063410149
* g(x): [1.9209050791536908,-5.361267635652343]
* x: [0.3081858286124246,0.06817216677927206]
4 5.223893e-01 2.581103e+00
* Current step size: 1.0
* g(x): [0.06880402689466916,-2.581102670998708]
* x: [0.28885014096426803,0.07052889058011375]
5 4.916924e-01 1.817491e+00
* Current step size: 1.0
* g(x): [-0.2829891706267437,-1.817490899121915]
* x: [0.30470565591213417,0.08375808224923581]
6 3.393769e-01 4.927059e+00
* Current step size: 1.0
* g(x): [3.596231115305139,-4.927059012924018]
* x: [0.4720917318517486,0.19823530821817847]
7 2.733593e-01 1.759482e+00
* Current step size: 1.0
* g(x): [0.6745822033549768,-1.7594816487326084]
* x: [0.484616775308535,0.2260560106667956]
8 1.930228e-01 5.152367e+00
* Current step size: 1.0
* g(x): [5.152366982141452,-4.69021520328036]
* x: [0.6284794790135224,0.371535379524615]
9 1.363312e-01 5.020847e-01
* Current step size: 1.0
* g(x): [-0.10249496316464486,-0.502084665860697]
* x: [0.6316238593922736,0.3964382764242786]
10 8.458441e-02 1.802161e+00
* Current step size: 1.0
* g(x): [1.8021611218079159,-1.6370149990873493]
* x: [0.7209214040404006,0.5115425958081843]
11 7.045248e-02 5.833360e+00
* Current step size: 1.0
* g(x): [5.8333604145771085,-3.8059662190478303]
* x: [0.8149620748050577,0.6451333522754301]
12 3.648692e-02 3.479014e-01
* Current step size: 1.0
* g(x): [0.183002268816698,-0.34790135254194715]
* x: [0.8097781980944143,0.6540012233463378]
13 2.115711e-02 4.112740e-01
* Current step size: 1.0
* g(x): [0.41127403876484353,-0.40847715366200443]
* x: [0.8559862046536438,0.7306699967890502]
14 1.210687e-02 1.974007e+00
* Current step size: 0.5266691291302079
* g(x): [1.9740071820063751,-1.1896944844651578]
* x: [0.9074341641922908,0.8174882899210666]
15 5.326343e-03 1.224351e+00
* Current step size: 1.0
* g(x): [1.2243512512959578,-0.7213901087176319]
* x: [0.9365544803675627,0.873527344152986]
16 1.612733e-03 6.954378e-02
* Current step size: 1.0
* g(x): [0.05350690755545637,-0.06954378450301275]
* x: [0.9599919719180943,0.9212368672246779]
17 7.082179e-04 6.769619e-01
* Current step size: 0.47481589703906557
* g(x): [0.6769618891013601,-0.3649229837363147]
* x: [0.9806274444118896,0.9598055698151217]
18 9.069323e-05 2.096519e-01
* Current step size: 1.0
* g(x): [0.2096519120636138,-0.11334683308667262]
* x: [0.9923466038802332,0.9841850480672021]
19 4.942180e-06 3.852335e-02
* Current step size: 1.0
* g(x): [0.0385233450626055,-0.021255772871464587]
* x: [0.9980473964382077,0.9959923266727282]
20 7.254026e-08 9.880541e-03
* Current step size: 1.0
* g(x): [0.009880540776753701,-0.005036293392661403]
* x: [0.9999044508761377,0.9997837294149472]
21 1.134254e-08 3.909967e-03
* Current step size: 1.0
* g(x): [-0.003909967353315911,0.0019077088139518238]
* x: [0.9999526274308662,0.9999147956499624]
22 6.902223e-11 6.465515e-05
* Current step size: 1.0
* g(x): [6.465514535666687e-5,-2.409993049577603e-5]
* x: [1.0000082201101734,1.0000163197882646]
23 3.962968e-13 1.106611e-05
* Current step size: 1.0
* g(x): [1.1066111695510297e-5,-6.077060377116956e-6]
* x: [0.9999994486651582,0.9999988669453186]
24 1.714637e-16 4.604080e-07
* Current step size: 1.0
* g(x): [-4.604079914281444e-7,2.314065272725728e-7]
* x: [0.9999999938688299,0.9999999888946924]
25 4.926459e-17 2.413791e-08
* Current step size: 1.0
* g(x): [2.4137911625127053e-8,-1.1729572868579233e-8]
* x: [0.9999999930056717,0.9999999859526955]
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [0.046497946064945506,0.6103779534361964]
* Minimum: [0.9999999930056717,0.9999999859526955]
* Value of Function at Minimum: 0.000000
* Iterations: 26
* Convergence: false
* |x - x'| < 1.0e-32: false
* |f(x) - f(x')| / |f(x)| < 1.0e-08: false
* |g(x)| < 1.0e-08: false
* Exceeded Maximum Number of Iterations: false
* Objective Function Calls: 154
* Gradient Call: 154
Very quick take-aways: at the point where the line search fails, stepping by 0.1 times the gradient increases the function value very rapidly (see the rosenbrock(x - 0.1 * g) computation above), while a step of 0.01 times the gradient barely decreases it.

This week is going to be really bad for me, so I doubt I'll manage to solve this. But it's fun to think we could make meaningful progress on improving Optim.
One possibility is to reset the search direction at this line: https://github.com/JuliaOpt/Optim.jl/blob/8ad3d9b830a3a3246ea056a26366e471d89c51d1/src/l_bfgs.jl#L164
This line in the BFGS code should provide a good example of how to do this: https://github.com/JuliaOpt/Optim.jl/blob/8ad3d9b830a3a3246ea056a26366e471d89c51d1/src/bfgs.jl#L112
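For illustration, here is a minimal sketch of the kind of guard that reset could take. This is not Optim's actual code; s and gr are hypothetical names for the proposed search direction and the current gradient:

if dot(s, gr) >= 0.0
    # A non-negative inner product means `s` points uphill (or is
    # orthogonal to the gradient), so it cannot be a descent direction.
    copy!(s, -gr)  # fall back to steepest descent
    # One might also clear the stored curvature history here, analogous to
    # how the BFGS code linked above resets its Hessian approximation.
end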
Here's a plan for moving forward:
One exciting possibility is that resetting the search direction in these cases will speed up convergence in problems where we already do converge.
I implemented a demonstration of how one might fix this bug. The approach I've taken is a rushed fix: we can't merge it until something more principled is put in place, but it should be sufficient for checking that this approach works in many cases.
I guess this is a reminder to go over the tests once more. The Rosenbrock test problem is already in unconstrained.jl (and I am pretty sure the derivatives are correct, because I went through all of them at some point), so the L-BFGS tests should really pass it as well, with the standard choices of tolerances and(!) line search. Thanks for finding this, @oxinabox.
I'm pretty sure the Rosenbrock tests have always passed. The problem with this example is path-dependent and requires a specific initial point because you need to start in a location such that the accumulated gradient history gives you an inferior search direction.
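For background on how an accumulated history can go bad: the L-BFGS direction comes from the standard two-loop recursion over the stored (step, gradient-change) pairs. The generic sketch below is the textbook algorithm (Nocedal & Wright, Alg. 7.4), not Optim's code; if the stored pairs fail the curvature condition dot(s, y) > 0, the implicit Hessian approximation can lose positive definiteness and the returned direction can fail to be a descent direction.

# Textbook L-BFGS two-loop recursion. S[i], Y[i] are the stored steps and
# gradient differences, with S[end], Y[end] the most recent pair; assumes
# at least one stored pair.
function lbfgs_direction(g, S, Y)
    m = length(S)
    rho = [1.0 / dot(Y[i], S[i]) for i in 1:m]
    alpha = zeros(m)
    q = copy(g)
    for i in m:-1:1                 # newest pair first
        alpha[i] = rho[i] * dot(S[i], q)
        q -= alpha[i] * Y[i]
    end
    gamma = dot(S[m], Y[m]) / dot(Y[m], Y[m])  # initial scaling H0 = gamma * I
    r = gamma * q
    for i in 1:m                    # oldest pair first
        beta = rho[i] * dot(Y[i], r)
        r += (alpha[i] - beta) * S[i]
    end
    return -r  # a descent direction only while the implicit Hessian
               # approximation stays positive definite
end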
Okay, I get it. Eventually, the Hessian approximation attains an unfortunate value in this example. That is hard to test for, then, but it's still good that it was found.
(It does seem as if L-BFGS is tested on a Rosenbrock function, though with a different parametrization than the one in unconstrained.jl.)
For reference, the way I found this was by looping over 1_000_000 random possible values for t, running optimize, and waiting until an exception was thrown. It happened almost immediately (<< 1 second), so it probably did not get through very many of those 1_000_000 trials.

Would it be worth adding some of these "fuzzing"-style tests to the test suite?
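The search loop was presumably something like the following sketch (my reconstruction, reusing rosenbrock from the MWE above; the exact script wasn't posted):

using Optim

for trial in 1:1_000_000
    t = rand(2)  # random starting point in [0, 1)^2
    try
        optimize(rosenbrock, t, method = :l_bfgs)
    catch err
        println("Failed from starting point $t: $err")
        break  # stop at the first failing starting point
    end
end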
A subset of fuzz tests could be useful. At some point we need to think of a better strategy for doing randomized testing. What makes it tricky is that you need to know what the invariants are before you can error out when the code doesn't do something you expect on some inputs.
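As one concrete example of such an invariant, a sketch of a check one could run (again reusing rosenbrock; the f_minimum field name is my assumption about this era's results type): the value reported at the optimum should never exceed the value at the starting point, whether or not the run converged.

for trial in 1:1_000
    t = rand(2)
    res = optimize(rosenbrock, t, method = :l_bfgs)
    # Whatever else happens, the optimizer should never report a point
    # whose objective value is worse than where it started.
    @assert res.f_minimum <= rosenbrock(t)
end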
> You probably deserve some sort of reward for this since it gives us a realistic chance to improve the hz_linesearch! code.
Agreed with the good detective work. However, as @johnmyleswhite noted in #144, the fix should be made in l_bfgs; if you specify method = :cg, you don't get an error. So it doesn't appear to be a problem with hz_linesearch! per se.
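For anyone reproducing that check, it is just the MWE call with the method switched (reusing fr and t from above):

optimize(fr, t, method = :cg)  # runs without the descent-direction error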
Closed by #144