Open echatav opened 9 years ago
Do you have any reason to think that it isn't converging to the minimum? What you show is that in 10000 iterations it hasn't made much progress. But it does seem to be going in the right direction, albeit very slowly!
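For intuition about why a plain fixed-step gradient descent can crawl like this: on a badly scaled quadratic, any step size small enough to be stable along the steep axis makes almost no progress along the shallow one. A minimal sketch in plain Haskell (made-up quadratic and step size, not the library's routine):

```haskell
-- Fixed-step gradient descent on g [x, y] = 100*x^2 + y^2.
-- The gradient is [200*x, 2*y]; a step size eta that is stable in x
-- (eta < 1/100) shrinks y by only a factor of (1 - 2*eta) per iteration.
descend :: Double -> [Double] -> [[Double]]
descend eta = iterate step
  where
    step [x, y] = [x - eta * 200 * x, y - eta * 2 * y]
    step p      = p

main :: IO ()
main = print (descend 0.004 [1, 1] !! 100)
-- x is annihilated almost immediately, but y has only decayed
-- to roughly 0.45 after 100 iterations
```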
Seriously, this `gradientDescent` code was not intended for serious use. It was written as a placeholder and is completely untuned: designed to be robust but potentially very slow. The `conjugateGradientDescent` routine looks to be in better shape (although its line search does 20 Newton steps instead of the more appropriate single Newton step, and there is no code to check that the objective is changing in the right direction, no trust regions, or anything like that) and, barring bugs, should converge in just a handful of line searches for a two-dimensional quadratic objective function like this.
> Seriously, this `gradientDescent` code was not intended for serious use
I didn't realize that :-/ I've been trying to make serious use of it.

@barak: Do you know what the best way to calculate argmins of convex objective functions in Haskell is?
Sorry. I blame ... not sure but I'll think of someone.
As to a better routine, the present conjugate gradient routine looks reasonable for a high-dimensional system, although the line search could use a bit of robustness-enhancing love. And it works well here, moving rapidly to a point close to the optimum:
```haskell
Prelude Numeric.AD> take 4 $ conjugateGradientDescent f [0,0]
[[0.0,0.0],[4.0285774585177165,4.7048094968615557e-4],[2.910697209479748,7196.113511627576],[2.911504913045699,7196.645421944485]]
```
For something low-dimensional like the test problem you give, a simple Newton's method would also make sense; or, for robustness, one with trust regions; or, for something of intermediate dimensionality, BFGS or Levenberg-Marquardt. There is a curated collection of optimization routines in http://lmfit.github.io/lmfit-py/ and I think it would make a lot of sense to translate them from Python to Haskell, using the `ad` library for the necessary gradients, of course.
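For the low-dimensional case, a bare Newton iteration is only a few lines. A sketch in plain Haskell, with the derivatives written out by hand (with the `ad` library they would come from `diff` instead); the objective minimized here is made up for illustration:

```haskell
-- One-dimensional Newton's method for minimization:
--   x_{k+1} = x_k - f'(x_k) / f''(x_k)
newtonMin :: (Double -> Double) -> (Double -> Double) -> Double -> [Double]
newtonMin f' f'' = iterate (\x -> x - f' x / f'' x)

main :: IO ()
main = do
  -- minimize f x = (x - 3)^2 + 1: f' x = 2*(x - 3), f'' _ = 2
  let xs = newtonMin (\x -> 2 * (x - 3)) (const 2) 0
  print (take 3 xs)  -- a quadratic converges in one step: [0.0,3.0,3.0]
```

No step-size control or convergence test, of course; this is just the core update that a trust-region variant would wrap.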
I've found this library, optimization, which seems designed to fit with `ad` and `linear`.
On this topic, what properties hold for conjugate gradient descent? I'm noticing the progression is actually ascending, not descending. This is for a parabolic function with a log barrier, so I'm guessing there is an issue with infinity.
```haskell
:{
let obj2 [x] = x^2 +
      let bar = (-1/1000) * log (- (5 - x))
      in if isNaN bar then 1/0 else bar
-- obj2's minimum is approached asymptotically as x -> 5 from above
-- obj2 = Infinity when x <= 5
:}
```

```haskell
conjugateGradientDescent obj2 [20]
~~> [[20.0],[0.0]]
```
Notice the lack of descent:
```haskell
map obj2 it
[399.9972919497989,Infinity]
```
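The failure mode is visible even without `ad`: a plain scalar transcription of `obj2` (same constants as above) is a flat `Infinity` everywhere at or below the barrier, so once an iterate lands there the objective carries no descent information at all:

```haskell
-- Scalar version of obj2: a parabola plus a log barrier at x = 5.
-- log (-(5 - x)) = log (x - 5) is NaN for x <= 5; that NaN is
-- mapped to Infinity, erasing any slope information.
obj2 :: Double -> Double
obj2 x = x ^ 2 + if isNaN bar then 1 / 0 else bar
  where
    bar = (-1 / 1000) * log (-(5 - x))

main :: IO ()
main = mapM_ (print . obj2) [20, 0]
-- 399.997... (the finite value seen above), then Infinity
```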
As a matter of update / hijacking of this issue: the above objective function, `obj2`, retains no information about the gradient past the asymptote. Changing the numerator of the `1/0` to what is commonly named µ (`(5 - x)/0` in this case) yields a differentiable value (via `Kahn.grad`), but in general the objective function must be twice differentiable for `conjugateGradientDescent`.
Perhaps I should send a pull request indicating this.

Thoughts?
Consider the function,
It's a smooth convex function, and gradient descent should converge to `[m,b] = [2.911525576, 7196.512447]` (see Wolfram Alpha). Instead:
```haskell
[6.503543385615585,1.0522541316691529e-3]
[4.074543700217947,1.0251230667415617e-3]
[4.028579379271611,4.621971888645496e-3]
[4.02857388945333,4.100383031651195e-2]
```