Open · brandondube opened 8 months ago
Personally I don't think it's a big deal to throw a `ValueError` for an optimizer that requires a gradient.
Something I don't really understand - why would you want the gradient for an optimizer that doesn't require one (e.g. Nelder-Mead)?
The intent is actually to modify the interface so that the gradient is optional in the most general sense, but a gradient-based optimizer would error if it's not available.
The "core" of
x/optym
is the conventiondef thing(fg: callable)
, wherefg
returns(cost, grad)
based on the parameter vectorx
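For concreteness, the current convention is roughly the following (a sketch only, not actual x/optym code; the toy quadratic cost and the gradient-descent step are made up for illustration):

```python
import numpy as np

def fg(x):
    """The x/optym convention: a single callable returning (cost, grad)."""
    cost = 0.5 * np.dot(x, x)  # toy quadratic cost, f(x) = ||x||^2 / 2
    grad = x                   # its gradient
    return cost, grad

def thing(fg: callable):
    """An optimizer (or one iteration of one) consumes only fg."""
    x = np.ones(3)
    cost, grad = fg(x)
    return x - 0.1 * grad      # e.g. one gradient-descent step
```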
This is in a way restrictive, since gradient-less optimizers will just do `f, _ = fg(x)`, and the computation of `g` will have been wasteful. There are also some circumstances where a linesearcher or similar may want only the gradient; in these scenarios the computation of `f` will have been wasteful. Of course, when using backprop, `f` is free along the way to computing `g`, but sometimes the gradient is known or computable without `f` (for example the rosenbrock function).

It is a greater burden on the user, but it may be superior to change `fg` to something like `optimizeable`, which is of the sense sketched below.
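Roughly this shape (a sketch of the protocol, not a settled API; the class name is a placeholder, and the Rosenbrock problem is just an example where the gradient is available independently of the cost):

```python
import numpy as np

class RosenbrockOptimizeable:
    """An 'optimizeable': cost via .f(x), gradient via .g(x).
    A gradient-free problem would simply not define .g."""

    def f(self, x):
        # Rosenbrock cost
        return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

    def g(self, x):
        # analytic gradient, computable without evaluating f
        return np.array([
            -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
            200 * (x[1] - x[0]**2),
        ])
```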
Then each optimizer can just check `if not hasattr(o, 'g'): raise ValueError('<myoptimizer> requires the gradient')`. In principle we could fall back to finite differences, but I think that just leads to unhappy or misunderstanding users who do finite differences for problems with ~a dozen dimensions, then view it as impossible for something like a million dimensions when it would have been perfectly doable with backprop. Forcing the user to opt in with a `forward_differences(f, x0, eps=1e-9)` and `central_differences(f, x0, eps=1e-9)` set of functions could help abate this, i.e., one might do something like the snippet below.
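Roughly (a sketch; `forward_differences` is written out here only so the snippet is self-contained, and `MyProblem` is a made-up example):

```python
import numpy as np

def forward_differences(f, x0, eps=1e-9):
    """Forward-difference estimate of the gradient of f at x0."""
    f0 = f(x0)
    grad = np.empty_like(x0, dtype=float)
    for j in range(x0.size):
        xp = x0.copy()
        xp[j] += eps
        grad[j] = (f(xp) - f0) / eps
    return grad

class MyProblem:
    """A problem with no hand-written gradient."""

    def f(self, x):
        return np.sum((x - 3)**2)

    def g(self, x):
        # the user opts in to finite differences explicitly,
        # rather than the optimizer silently falling back to them
        return forward_differences(self.f, x)
```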
I think this would be preferable to enable something like Nelder-Mead for functions that, for example, do not strictly have a gradient. In principle we could also look for `h_j_prod(vector) -> vector`, but I sincerely hope I never implement optimizers that want the Hessian-Jacobian product.

Thoughts @Jashcraf?