Rethinking x/optym's interface

brandondube commented 8 months ago

The "core" of x/optym is the convention def thing(fg: callable), where fg returns (cost, grad) based on the parameter vector x.

This is restrictive in a way, since gradient-free optimizers will just do f, _ = fg(x), and the computation of g will have been wasted. There are also circumstances where a line search or similar may want only the gradient; in those cases the computation of f will have been wasted. Of course, when using backprop, f is free along the way to computing g, but sometimes the gradient is known or computable without f (for example, the Rosenbrock function).
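To make the waste concrete, here is a minimal sketch of the current convention. The names rosenbrock_fg and random_search are hypothetical and are not part of x/optym; the point is only that a gradient-free routine pays for g on every call and then throws it away.

```python
import numpy as np

def rosenbrock_fg(x):
    """Return (cost, grad) for the Rosenbrock function, in the fg convention."""
    x = np.asarray(x, dtype=float)
    f = np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)
    g = np.zeros_like(x)
    # contribution of each term to d/dx_i
    g[:-1] += -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2.0 * (1.0 - x[:-1])
    # contribution of each term to d/dx_{i+1}
    g[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
    return f, g

def random_search(fg, x0, iters=200, step=0.1, seed=0):
    """Hypothetical gradient-free optimizer that accepts any improving step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    f, _unused = fg(x)  # gradient computed, then discarded
    for _ in range(iters):
        trial = x + step * rng.standard_normal(x.shape)
        f_trial, _unused = fg(trial)  # discarded again on every iteration
        if f_trial < f:
            x, f = trial, f_trial
    return x, f
```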

It is a greater burden on the user, but it may be better to replace fg with something like Optimizable, along the lines of

type Optimizable interface {
    f(vector) float
    g(vector) vector
    h(vector) array

    // optional
    fg(vector) (float, vector)
    fgh(vector) (float, vector, array)
}
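In Python, one way this could be written is with typing.Protocol. This is only a sketch under assumptions: the names HasCost, HasGradient, evaluate_fg, Quadratic, and CostOnly are made up for illustration, and the "optional" fused method is handled with a plain hasattr check.

```python
from typing import Protocol, runtime_checkable
import numpy as np

@runtime_checkable
class HasCost(Protocol):
    def f(self, x: np.ndarray) -> float: ...

@runtime_checkable
class HasGradient(Protocol):
    def g(self, x: np.ndarray) -> np.ndarray: ...

def evaluate_fg(o, x):
    """Use the fused fg if the object offers one, else call f and g separately."""
    if hasattr(o, 'fg'):
        return o.fg(x)
    return o.f(x), o.g(x)

class Quadratic:
    """Toy problem f(x) = 0.5 * x.x with gradient g(x) = x."""
    def f(self, x):
        return 0.5 * float(np.dot(x, x))
    def g(self, x):
        return np.asarray(x, dtype=float)

class CostOnly:
    """Toy problem that defines a cost but no gradient."""
    def f(self, x):
        return float(np.sum(np.abs(x)))
```

With runtime_checkable, isinstance(CostOnly(), HasGradient) is False, so an optimizer could use that check in place of hasattr if that reads better.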

Then each optimizer can just check if not hasattr(o, 'g'): raise ValueError('<myoptimizer> requires the gradient'). In principle we could fall back to finite differences, but I think that just leads to unhappy or confused users who use finite differences on problems with about a dozen dimensions, then conclude that something like a million dimensions is impossible when it would have been perfectly doable with backprop. Forcing the user to opt in through a pair of functions, forward_differences(f, x0, eps=1e-9) and central_differences(f, x0, eps=1e-9), could help avoid this.
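A minimal sketch of what those two opt-in helpers might look like follows. The signatures are the ones written above; the bodies are an assumption, not an existing x/optym implementation, and they assume x0 is a 1-D vector.

```python
import numpy as np

def forward_differences(f, x0, eps=1e-9):
    """Approximate the gradient of f at x0 with one-sided differences."""
    x0 = np.asarray(x0, dtype=float)
    f0 = f(x0)
    g = np.empty_like(x0)
    for i in range(x0.size):
        xp = x0.copy()
        xp[i] += eps
        g[i] = (f(xp) - f0) / eps
    return g

def central_differences(f, x0, eps=1e-9):
    """Approximate the gradient of f at x0 with two-sided differences."""
    x0 = np.asarray(x0, dtype=float)
    g = np.empty_like(x0)
    for i in range(x0.size):
        xp = x0.copy()
        xm = x0.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2.0 * eps)
    return g
```

Written this way, forward differences cost n + 1 evaluations of f per gradient and central differences cost 2n, which is the scaling that makes them impractical at a million dimensions. The default eps=1e-9 is taken from the signatures above; whether it is a good default for a given problem is not something this sketch settles.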

i.e., one might do

from scipy.optimize import rosen, rosen_der

class Rosenbrock:
    def f(self, x):
        return rosen(x)
    def g(self, x):
        # the analytic gradient would be: return rosen_der(x)
        # explicit opt-in to finite differences via the proposed helper
        return forward_differences(self.f, x)

I think this would be preferable, since it enables something like Nelder-Mead for functions that, for example, do not strictly have a gradient. In principle we could also look for h_j_prod(vector) vector, but I sincerely hope I never implement optimizers that want the Hessian-Jacobian product.

Thoughts @Jashcraf ?

Jashcraf commented 8 months ago

Personally I don't think it's a big deal to throw a ValueError for an optimizer that requires a gradient.

Something I don't really understand - why would you want the gradient for an optimizer that doesn't require one (e.g. Nelder-Mead)?

brandondube commented 8 months ago

> Something I don't really understand - why would you want the gradient for an optimizer that doesn't require one (e.g. Nelder-Mead)?

The intent is actually to modify the interface so that the gradient is optional in the most general sense, but a gradient-based optimizer would error if it's not available.
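As a rough sketch of that behavior, assuming hypothetical optimizers named gradient_descent and compass_search (neither is an x/optym function): the gradient-free routine only ever touches o.f, while the gradient-based one raises as soon as o.g is missing.

```python
import numpy as np

def gradient_descent(o, x0, lr=0.1, iters=100):
    """Hypothetical gradient-based optimizer; errors if o has no g."""
    if not hasattr(o, 'g'):
        raise ValueError('gradient_descent requires the gradient')
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - lr * o.g(x)
    return x

def compass_search(o, x0, step=1.0, iters=100):
    """Hypothetical gradient-free optimizer; only ever calls o.f."""
    x = np.asarray(x0, dtype=float)
    fx = o.f(x)
    for _ in range(iters):
        improved = False
        for i in range(x.size):
            for s in (step, -step):
                trial = x.copy()
                trial[i] += s
                ft = o.f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5  # shrink the pattern when no axis move helps
    return x

class AbsSum:
    """Cost that is not differentiable at the origin; defines f only."""
    def f(self, x):
        return float(np.sum(np.abs(x)))
```

Here compass_search(AbsSum(), [3.0, -2.0]) runs without a gradient, while gradient_descent(AbsSum(), [3.0, -2.0]) raises the ValueError.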

brandondube / prysm

Rethinking x/optym's interface #109