I think this looks promising! With the caveat that implementation often reveals things that are hard to predict in the abstract, I'd say this looks like a very nice solution to our API problem.
I'll start to demo out a revision of the package based on this approach. I did a bit this morning and found that this proposal is kind of overkill: it's probably not worth the effort to (a) add the `call*` methods or (b) allow the uncoupled types, because we can always synthesize low-performance coupled functions from any reasonable inputs. I think I'm just going to work with `CoupledDifferentiableFunction`, since that provides the most flexibility for performance-conscious programmers to optimize the naive implementation we'll synthesize based on the inputs.
Following up on the theme of simplification, my current thinking is that we'll use the following two types as the core inputs to every optimization function:
```julia
immutable DifferentiableFunction
    f::Function   # objective: f(x) -> Real
    g!::Function  # gradient: g!(x, storage) mutates storage
    fg!::Function # fused value and gradient: fg!(x, storage) -> Real
end

immutable TwiceDifferentiableFunction
    f::Function
    g!::Function
    fg!::Function
    h!::Function  # Hessian: h!(x, storage) mutates a Matrix
end
```
The latter may need to be supplemented with higher-order coupled functions, but I'm assuming that we can avoid that for the time being.
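To make the coupling concrete, here's a sketch of a fully hand-written `DifferentiableFunction` (the Rosenbrock example is mine, not part of the proposal); the payoff of `fg!` is that the value and gradient share subexpressions, so computing them together is cheaper than two separate calls:

```julia
# Hypothetical example: the 2-D Rosenbrock function with an analytic
# gradient and a fused fg! that reuses shared subexpressions.
function rosenbrock(x::Vector)
    return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end

function rosenbrock_g!(x::Vector, storage::Vector)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
    return
end

function rosenbrock_fg!(x::Vector, storage::Vector)
    d1 = 1.0 - x[1]      # shared by value and gradient
    d2 = x[2] - x[1]^2   # shared by value and gradient
    storage[1] = -2.0 * d1 - 400.0 * d2 * x[1]
    storage[2] = 200.0 * d2
    return d1^2 + 100.0 * d2^2
end

d = DifferentiableFunction(rosenbrock, rosenbrock_g!, rosenbrock_fg!)
```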
We'll convert existing functions into these types using something like the following:
```julia
using Calculus

# Construct from f alone: synthesize g! by central finite differencing.
function DifferentiableFunction(f::Function)
    function g!(x::Vector, storage::Vector)
        Calculus.finite_difference!(f, x, storage, :central)
        return
    end
    function fg!(x::Vector, storage::Vector)
        g!(x, storage)
        return f(x)
    end
    return DifferentiableFunction(f, g!, fg!)
end

# Construct from f and g!: synthesize a naive coupled fg!.
function DifferentiableFunction(f::Function, g!::Function)
    function fg!(x::Vector, storage::Vector)
        g!(x, storage)
        return f(x)
    end
    return DifferentiableFunction(f, g!, fg!)
end

# Construct from f alone: synthesize g!, fg! and a finite-differencing h!.
function TwiceDifferentiableFunction(f::Function)
    function g!(x::Vector, storage::Vector)
        Calculus.finite_difference!(f, x, storage, :central)
        return
    end
    function fg!(x::Vector, storage::Vector)
        g!(x, storage)
        return f(x)
    end
    function h!(x::Vector, storage::Matrix)
        Calculus.finite_difference_hessian!(f, x, storage)
        return
    end
    return TwiceDifferentiableFunction(f, g!, fg!, h!)
end

# Construct from f, g! and h!: only fg! needs to be synthesized.
function TwiceDifferentiableFunction(f::Function, g!::Function, h!::Function)
    function fg!(x::Vector, storage::Vector)
        g!(x, storage)
        return f(x)
    end
    return TwiceDifferentiableFunction(f, g!, fg!, h!)
end
```
This means that naive users can continue using these functions without much work, but advanced users can write high-performance code that gets passed in directly as `DifferentiableFunction` or `TwiceDifferentiableFunction` values.
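Concretely (reusing the hypothetical Rosenbrock functions sketched above), the two styles of construction would look like this:

```julia
# Naive user: supply only f; the constructors above synthesize g! and
# fg! via finite differencing.
d_naive = DifferentiableFunction(rosenbrock)

# Advanced user: supply hand-tuned g! and fg! directly, bypassing the
# synthesized wrappers entirely.
d_fast = DifferentiableFunction(rosenbrock, rosenbrock_g!, rosenbrock_fg!)
```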
Closing this in favor of the pull request, which I will merge very soon.
We currently have a mixed API for optimization: the functions I wrote work very differently from the functions that Tim wrote. To unify things, I'd like to propose a new API that I hope we can all standardize on. The proposal is quite long, but I think it touches on all of the issues we need to confront. I'm opening it as an issue because I expect we'll want to debate the design for a while before implementing anything.
To simplify the discussion, let's introduce some notation.
My API worked exclusively with pure functions, which I'll refer to as:

- `f` denotes a function from R^n to R. The result is returned as a `Real` of some sort.
- `g` denotes the gradient of `f`, which makes `g` a function from R^n to R^n. The result is returned as a `Vector{T}` for some `Real` type `T`.
- `h` denotes the Hessian of `f`, which makes `h` a function from R^n to R^(n×n). The result is returned as a `Matrix{T}` for some `Real` type `T`.

Tim's API employed mutating functions, which I'll refer to as:
- `f` is the same as above.
- `g!` denotes the gradient of `f`, but one which mutates an input argument so that it is called as `g!(storage, x)`. I'd like to transition this over to `g!(x, storage)`. As will be seen, I'd also like to remove the `nothing` arguments being used in the current implementation. Because the function is impure, nothing is returned.
- `h!` denotes the Hessian of `f`, but one which mutates an input argument so that it is called as `h!(storage, x)`. Again, I'd like to transition this over to `h!(x, storage)`. Because the function is impure, nothing is returned.
- `fg!` denotes a coupled pair of function and gradient that get evaluated simultaneously for efficiency. This coupled pair is called as `fg!(x, storage)` and returns the value of `f` evaluated at `x` after mutating `storage`.

One could also consider functions like `gh!` and `fgh!`, but I'm not currently aware of a proposed use for those things.
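To illustrate the difference between the two styles (the quadratic example is mine, not from the discussion above):

```julia
# Pure style: for f(x) = sum(x.^2), the gradient is 2x, and each call
# allocates a fresh Vector.
g(x::Vector) = 2.0 * x

# Mutating style: write into preallocated storage and return nothing.
function g!(x::Vector, storage::Vector)
    for i in 1:length(x)
        storage[i] = 2.0 * x[i]
    end
    return
end
```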
Using this notation, my proposed new API is the following:

- Remove the pure functions `g` and `h` and enforce the use of mutating functions `g!` and `h!`. This may confuse some users, but I think the gains are worth the pain.
- Synthesize missing `g!` and `h!` functions using finite differencing. I've already added the ability to do finite-differencing by mutating an array to the Calculus package in preparation for this.
- Introduce types that couple `f`, `g!` and `fg!` into a single unit that permits multiple dispatch. These differentiable functions will become the core backend construct of the Optim package. End-users will not need to provide them, because we will generate these values for users automatically. But users who want to exploit forms like `fg!` can use these types, which will prevent the automatic creation of wrappers.

Specifically, I propose creating the following types and methods:
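A plausible sketch of those types, inferred from the names in the method list below and the field layout shown in the follow-up comment above (a reconstruction, not the original definitions):

```julia
# Reconstruction: an uncoupled once-differentiable pair, a coupled
# variant carrying a fused fg!, and a twice-differentiable type that
# adds a mutating Hessian.
immutable OnceDifferentiableFunction
    f::Function
    g!::Function
end

immutable CoupledOnceDifferentiableFunction
    f::Function
    g!::Function
    fg!::Function
end

immutable TwiceDifferentiableFunction
    f::Function
    g!::Function
    fg!::Function
    h!::Function
end
```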
Using these functions, we could create methods like the following:
For pure function calls:

```julia
callf(Function, x)
callf(OnceDifferentiableFunction, x)
callf(CoupledOnceDifferentiableFunction, x)
callf(TwiceDifferentiableFunction, x)
```

For mutating gradient function calls:

```julia
callg!(OnceDifferentiableFunction, x, storage)
callg!(CoupledOnceDifferentiableFunction, x, storage)
callg!(TwiceDifferentiableFunction, x, storage)
```

For mutating Hessian function calls:

```julia
callh!(TwiceDifferentiableFunction, x, storage)
```

For simultaneous function and mutating gradient function calls:

```julia
callfg!(OnceDifferentiableFunction, x, storage)
callfg!(CoupledOnceDifferentiableFunction, x, storage)
callfg!(TwiceDifferentiableFunction, x, storage)
```
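A sketch of how a few of these methods might be implemented via multiple dispatch (my code, under the assumption that uncoupled types fall back to separate calls):

```julia
callf(f::Function, x::Vector) = f(x)
callf(d::CoupledOnceDifferentiableFunction, x::Vector) = d.f(x)

callg!(d::CoupledOnceDifferentiableFunction, x::Vector, storage::Vector) =
    d.g!(x, storage)

# Coupled type: one fused evaluation.
callfg!(d::CoupledOnceDifferentiableFunction, x::Vector, storage::Vector) =
    d.fg!(x, storage)

# Uncoupled type: synthesize the pair from separate calls.
function callfg!(d::OnceDifferentiableFunction, x::Vector, storage::Vector)
    d.g!(x, storage)
    return d.f(x)
end
```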
Using these types and functions, we should be able to express all of the computations we're doing now while allocating much less memory. Also, the use of multiple dispatch should make it easier to do redirection by automatically creating gradients when needed.