I think this looks promising! With the caveat that implementation often reveals things that are hard to predict in the abstract, I'd say this looks like a very nice solution to our API problem.
I'll start to demo out a revision of the package based on this approach. I did a bit this morning and found that this proposal is kind of overkill: it's probably not worth the effort to (a) add the `call*` methods or (b) allow the uncoupled types, because we can always synthesize low-performance coupled functions from any reasonable inputs. I think I'm just going to work with `CoupledDifferentiableFunction`, since that provides the most flexibility for performance-conscious programmers to optimize the naive implementation we'll synthesize based on the inputs.
Following up on the theme of simplification, my current thinking is that we'll use the following two types as the core inputs to every optimization function:
```julia
immutable DifferentiableFunction
    f::Function   # objective: f(x) -> Real
    g!::Function  # gradient: g!(x, storage) mutates storage
    fg!::Function # fused value and gradient: fg!(x, storage) -> Real
end

immutable TwiceDifferentiableFunction
    f::Function
    g!::Function
    fg!::Function
    h!::Function  # Hessian: h!(x, storage) mutates a Matrix
end
```
The latter may need to be supplemented with higher-order coupled functions, but I'm assuming that we can avoid that for the time being.
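To make the coupling concrete, here's a sketch of a fully hand-written `DifferentiableFunction` (the Rosenbrock example is mine, not part of the proposal); the payoff of `fg!` is that the value and gradient share subexpressions, so computing them together is cheaper than two separate calls:

```julia
# Hypothetical example: the 2-D Rosenbrock function with an analytic
# gradient and a fused fg! that reuses shared subexpressions.
function rosenbrock(x::Vector)
    return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end

function rosenbrock_g!(x::Vector, storage::Vector)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
    return
end

function rosenbrock_fg!(x::Vector, storage::Vector)
    d1 = 1.0 - x[1]      # shared by value and gradient
    d2 = x[2] - x[1]^2   # shared by value and gradient
    storage[1] = -2.0 * d1 - 400.0 * d2 * x[1]
    storage[2] = 200.0 * d2
    return d1^2 + 100.0 * d2^2
end

d = DifferentiableFunction(rosenbrock, rosenbrock_g!, rosenbrock_fg!)
```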
We'll convert existing functions into these types using something like the following:
```julia
using Calculus

# Construct from f alone: synthesize g! by central finite differencing.
function DifferentiableFunction(f::Function)
    function g!(x::Vector, storage::Vector)
        Calculus.finite_difference!(f, x, storage, :central)
        return
    end
    function fg!(x::Vector, storage::Vector)
        g!(x, storage)
        return f(x)
    end
    return DifferentiableFunction(f, g!, fg!)
end

# Construct from f and g!: synthesize a naive coupled fg!.
function DifferentiableFunction(f::Function, g!::Function)
    function fg!(x::Vector, storage::Vector)
        g!(x, storage)
        return f(x)
    end
    return DifferentiableFunction(f, g!, fg!)
end

# Construct from f alone: synthesize g!, fg! and a finite-differencing h!.
function TwiceDifferentiableFunction(f::Function)
    function g!(x::Vector, storage::Vector)
        Calculus.finite_difference!(f, x, storage, :central)
        return
    end
    function fg!(x::Vector, storage::Vector)
        g!(x, storage)
        return f(x)
    end
    function h!(x::Vector, storage::Matrix)
        Calculus.finite_difference_hessian!(f, x, storage)
        return
    end
    return TwiceDifferentiableFunction(f, g!, fg!, h!)
end

# Construct from f, g! and h!: only fg! needs to be synthesized.
function TwiceDifferentiableFunction(f::Function, g!::Function, h!::Function)
    function fg!(x::Vector, storage::Vector)
        g!(x, storage)
        return f(x)
    end
    return TwiceDifferentiableFunction(f, g!, fg!, h!)
end
```
This means that naive users can continue using these functions without much work, but advanced users can write high-performance code that gets passed in directly as `DifferentiableFunction` or `TwiceDifferentiableFunction` values.
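Concretely (reusing the hypothetical Rosenbrock functions sketched above), the two styles of construction would look like this:

```julia
# Naive user: supply only f; the constructors above synthesize g! and
# fg! via finite differencing.
d_naive = DifferentiableFunction(rosenbrock)

# Advanced user: supply hand-tuned g! and fg! directly, bypassing the
# synthesized wrappers entirely.
d_fast = DifferentiableFunction(rosenbrock, rosenbrock_g!, rosenbrock_fg!)
```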
Closing this in favor of the pull request, which I will merge very soon.
We currently have a mixed API for optimization: the functions I wrote work very differently from the functions that Tim wrote. To unify things, I'd like to propose a new API that I hope we can all standardize on. The proposal is quite long, but I think it touches on all of the issues we need to confront. I'm opening it as an issue because I expect we'll want to debate the design for a while before implementing anything.
To simplify the discussion, let's introduce some notation.
My API worked exclusively with pure functions, which I'll refer to as:

- `f` denotes a function from R^n to R. The result is returned as a `Real` of some sort.
- `g` denotes the gradient of `f`, which makes `g` a function from R^n to R^n. The result is returned as a `Vector{T}` for some `Real` type `T`.
- `h` denotes the Hessian of `f`, which makes `h` a function from R^n to R^(n×n). The result is returned as a `Matrix{T}` for some `Real` type `T`.

Tim's API employed mutating functions, which I'll refer to as:
- `f` is the same as above.
- `g!` denotes the gradient of `f`, but one which mutates an input argument so that it is called as `g!(storage, x)`. I'd like to transition this over to `g!(x, storage)`. As will be seen, I'd also like to remove the `nothing` arguments being used in the current implementation. Because the function is impure, nothing is returned.
- `h!` denotes the Hessian of `f`, but one which mutates an input argument so that it is called as `h!(storage, x)`. Again, I'd like to transition this over to `h!(x, storage)`. Because the function is impure, nothing is returned.
- `fg!` denotes a coupled pair of function and gradient that get evaluated simultaneously for efficiency. This coupled pair is called as `fg!(x, storage)` and returns the value of `f` evaluated at `x` after mutating `storage`.

One could also consider functions like `gh!` and `fgh!`, but I'm not currently aware of a proposed use for those things.
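To illustrate the difference between the two styles (the quadratic example is mine, not from the discussion above):

```julia
# Pure style: for f(x) = sum(x.^2), the gradient is 2x, and each call
# allocates a fresh Vector.
g(x::Vector) = 2.0 * x

# Mutating style: write into preallocated storage and return nothing.
function g!(x::Vector, storage::Vector)
    for i in 1:length(x)
        storage[i] = 2.0 * x[i]
    end
    return
end
```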
Using this notation, my proposed new API is the following:

- Remove the pure functions `g` and `h` and enforce the use of mutating functions `g!` and `h!`. This may confuse some users, but I think the gains are worth the pain.
- Synthesize missing `g!` and `h!` functions using finite differencing. I've already added the ability to do finite-differencing by mutating an array to the Calculus package in preparation for this.
- Introduce types that couple `f`, `g!` and `fg!` into a single unit that permits multiple dispatch. These differentiable functions will become the core backend construct of the Optim package. End-users will not need to provide them, because we will generate these values for users automatically. But users who want to exploit forms like `fg!` can use these types, which will prevent the automatic creation of wrappers.

Specifically, I propose creating the following types and methods:
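A plausible sketch of those types, inferred from the names in the method list below and the field layout shown in the follow-up comment above (a reconstruction, not the original definitions):

```julia
# Reconstruction: an uncoupled once-differentiable pair, a coupled
# variant carrying a fused fg!, and a twice-differentiable type that
# adds a mutating Hessian.
immutable OnceDifferentiableFunction
    f::Function
    g!::Function
end

immutable CoupledOnceDifferentiableFunction
    f::Function
    g!::Function
    fg!::Function
end

immutable TwiceDifferentiableFunction
    f::Function
    g!::Function
    fg!::Function
    h!::Function
end
```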
Using these functions, we could create methods like the following:
For pure function calls:

```julia
callf(Function, x)
callf(OnceDifferentiableFunction, x)
callf(CoupledOnceDifferentiableFunction, x)
callf(TwiceDifferentiableFunction, x)
```

For mutating gradient function calls:

```julia
callg!(OnceDifferentiableFunction, x, storage)
callg!(CoupledOnceDifferentiableFunction, x, storage)
callg!(TwiceDifferentiableFunction, x, storage)
```

For mutating Hessian function calls:

```julia
callh!(TwiceDifferentiableFunction, x, storage)
```

For simultaneous function and mutating gradient function calls:

```julia
callfg!(OnceDifferentiableFunction, x, storage)
callfg!(CoupledOnceDifferentiableFunction, x, storage)
callfg!(TwiceDifferentiableFunction, x, storage)
```
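A sketch of how a few of these methods might be implemented via multiple dispatch (my code, under the assumption that uncoupled types fall back to separate calls):

```julia
callf(f::Function, x::Vector) = f(x)
callf(d::CoupledOnceDifferentiableFunction, x::Vector) = d.f(x)

callg!(d::CoupledOnceDifferentiableFunction, x::Vector, storage::Vector) =
    d.g!(x, storage)

# Coupled type: one fused evaluation.
callfg!(d::CoupledOnceDifferentiableFunction, x::Vector, storage::Vector) =
    d.fg!(x, storage)

# Uncoupled type: synthesize the pair from separate calls.
function callfg!(d::OnceDifferentiableFunction, x::Vector, storage::Vector)
    d.g!(x, storage)
    return d.f(x)
end
```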
Using these types and functions, we should be able to express all of the computations we're doing now while allocating much less memory. Also, the use of multiple dispatch should make it easier to do redirection by automatically creating gradients when needed.