JuliaNLSolvers / Optim.jl

Optimization functions for Julia

Multivariate box minimization without gradient #522

Closed · matbesancon closed this 6 years ago

matbesancon commented 6 years ago

From the docs, the two cases for box minimization are either univariate optimization or multivariate optimization with an explicit gradient.

Maybe I just missed the feature, but it would be awesome to have the possibility of box optimization on multi-dimensional functions without an explicit gradient. Thanks!

pkofod commented 6 years ago

Is it not differentiable or do you just not have the gradient at hand?

Edit: Note that, behind the scenes, more general constrained optimization (not just box constraints) is being worked on at the moment, though we're a bit constrained developer-time-wise.

matbesancon commented 6 years ago

I don't have an explicit gradient, but it is differentiable.

pkofod commented 6 years ago

Alright, then it shouldn't be a problem. Let me find an example.

pkofod commented 6 years ago
julia> using Optim

julia> function exponential(x)
           return exp((2.0 - x[1])^2) + exp((3.0 - x[2])^2)
       end
exponential (generic function with 1 method)

julia> function exponential_gradient!(storage, x)
           storage[1] = -2.0 * (2.0 - x[1]) * exp((2.0 - x[1])^2)
           storage[2] = -2.0 * (3.0 - x[2]) * exp((3.0 - x[2])^2)
       end
exponential_gradient! (generic function with 1 method)

julia> initial_x = [0.0, 0.0]
2-element Array{Float64,1}:
 0.0
 0.0

julia> optimize(exponential, exponential_gradient!, initial_x, BFGS())
Results of Optimization Algorithm
 * Algorithm: BFGS
 * Starting Point: [0.0,0.0]
 * Minimizer: [1.9999999999999587,2.999999999999897]
 * Minimum: 2.000000e+00
 * Iterations: 12
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 1.04e-05 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: false
     |f(x) - f(x')| = 6.19e-11 |f(x)|
   * |g(x)| ≤ 1.0e-08: true 
     |g(x)| = 2.06e-13 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 49
 * Gradient Calls: 49

julia> lb = fill(-0.1, 2)
2-element Array{Float64,1}:
 -0.1
 -0.1

julia> ub = fill(1.1, 2)
2-element Array{Float64,1}:
 1.1
 1.1

julia> od = OnceDifferentiable(exponential, initial_x)
NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1},Val{false}}(exponential, Optim.g!, Optim.fg!, 0.0, [6.93938e-310, 6.9394e-310], [NaN, NaN], [NaN, NaN], [0], [0])

julia> optimize(od, initial_x, lb, ub, Fminbox())
Results of Optimization Algorithm
 * Algorithm: Fminbox with Conjugate Gradient
 * Starting Point: [0.0,0.0]
 * Minimizer: [1.099999999999446,1.099999999999984]
 * Minimum: 3.921396e+01
 * Iterations: 5
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 5.53e-10 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: true
     |f(x) - f(x')| = 1.13e-10 |f(x)|
   * |g(x)| ≤ 1.0e-08: false 
     |g(x)| = 1.40e+02 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 283
 * Gradient Calls: 213

which uses finite differences. If you want to use ForwardDiff instead:

julia> od = OnceDifferentiable(exponential, initial_x; autodiff = :forward)
NLSolversBase.OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1},Val{false}}(exponential, Optim.#3, Optim.#4, 0.0, [8.32155e-317, 0.0], [NaN, NaN], [NaN, NaN], [0], [0])

julia> optimize(od, initial_x, lb, ub, Fminbox())
Results of Optimization Algorithm
 * Algorithm: Fminbox with Conjugate Gradient
 * Starting Point: [0.0,0.0]
 * Minimizer: [1.0999999999994408,1.0999999999999839]
 * Minimum: 3.921396e+01
 * Iterations: 5
 * Convergence: true
   * |x - x'| ≤ 1.0e-32: false 
     |x - x'| = 5.53e-10 
   * |f(x) - f(x')| ≤ 1.0e-32 |f(x)|: true
     |f(x) - f(x')| = 1.13e-10 |f(x)|
   * |g(x)| ≤ 1.0e-08: false 
     |g(x)| = 1.40e+02 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 279
 * Gradient Calls: 208

However, let's turn this into a bug report, as it is NOT intended that you cannot just pass the function directly. The following should have just worked, but there is a construction of the OnceDifferentiable that has apparently not been updated to the new format:

julia> optimize(exponential, initial_x, lb, ub, Fminbox())
ERROR: Optimizing an objective `obj` without providing an initial `x` has been deprecated without backwards compatability. Please explicitly provide an `x`: `optimize(obj, x, l, u, method, options)``
Stacktrace:
 [1] #optimize#110(::Array{Any,1}, ::Function, ::Function, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}, ::Optim.Fminbox{Optim.ConjugateGradient}) at /home/pkm/.julia/v0.6/Optim/src/multivariate/solvers/constrained/fminbox.jl:106
 [2] optimize(::Function, ::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}, ::Optim.Fminbox{Optim.ConjugateGradient}) at /home/pkm/.julia/v0.6/Optim/src/multivariate/solvers/constrained/fminbox.jl:106
matbesancon commented 6 years ago

Thanks a lot! Indeed, I had assumed that constructing an OnceDifferentiable required the gradient.
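For reference, both constructions work. Here is a minimal sketch (not REPL output from the thread above) contrasting the gradient-free and gradient-based constructors, reusing the exponential, exponential_gradient!, initial_x, lb, and ub definitions from earlier:

using Optim

# Gradient-free: the gradient is built internally via finite differences
# (or via ForwardDiff, if constructed with autodiff = :forward)
od_fd = OnceDifferentiable(exponential, initial_x)

# Gradient-based: pass the in-place gradient explicitly
od_g = OnceDifferentiable(exponential, exponential_gradient!, initial_x)

# Either object can be handed to Fminbox with the argument order used above
optimize(od_g, initial_x, lb, ub, Fminbox())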

pkofod commented 6 years ago

Yeah, the docs are in need of being rewritten

timholy commented 6 years ago

Also, for non-differentiable functions, check out the brand-new https://github.com/timholy/QuadDIRECT.jl. It started life as https://github.com/timholy/MultilevelCoordinateSearch.jl, but I found MCS tough to implement, and in any event, as I progressed I began to wonder whether one might be able to do even better. I don't yet know whether QuadDIRECT is better, though; it still needs lots of benchmarks.

ChrisRackauckas commented 6 years ago

Well, non-differentiable global optimizers are exactly what's needed for diffeq parameter estimation. @finmod would you like to give these a try in the Lorenz benchmark?

matbesancon commented 6 years ago

@ChrisRackauckas is non-differentiable the base case you would consider because of the integration of jumps in the diffeq models? In the case of ODEs, isn't a loss function for parameter estimation differentiable?

ChrisRackauckas commented 6 years ago

is non-differentiable the base case you would consider because of the integration of jumps in the diffeq models?

No. For those kinds of models you wouldn't optimize on a single solution, but on a Monte Carlo set of solutions, optimizing things like averages and distributions. For well-behaved jump equations those properties will be well-behaved as well.
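To illustrate that idea, here is a minimal hypothetical sketch (the model, names, and target statistic are made up, and it assumes the EnsembleProblem interface of DifferentialEquations.jl): the objective is built from a statistic of an ensemble of trajectories rather than from any single solution.

using DifferentialEquations, Statistics

# Hypothetical scalar SDE whose drift and noise are scaled by the parameters p
f(u, p, t) = p[1] * u
g(u, p, t) = p[2] * u
prob = SDEProblem(f, g, 0.5, (0.0, 1.0), [1.0, 0.2])

# The loss compares the mean terminal value across trajectories to a target
# statistic, rather than fitting any single stochastic trajectory.
function ensemble_loss(p; target = 1.3, trajectories = 200)
    eprob = EnsembleProblem(remake(prob, p = p))
    sols  = solve(eprob, SRIW1(); trajectories = trajectories)
    abs2(mean(sol[end] for sol in sols) - target)
end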

In the case of ODEs, isn't a loss function for parameter estimation differentiable?

Yes, it's usually differentiable, but the issue is that those derivatives require derivatives of the ODE solution with respect to the parameters, which are expensive. The ways to get them are to solve the ODE N times (for finite differencing or autodiff), or to use adjoint sensitivity analysis:

http://docs.juliadiffeq.org/latest/analysis/sensitivity.html#Adjoint-Sensitivity-Analysis-1

Global optimizers that utilize derivatives don't usually get all that much help from them, since derivatives are very local information; and because the derivatives are expensive here, it can be a better idea to just use a derivative-free method.
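For concreteness, here is a minimal sketch of that last point (the Lotka-Volterra setup and all names are illustrative, assuming DifferentialEquations.jl and Optim.jl): every loss evaluation simply re-solves the ODE, and a derivative-free optimizer never asks for the expensive parameter sensitivities.

using DifferentialEquations, Optim

function lotka_volterra!(du, u, p, t)
    du[1] =  p[1] * u[1] - p[2] * u[1] * u[2]
    du[2] = -p[3] * u[2] + p[4] * u[1] * u[2]
end

u0, tspan, p_true = [1.0, 1.0], (0.0, 10.0), [1.5, 1.0, 3.0, 1.0]
ts = 0.0:0.5:10.0

prob = ODEProblem(lotka_volterra!, u0, tspan, p_true)
data = Array(solve(prob, Tsit5(), saveat = ts))   # synthetic "data" from the true parameters

# Least-squares loss: each evaluation re-solves the ODE; no sensitivities are needed
function loss(p)
    sol = solve(remake(prob, p = p), Tsit5(), saveat = ts)
    sum(abs2, Array(sol) .- data)
end

# Derivative-free local search; a global derivative-free method could be swapped in here
result = optimize(loss, [1.0, 1.0, 1.0, 1.0], NelderMead())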