JuliaNLSolvers / Optim.jl

Optimization functions for Julia

[FEATURE REQUEST] LBFGSB via LBFGSB.jl #927

Open rht opened 3 years ago

rht commented 3 years ago

There is already a functioning Julia wrapper for L-BFGS-B: https://github.com/Gnimuc/LBFGSB.jl, which wraps the same lbfgsb.f code that scipy.optimize is built on. Note that scipy.optimize uses a modified version of L-BFGS-B based on this 2011 paper:

c         Jorge Nocedal and Jose Luis Morales, Remark on "Algorithm 778: 
c         L-BFGS-B: Fortran Subroutines for Large-Scale Bound Constrained 
c         Optimization"  (2011). To appear in  ACM Transactions on 
c         Mathematical Software

See #923.

It would be great to have an Optim.jl interface for LBFGSB.jl.
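For reference, SciPy already exposes this same Fortran code through `scipy.optimize.minimize`; a minimal call against the Rosenbrock function (a classic L-BFGS-B test problem) looks like this, and is a useful baseline for what an Optim.jl interface would need to match:

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function: minimum at (1, 1)
def f(x):
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

# method="L-BFGS-B" dispatches to SciPy's (modified) lbfgsb.f
res = minimize(f, x0=np.zeros(2), method="L-BFGS-B",
               bounds=[(-2.0, 2.0), (-2.0, 2.0)])
print(res.x)  # close to [1.0, 1.0]
```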

rht commented 3 years ago

I think I will do this myself since @pkofod is busy.

pkofod commented 3 years ago

> I think I will do this myself since @pkofod is busy.

Are you thinking about a package like discussed in the other issue?

rht commented 3 years ago

Yes, I will do it as OptimLBFGSB.jl.

pkofod commented 3 years ago

I can start a repo here that you can PR against, or we can transfer it after?

rht commented 3 years ago

We can transfer it after.

goerz commented 2 years ago

I'd love to see this for use in GRAPE (our original GRAPE Fortran code is using L-BFGS-B as the backend).

Conceptually, though, it looks like the LBFGS method currently in Optim can do constraints (although I haven't tried out that feature yet). Would LBFGSB be expected to behave very differently when enforcing the bounds?

sethaxen commented 6 months ago

@rht not certain if you made an OptimLBFGSB.jl, but I implemented one here: https://github.com/sethaxen/OptimLBFGSB.jl.

rht commented 6 months ago

I ended up using https://github.com/Gnimuc/LBFGSB.jl/issues/11, because I wanted the result to be identical to SciPy's L-BFGS-B.

pkofod commented 6 months ago

The problem with "wrapping" it in Optim, IMO, is that we can accommodate a similar interface on the calling side of things, but it's harder to wrap the results to be accessed in the same way. But we could try with an extension?

sethaxen commented 6 months ago

> harder to wrap the results to be accessed in the same way.

What do you mean by this?

pkofod commented 6 months ago

> > harder to wrap the results to be accessed in the same way.
>
> What do you mean by this?

The output of Optim has a specific format. I'm not sure that it is possible to construct the complete "results" type from what we get from LBFGSB and the trace printing and storage will also be different. But we could try.

sethaxen commented 6 months ago

> The output of Optim has a specific format. I'm not sure that it is possible to construct the complete "results" type from what we get from LBFGSB and the trace printing and storage will also be different. But we could try.

Ah, I see what you mean. For OptimLBFGSB.jl I didn't wrap LBFGSB.jl. Instead I wrapped L_BFGS_B_jll. L-BFGS-B offers an iterative API that LBFGSB.jl only uses in an internal function, so we can't use it. The iterative API fits Optim.jl well, so long as we use Optim's own convergence checks (L-BFGS-B has different ones). L-BFGS-B is just responsible then for the line search, two-loop recursion, and constraint handling.
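For context, the iterative (reverse-communication) API in lbfgsb.f is driven by repeated calls to the `setulb` routine, which reports what it needs next via a task string. A rough sketch of that driver loop, with `setulb!` standing in for a hypothetical thin wrapper over the L_BFGS_B_jll symbol (the real call passes many more workspace arrays: `wa`, `iwa`, `csave`, `lsave`, `isave`, `dsave`):

```julia
# Sketch only: setulb! is a hypothetical wrapper, not a real exported name.
task = rpad("START", 60) |> collect .|> UInt8
while true
    setulb!(n, m, x, lower, upper, nbd, fx, g, factr, pgtol, task)
    s = String(copy(task))
    if startswith(s, "FG")
        # L-BFGS-B wants the objective and gradient at the current x
        fx = objective(x)
        g .= gradient(x)
    elseif startswith(s, "NEW_X")
        # One outer iteration finished: this is where Optim.jl's own
        # convergence checks can run, breaking the loop when satisfied
    else
        break  # converged by L-BFGS-B's criteria, or an error/warning
    end
end
```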

Here's an example:

julia> using Optim, OptimLBFGSB

julia> f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2;

julia> x0 = [0.0, 0.0];

julia> options = Optim.Options(; store_trace=true, extended_trace=true);

julia> sol = optimize(f, x0, LBFGSB(), options)
 * Status: success

 * Candidate solution
    Final objective value:     4.089857e-17

 * Found with
    Algorithm:     LBFGSB{Nothing, Nothing}

 * Convergence measures
    |x - x'|               = 0.00e+00 ≤ 0.0e+00
    |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
    |g(x)|                 = 2.35e-08 ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    23
    f(x) calls:    32
    ∇f(x) calls:   32

julia> Optim.x_trace(sol)
24-element Vector{Vector{Float64}}:
 [0.0, 0.0]
 [0.17485520507589397, 0.0]
 [0.2611363404175081, 0.04906635389155636]
 [0.3239441412158067, 0.11056436224568814]
 [0.4240470960987517, 0.16523903911939064]
 ⋮
 [0.9999850618932064, 0.9999669398908719]
 [0.9999996981562516, 0.9999993829747845]
 [0.9999999936276822, 0.9999999872013148]
 [0.9999999936276822, 0.9999999872013148]

If one wants to wrap L-BFGS-B using its own convergence checks, creating an Optimization.jl wrapper might make more sense, since it's designed to wrap optimization packages without controlling their behavior.

The main complications I ran into in implementation are:

  1. L-BFGS-B is a constrained optimizer, but Optim types distinguish between AbstractConstrainedOptimizer and FirstOrderOptimizer. I went with FirstOrderOptimizer.

  2. To support Optim's API for specifying bounds, we would want something like

    function Optim.optimize(
      df::NLSolversBase.OnceDifferentiable,
      l::Union{Number,AbstractVector},
      u::Union{Number,AbstractVector},
      initial_x::AbstractVector,
      F::Optim.Fminbox{LBFGSB{Nothing,Nothing}},  # no bounds specified
      options::Optim.Options = Optim.Options()
    )
      optimizer = LBFGSB(; m=F.method.m, lower=l, upper=u)
      return Optim.optimize(df, initial_x, optimizer, options)
    end

    I haven't tested this, though, and am not certain it would catch all cases where a user specifies bounds.

  3. Optim expects its first-order optimizers to have a manifold property and states with x_previous, g_previous, and fx_previous properties, but doesn't document these requirements.

  4. One must implement Optim.trace! for a new solver, but this isn't documented, and the function does not seem to be part of the public API.
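For illustration, the call that the dispatch in point 2 is meant to intercept is just the usual Optim box-constrained call (hypothetical usage, assuming OptimLBFGSB exports LBFGSB and a method like the one above is defined):

```julia
using Optim, OptimLBFGSB

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
lower = [-2.0, -2.0]
upper = [2.0, 2.0]
x0 = [0.0, 0.0]

# Would dispatch to the Fminbox{LBFGSB} method, which rebuilds an LBFGSB
# optimizer carrying the bounds instead of running Fminbox's barrier loop
sol = optimize(f, lower, upper, x0, Fminbox(LBFGSB()))
```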

pkofod commented 6 months ago

Oh, okay. Thank you for all the work you did looking into it. So you are relying directly on L_BFGS_B_jll. Okay, I'm wondering whether it should live in Optim proper or be a package extension. The reason is that some people may expect any Optim optimizer to be Julia to the core (it is in the tagline, after all), but I can certainly see that it may be useful to wrap in the Optim interface if people are already in Optim and want to try LBFGSB. I actually don't think LBFGSB is super hard to code up, but it's just a bandwidth issue to be honest.

sethaxen commented 6 months ago

> Oh, okay. Thank you for all the work you did looking into it.

No problem! Was actually pretty straightforward.

> Okay, I'm wondering whether it should live in Optim proper or be a package extension. The reason is that some people may expect any Optim optimizer to be Julia to the core (it is in the tagline, after all), but I can certainly see that it may be useful to wrap in the Optim interface if people are already in Optim and want to try LBFGSB. I actually don't think LBFGSB is super hard to code up, but it's just a bandwidth issue to be honest.

I agree it's not ideal that it's not pure Julia. So e.g. it won't support complex numbers or matrices like LBFGS does (although I guess this could be done with some work).

I implemented an algorithm from a paper that uses L-BFGS-B without constraints. When I compare results with LBFGS(; linesearch=MoreThuente()), which I expected to be the same as L-BFGS-B, LBFGSB() performs significantly better. I haven't been able to find a combination of parameters for LBFGS that produces the same traces as LBFGSB, so I assume they're doing something meaningfully different. I'd of course prefer a pure Julia implementation, so long as it can be made equivalent, but unfortunately I also don't have the bandwidth to dig into why they're different.
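A minimal way to run such a comparison, assuming OptimLBFGSB is loaded alongside Optim and LineSearches, is to store traces for both solvers and diff them (sketch; the reported divergence is from my experiments above, not something this snippet guarantees):

```julia
using Optim, OptimLBFGSB, LineSearches

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x0 = [0.0, 0.0]
opts = Optim.Options(; store_trace=true, extended_trace=true)

sol_pure = optimize(f, x0, LBFGS(; linesearch=LineSearches.MoreThuente()), opts)
sol_wrap = optimize(f, x0, LBFGSB(), opts)

# Compare the iterate sequences side by side; in my experiments they
# diverge after the first few steps despite the nominally matching
# Moré–Thuente line search
(Optim.x_trace(sol_pure), Optim.x_trace(sol_wrap))
```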

I guess my preference for now is for LBFGSB to live outside of Optim, but maybe be in the same org and linked to in the docs. An extension might be a little awkward, because loading it would require a user manually loading L_BFGS_B_jll. Then when someone is able to add a pure Julia LBFGSB to Optim, they could check against this implementation for consistency.

sethaxen commented 6 months ago

@pkofod Any further thoughts on this? If we keep OptimLBFGSB.jl, then I would finish the tests and register it.

sethaxen commented 6 months ago

I'll go ahead and write the tests and register OptimLBFGSB.jl.