Linesearch + Merit functions

abelsiqueira commented 3 years ago

Building upon #145, now changing line search to use the new merit model. I'm gonna change trust region now and hopefully this will lead to a complete picture.

abelsiqueira commented 3 years ago

This is at a good point for discussion. I've introduced merit functions and tested new line search and trust region methods using it. Since it's 25 files long, I'm gonna explain the important parts, and after we decide if the big picture is fine, I can break into probably 3 parts.

Merit models definition

Merit models are focused on objective value and directional derivative. In a sense, they replace LineModels. They assume you have a "dual" part and a "primal" part, where the primal part is usually called the penalty and is multiplied by the penalty parameter. For instance,

L1: dual part is f(x) and primal part is |c(x)|_1.
AugLag: dual part is f(x) + y'c(x) and primal part if |c(x)|^2 / 2.

It's important that merit functions simplify the code without adding much overhead. 4 values are stored: fx and cx, that are updated if obj is called with update=true and gx and Ad, that are updated if derivative is called with update=true. Ideally, the user will use references to the vectors to reuse them in the code. This adds some drawbacks in remembering to update things (or not update them), but it's a price to use merit functions.

Both line search and trust region methods should work with merit functions. Since merit functions store useful information for the algorithm, instead of keeping the merit inside these methods, the merit model is just an argument to these methods.

If the problem is unconstrained there is a UncMerit to wrap the problem and pass to the step control algorithms.

Should the merit model be an NLPModel: https://github.com/JuliaSmoothOptimizers/SolverTools.jl/pull/146/files#diff-1a6f6d5cc4d70018c9ed5df46e1d4d70R4

I don't have strong arguments either way.
Advantages:
- some models can be treated as (sub)problems and solved, e.g., AugLagMerit instead of AugLagModel.
Disadvantages:
- adds a little bit of complexity, although not much.

Line search rewrite

Not much was changed on the line search methods. Instead of using LineModels, we use merit models. Also, we unify the input and output (as with JSOSolvers). The existing method armijo_wolfe was barely changed, and another method armijo, which is cheaper, was created.

Here is the entry point: https://github.com/JuliaSmoothOptimizers/SolverTools.jl/pull/146/files#diff-8ebc45751f7e9da3deb4450a5907c89aR12

dummy_linesearch_solver was created using it: https://github.com/JuliaSmoothOptimizers/SolverTools.jl/pull/146/files#diff-2c545ed6ccd10ff35efe810c5d85e4f9R56

Trust region rewrite

The existing trust region strategy defines a struct and several methods. Instead, the new strategy is to use a single function call as in the line search method. The function assumes that the merit has information at the current x, or that is will compute that information and also that it will compute the objective at the trial point xt.

All update strategies I found use rho = ared / pred to update the radius and accept or deny the step. ared = phi(x) - phi(x + d) is simple to obtain but pred = m(x) - m(x + d) is not, because it depends on the model. So the implementation assumes that m(x) = phi(x) and m(x + d) is dual(x) + delta + eta * P(c(x) + Ad), i.e., it assumes that the "dual" part of the model is the dual part of the merit added to a variation that is computed outside the algorithm; and the "primal" part is the penalization function applied to the linear approximation to c(x + d). This is a limitation of the merit function, but I'm not sure how much useful would it be to try to be more general, so we'll have to wait for requests.

Unlike line search, everything was rewritten. The update logic is still there for each method, but even the parameter names were changed.

Here is the entry point: https://github.com/JuliaSmoothOptimizers/SolverTools.jl/pull/146/files#diff-d02c78ebe1ccf3e3acbf63be3dc9d191R22

dummy_trust_region_solver was created using it: https://github.com/JuliaSmoothOptimizers/SolverTools.jl/pull/146/files#diff-61da26b3dc4a7f3b3cba1110cf825eabR59

JSOSolvers

The immediate effect of these changes is JSOSolvers. I don't think any other method is using trust regions and line searches from SolverTools at the moment, please correct me if I'm wrong.

To make sure that the above works and doesn't slow down (too much), I'm thinking that updating lbfgs and trunk, running them and comparing both strategies should be enough. I'll update the results when I get them.

abelsiqueira commented 3 years ago

Tests were made: https://github.com/JuliaSmoothOptimizers/JSOSolvers.jl/pull/43

JuliaSmoothOptimizers / SolverTools.jl