PSORLab / EAGO-notebooks

A collection of Jupyter notebooks which use the EAGO package

Affine relaxation vs. natural interval extension and numerical issues #4

Open ShuhuaGao opened 3 years ago

ShuhuaGao commented 3 years ago

Hi @mewilhel, this issue follows the previous one (#3), but I think it is more appropriate to track them as two separate issues.

I am new to McCormick relaxation and learned the basics from the paper [1], especially the affine relaxation via subgradients. I also read your paper on EAGO [2]. I played with the notebook demonstrating customized routines in EAGO, since I have a similar parameter identification problem, i.e., a nonlinear least-squares problem with only box constraints. However, I have some unexpected findings and would like to consult you for more details. The discussion below is restricted to box-constrained problems. The modified notebook can be found here.

1. Affine relaxation in lower_problem!

In lower_problem! of your original notebook, the lower bound is obtained from the natural interval extension. However, it is often stated in the literature (e.g., [1, 2]) that the natural interval extension is not as tight as the McCormick relaxation. I therefore tried the simplest single-point affine relaxation in lower_problem! (thanks for the excellent McCormick.jl package) and also checked which of the two gives a tighter underestimate. The code is pasted below (hopefully nothing is wrong).

# packages assumed by this snippet: Interval/mid from IntervalArithmetic,
# MC/NS from McCormick, and ⋅ from LinearAlgebra
using EAGO, McCormick, IntervalArithmetic
using LinearAlgebra: ⋅

import EAGO.lower_problem!
# IntervalExt is the extension type defined earlier in the notebook
function lower_problem!(t::IntervalExt, x::EAGO.Optimizer)

    # retrieve bounds at current node
    n = x._current_node
    lower = n.lower_variable_bounds
    upper = n.upper_variable_bounds

    # McCormick relaxation
    box = Interval.(lower, upper)
    xmid = mid.(box)
    xMC = [MC{4, NS}(xmid[i], box[i], i) for i = 1:4]
    fMC = obj_func(xMC)

    # affine relaxation at `xmid`
    s = fMC.cv_grad
    xopt = similar(lower)  # store minimum solution to the affine relaxation
    @inbounds for (i, ele) in enumerate(s)
        xopt[i] = ele > 0 ? lower[i] : upper[i]
    end
    fopt = fMC.cv + s ⋅ (xopt .- xmid)

    # report lower bounding results
    x._lower_objective_value = fopt
    x._lower_solution = xopt
    x._lower_feasibility = true
    x._cut_add_flag = false

    # can we get a tighter lower bound than the natural interval extension?
    itv_nat = fMC.Intv
    if fopt > lo(itv_nat)
        @show (lo(itv_nat), fopt, n.depth, n.id)
    end
    return
end

The above obj_func is just obj_func(x) = sin(x[1])*x[2]^2 - cos(x[3])/x[4] for simplicity.

Since I am new to this field: does it often happen that the affine relaxation is even worse than the natural interval extension? If so, we could combine the two and keep the tighter (larger) lower bound (I am not sure whether EAGO already does this).
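For instance, inside the lower_problem! above, the reported bound could simply become the maximum of the two underestimates (a one-line change reusing the locals from the snippet):

# a valid lower bound is the tighter (larger) of the two underestimates
x._lower_objective_value = max(fopt, lo(itv_nat))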

In addition, is it worth attempting a multi-point affine relaxation instead of the single midpoint used above? For constrained problems, a multi-point affine relaxation leads to a larger LP; for problems with only box constraints, as discussed here, the minimum value of the relaxation is min_x max_i g_i(x), where each g_i is an affine function, and there exist algorithms faster than a general LP to solve this minimization.
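For concreteness, here is a minimal LP-based sketch of such a multi-point lower bound for the box-constrained case (the function name, the point selection, and the HiGHS solver are all illustrative choices, not EAGO's internal method):

using JuMP, HiGHS
using McCormick
using IntervalArithmetic: Interval

# lower bound from several affine relaxations g_j(x) = cv_j + s_j'(x - p_j),
# combined via the LP:  min η  s.t.  η >= g_j(x) for all j,  lower <= x <= upper
function multipoint_lower_bound(obj_func, lower, upper, points)
    n = length(lower)
    box = Interval.(lower, upper)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, lower[i] <= x[i = 1:n] <= upper[i])
    @variable(model, η)
    for p in points
        # McCormick relaxation of the objective at linearization point p
        xMC = [MC{n, NS}(p[i], box[i], i) for i = 1:n]
        fMC = obj_func(xMC)
        s = fMC.cv_grad
        @constraint(model, η >= fMC.cv + sum(s[i] * (x[i] - p[i]) for i = 1:n))
    end
    @objective(model, Min, η)
    optimize!(model)
    return objective_value(model)
end

Each added linearization point can only tighten (raise) the resulting lower bound, at the cost of a larger LP.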

2. Inspection of the obtained global optimal solutions

In #3, I noticed that there is a discrepancy between the reported objective value and the actual one. The issue persists here:

The optimal value is:    -1.498837828202948, the solution found is [-1.573774954304099, 0.9998533502221107, -0.006667515262961388, 2.003519594669342].
Verify obj value @ xsol: -1.4988128398246658
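(The second line is just a direct re-evaluation of obj_func at the reported solution:)

xsol = [-1.573774954304099, 0.9998533502221107, -0.006667515262961388, 2.003519594669342]
obj_func(xsol)  # ≈ -1.4988128398246658, not the reported -1.498837828202948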

The above observations may, unfortunately, indicate that EAGO has numerical issues somewhere that call for improvement.

3. Finally, as stated in [1], McCormick-based relaxations are particularly suitable for parameter identification problems (like the example in Section 5.1 of [1]), where the number of variables is small but the number of factors can be large due to the many data points. For such an unconstrained NLP, do you have suggestions for lower_problem! based on your experience? I have not tested the multi-point affine relaxation yet.

[1] Mitsos, Alexander, Benoît Chachuat, and Paul I. Barton. "McCormick-based relaxations of algorithms." SIAM Journal on Optimization 20.2 (2009): 573-601.

[2] Wilhelm, M. E., and M. D. Stuber. "EAGO.jl: Easy Advanced Global Optimization in Julia." Optimization Methods and Software (2020): 1-26.

mewilhel commented 3 years ago
  1. This can certainly happen. While the McCormick relaxations are at least as tight as the interval bounds, the affine relaxations generated from them can certainly be weaker, and for some problems may always be so. One of the simplest examples is f(x) = x^2: take the affine relaxation and compute its lower bound, and unless it is generated from the relaxation at 0, it will be worse than the interval bound. This happens when the interval bounds are already relatively tight. In contrast, take the function sum(x - x for i in 1:100000): its interval bounds will be highly expansive, whereas the affine relaxation is exact. Repeated use of multiplication, division, and subtraction generally leads to much weaker interval bounds (and a corresponding relative improvement for affine relaxations), so you tend to see more improvement on more complex problems. Also, I suspect the branching behavior is responsible for the improvement you observed despite the lack of improvement in bounds. A minimal McCormick.jl sketch of the x^2 case is given after this list.
  2. I think this is related to the issue in #3, so I'll leave the discussion to that thread.
  3. Generally, multi-point relaxations work fairly well. The default solver uses a variant that seems to be effective: we repeatedly add affine relaxations until they stop improving the solution (or hit a limit), and then choose the better of the affine and interval bounds. Each new affine relaxation is generated at the solution of the prior relaxation, offset toward the midpoint by a factor. If you can formulate your problem as a JuMP NL expression, as done in the https://github.com/PSORLab/EAGO-notebooks/blob/master/notebooks/nlpopt_explicit_kinetic.ipynb example, then you can try this out fairly readily. It's also worth trying an epigraph reformulation that moves the objective function into a constraint; in some cases this allows a better application of multiple affine relaxations (a minimal sketch is given after this list). There are also some more specific approaches based on subgradient tightening of interval bounds that we can try depending on the problem structure. I'd be happy to look over a specific problem formulation and provide some feedback/ideas on how to address it.
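To illustrate point 1, here is a minimal McCormick.jl sketch of the x^2 example (the interval and relaxation point are chosen for illustration):

using McCormick
using IntervalArithmetic: Interval

# f(x) = x^2 on X = [-2, 3], relaxed at x0 = 1 (any x0 != 0 shows the effect)
X = Interval(-2.0, 3.0)
x0 = 1.0
xMC = MC{1, NS}(x0, X, 1)
fMC = xMC^2

# the affine relaxation is g(x) = cv + s*(x - x0); over X its minimum is
# attained at an endpoint determined by the sign of the subgradient s
s = fMC.cv_grad[1]
affine_lb = fMC.cv + s * ((s > 0 ? X.lo : X.hi) - x0)

fMC.Intv.lo  # interval lower bound: 0.0
affine_lb    # 1 + 2*(-2 - 1) = -5.0, weaker than the interval bound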
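And for point 3, a minimal sketch of the epigraph reformulation in JuMP with EAGO (all bounds are illustrative; x[4] is kept away from zero to avoid the division by zero):

using JuMP, EAGO

# min f(x) becomes min t s.t. f(x) <= t, moving the nonlinearity into a constraint
lb = [-5.0, -5.0, -5.0, 0.5]  # x[4] bounded away from 0
ub = [5.0, 5.0, 5.0, 5.0]
model = Model(EAGO.Optimizer)
@variable(model, lb[i] <= x[i = 1:4] <= ub[i])
@variable(model, -100.0 <= t <= 100.0)  # finite bounds on t for branch-and-bound
@NLconstraint(model, sin(x[1]) * x[2]^2 - cos(x[3]) / x[4] <= t)
@objective(model, Min, t)
optimize!(model)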
ShuhuaGao commented 3 years ago

Thank you for the detailed explanation @mewilhel. I have a nonlinear least-squares fitting problem with 5 parameters. With default settings (I did not provide any custom routines), EAGO yields

First Solution Found at Node 1
LBD = 0.0
UBD = 2.527822301517424e-5

The UBD is essentially the global minimum (validated with other global solvers and metaheuristics). However, the LBD is uninformative: the least-squares objective is nonnegative by definition, so 0 is a trivial lower bound. I attempted to get a better LBD by decreasing absolute_tolerance. Once I set absolute_tolerance=1e-5, EAGO threw an error during optimization:

MethodError: no method matching lower_interval_bound(::EAGO.BufferedNonlinearFunction{MC{5,NS}})

By contrast, another solver, Couenne (accessed from JuMP through the AmplNLWriter.jl wrapper), outputs LBD=2.52782e-05 and UBD=2.52782e-05 in less than 2 seconds (with its default settings). Even so, I prefer EAGO because its modular design is easy to understand and open for extension and customization. Besides, it is written in Julia 😊
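(For reference, a typical way to access Couenne from JuMP, assuming the Couenne_jll binary wrapper; the exact setup in the attached notebook may differ:)

using JuMP, AmplNLWriter, Couenne_jll

model = Model(() -> AmplNLWriter.Optimizer(Couenne_jll.amplexe))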

To facilitate reproduction, I have attached the (zipped) Jupyter notebook using EAGO for your information.

eago-PVID.zip

mewilhel commented 3 years ago

Thanks for the files and the description. I'll dive into the issue this afternoon and look into fixing that error shortly.