JuliaIntervals / IntervalRootFinding.jl

Library for finding the roots of a function using interval arithmetic
https://juliaintervals.github.io/IntervalRootFinding.jl/
Other
129 stars 26 forks source link

Failure to find solutions for a 10-dimensional problem #108

Open dpsanders opened 5 years ago

dpsanders commented 5 years ago

Problem taken from https://discourse.julialang.org/t/solvers-fail-on-nonlinear-problem-which-has-solutions/20051:

function f(X, p)
    x1,x2,x3,x4,x5,x6,x7,x8,x9,x10 = X

    return SVector(

        -p[17] * x1 + -2 * p[1] * (x1 ^ 2 / 2) + 2 * p[2] * x2 + p[21] * p[18] * ((1 + p[19] * x7) / (p[20] + x7)),
        -p[17] * x2 + p[1] * (x1 ^ 2 / 2) + -1 * p[2] * x2 + -1 * p[4] * x2 * x4 + p[9] * x3 + p[14] * x3 + -1 * p[6] * x2 * x7 + p[11] * x8,
        -p[17] * x3 + p[4] * x2 * x4 + -1 * p[9] * x3 + -1 * p[5] * x3 * x4 + p[10] * x5 + -1 * p[14] * x3 + p[15] * x5 + p[7] * x8 * x4 + -1 * p[12] * x3 * x7,
        -p[17] * x4 + -1 * p[4] * x2 * x4 + p[9] * x3 + -1 * p[5] * x3 * x4 + p[10] * x5 + -1 * p[7] * x8 * x4 + p[12] * x3 * x7 + p[16] * x9 + p[22] * p[18] * ((1 + p[19] * x7) / (p[20] + x7)),
        -p[17] * x5 + p[5] * x3 * x4 + -1 * p[10] * x5 + -1 * p[15] * x5,
        -p[17] * x6 + p[14] * x3 + p[15] * x5 + -1 * p[8] * x6 * x10 + p[13] * x9,
        -p[17] * x7 + -1 * p[6] * x2 * x7 + p[11] * x8 + p[7] * x8 * x4 + -1 * p[12] * x3 * x7 + p[18] * ((1 + p[19] * x7) / (p[20] + x7)),
        -p[17] * x8 + p[6] * x2 * x7 + -1 * p[11] * x8 + -1 * p[7] * x8 * x4 + p[12] * x3 * x7,
        -p[17] * x9 + p[8] * x6 * x10 + -1 * p[13] * x9 + -1 * p[16] * x9,
        x10 + x9 - p[23]
        )

end

It is stated there that there is always a unique solution for all x's positive, but

X = IntervalBox(0..100, 10)
p = [3600, 18, 18, 3600, 3600, 3600, 1800, 3600, 18, 18, 18, 1800, 18, 36, 11, 180, 0.7, 0.4, 30, 0.2, 4, 4.5, 0.4]

roots(X->f(X, p), X, Newton, 1e-1)

reports an empty solution list. With Krawczyk it took too long.

dpsanders commented 5 years ago

An / the exact solution is close to

x_exact = [0.11584484298713685, 
  0.9455230206055336,   
  1.4935460680512724,   
  0.014733756971650408, 
  2.6673387628301497,   
 15.827943289607886,    
  0.03777227940579977,  
  5.088785613357363,    
  0.3986099863617728,   
  0.0013900136199035832
]

(found by running the Ipopt solver as described in the post).

And indeed, looking close to that solution:

xx = IntervalBox( (x_exact .± 1e-5)... )

roots(X->f(X, p), xx, Krawczyk, 1e-10)

gives a unique root:

1-element Array{Root{IntervalBox{10,Float64}},1}:
 Root([0.115844, 0.115845] × [0.945522, 0.945523] × [1.49354, 1.49355] × [0.0147337, 0.0147338] × [2.66733, 2.66734] × [15.8279, 15.828] × [0.0377722, 0.0377723] × [5.08878, 5.08879] × [0.398609, 0.39861] × [0.00139001, 0.00139002], :unique)
dpsanders commented 5 years ago

(But Newton hangs)

Kolaru commented 5 years ago

It has probably something to do with the fact that the determinant of Jacobi matrix at the solution is close to zero:

julia> import ForwardDiff: jacobian

julia> jac(xx) = jacobian(x -> f(x, p), xx)
jac (generic function with 1 method)

julia> using LinearAlgebra

julia> det(jac(x))
-4.192183306147442e18

Together with the 10 dimensions, this makes the matrix inversion we use very unstable.

The jacobian does not seem to be singular, so the problem must be elsewhere.

Note that the result is the same using #93, so it is not due to anything fixed there.

dpsanders commented 5 years ago

But that Jacobian means that the matrix is far from singular. What do you mean by "close to 0"?

Kolaru commented 5 years ago

I edited, I meant "the determinant of the Jacobi matrix is close to zero", but I misread the exponent as "e-18" and not "e18", my bad so the problem must be elsewhere.

Kolaru commented 5 years ago

I may have got it this time.

In this problem the determinant of the intervalled value Jacobi matrix often contains 0. Since the hypothesis for the convergence of Newton method require that it is not the case so it is likely the culprit.

However in most case det(J) containing 0 results in a contracted interval that covers the whole space (it is certainly true in one dimension), and is therefore always bisected.

Here the 9 first components of the first contracted interval (IntervalBox(0..100, 10) in the example) are -inf..inf, but the tenth give some irrelevant number (about -50). As a consequence the intersection between the two is empty and the algorithm conclude that there is no solution.

To fix it we could explicitely check the determinant of the Jacobi matrix (safest but may be very expensive), however checking that all components of the contracted interval are finite should be sufficient.

Then as expected for at 10 dimensional problem, as the code is not really design for such high dimension, the performance are not great.

dpsanders commented 5 years ago

Thanks, that sounds like a good idea.

Probably we should switch to using interval Gauss--Seidel iteration instead of trying to solve the system using Gaussian elimination via \.

Krawczyk, however, does not invert an interval matrix, so I'm not sure why that doesn't work either.

I agree that performance is expected to be slow for a higher-dimensional problem. We should add in constraint propagation to speed it up.

Kolaru commented 5 years ago

\ is defined for static matrices of Interval (in linear_eq.jl) but yet it seems the StaticArrays.jl method is used, something must be wrong with the typing of the function.

I thing that Krawczyk is working but is simply too slow.

Then the problem is very weird and highly depends on the starting interval, as a consequence I have been unable yet to reproduce that kind of problem in lower dimension (which would be interesting to verify that the fix actually fix it).

dpsanders commented 5 years ago

I think part of the problem here is the vast number of intervals that are obtained by bisection before many can be thrown away.

Kolaru commented 5 years ago

Apparently there are ways of speeding things up:

https://link.springer.com/article/10.1023/A:1008359223042

One of the problem (for which a solution is given in the paper) is that bisecting along the larger dimension is not always optimal. For example, consider the functions

f(x, y) = SVector(sin(x), cos(y))
Xf = IntervalBox(-10..10, -10..10)  # Search region

g(x, y) = SVector(sin(1000*x), cos(y/1000))
Xg = IntervalBox(-0.01..0.01, -10000..10000)

The two problems are the same (up to rescaling), but in the second case the algorithm will keep bisecting the second dimension because the box is bigger along that direction. Doing so will however not help much.

Doing multisections instead of bisection seems to help a bit too (that's actually the main point of the article above).

dpsanders commented 4 months ago

We need to incorporate contractors from IntervalConstraintProgramming for this.

Usage:


using Revise, IntervalArithmetic, IntervalArithmetic.Symbols, IntervalBoxes
using IntervalConstraintProgramming
using Symbolics

X = IntervalBox(0..100, 10)
orig_X = X

p = [3600, 18, 18, 3600, 3600, 3600, 1800, 3600, 18, 18, 18, 1800, 18, 36, 11, 180, 0.7, 0.4, 30, 0.2, 4, 4.5, 0.4]

vars = @variables x1, x2, x3, x4, x5, x6, x7, x8, x9, x10

lhs = [ -p[17] * x1 + -2 * p[1] * (x1 ^ 2 / 2) + 2 * p[2] * x2 + p[21] * p[18] * ((1 + p[19] * x7) / (p[20] + x7)),
               -p[17] * x2 + p[1] * (x1 ^ 2 / 2) + -1 * p[2] * x2 + -1 * p[4] * x2 * x4 + p[9] * x3 + p[14] * x3 + -1 * p[6] * x2 * x7 + p[11] * x8,
               -p[17] * x3 + p[4] * x2 * x4 + -1 * p[9] * x3 + -1 * p[5] * x3 * x4 + p[10] * x5 + -1 * p[14] * x3 + p[15] * x5 + p[7] * x8 * x4 + -1 * p[12] * x3 * x7,
               -p[17] * x4 + -1 * p[4] * x2 * x4 + p[9] * x3 + -1 * p[5] * x3 * x4 + p[10] * x5 + -1 * p[7] * x8 * x4 + p[12] * x3 * x7 + p[16] * x9 + p[22] * p[18] * ((1 + p[19] * x7) / (p[20] + x7)),
               -p[17] * x5 + p[5] * x3 * x4 + -1 * p[10] * x5 + -1 * p[15] * x5,
               -p[17] * x6 + p[14] * x3 + p[15] * x5 + -1 * p[8] * x6 * x10 + p[13] * x9,
               -p[17] * x7 + -1 * p[6] * x2 * x7 + p[11] * x8 + p[7] * x8 * x4 + -1 * p[12] * x3 * x7 + p[18] * ((1 + p[19] * x7) / (p[20] + x7)),
               -p[17] * x8 + p[6] * x2 * x7 + -1 * p[11] * x8 + -1 * p[7] * x8 * x4 + p[12] * x3 * x7,
               -p[17] * x9 + p[8] * x6 * x10 + -1 * p[13] * x9 + -1 * p[16] * x9,
               x10 + x9 - p[23]]

constraints = lhs .== 0

Ss = Separator.(constraints, Ref(vars))

S = reduce(⊓, Ss)

pave(X, S, 10)

The result is a set of boxes that contain the roots.

We need to combine interval Newton / Krawczyk with applying the contractor as much as possible.

dpsanders commented 4 months ago

In particular, applying a single time gives

julia> S(X)[2]
[0.0, 2.76928] × [0.0, 100.0] × [0.0, 100.0] × [0.0, 100.0] × [0.0, 100.0] × [0.0, 100.0] × [0.0, 100.0] × [0.0, 100.0] × [0.0, 0.400001] × [0.0, 0.400001]

which is a box that must contain the roots.

Further:

julia> @time inner, boundary = pave(X, S, 10);
  1.233470 seconds (16 allocations: 7.569 MiB)

julia> reduce(hull, (inner; boundary))
[0.0, 1.0012] × [0.0, 100.0] × [0.0, 12.2094] × [0.0, 100.0] × [0.0, 24.611] × [0.0, 100.0] × [0.0, 100.0] × [0.0, 100.0] × [0.0, 0.400001] × [0.0, 0.400001]

gives a smaller guaranteed box.

But reducing the tolerance down further takes exponentially longer...

dpsanders commented 4 months ago
julia> @time inner, boundary = pave(X, S, 2.5);
 78.992037 seconds (5.71 k allocations: 271.611 MiB, 0.02% gc time, 0.02% compilation time)

julia> reduce(hull, (inner; boundary))
[0.00670103, 1.00117] × [0.0, 100.0] × [0.0, 8.38375] × [0.0, 100.0] × [0.0, 15.2615] × [0.0, 100.0] × [0.0, 100.0] × [0.0, 100.0] × [0.0, 0.400001] × [0.0, 0.400001]