embotech / ecos

A lightweight conic solver for second-order cone programming.
GNU General Public License v3.0

Bail out on NaN dead end #181

Closed lloda closed 5 years ago

lloda commented 5 years ago

Sometimes ECOS stops progressing because it hits a NaN, then wastes time until maxit is reached. This patch aborts the loop when that happens.
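To illustrate the kind of check such a bail-out relies on, here is a minimal sketch in C. It is not the code from the patch: the helper name `any_nan` and the idea of passing the progress metrics as a plain array are assumptions made for this example.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not part of ECOS): returns 1 if any of the n
 * values in v is NaN, 0 otherwise. A solver loop could call this on its
 * progress metrics (costs, gap, residuals) after each iteration and
 * stop as soon as it returns 1. */
int any_nan(const double *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (isnan(v[i])) {
            return 1;
        }
    }
    return 0;
}
```

Once a NaN enters the iterates, later steps computed from them typically stay NaN, so a check along these lines is enough to recognise the dead end on the first bad iteration.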



coveralls commented 5 years ago

Coverage remained the same at 0.0% when pulling 0d905a3b2ed5fdc0b6258c49c21dba2a28414197 on lloda:extra into 53db68793e020774fc14e8b3bee608b338c85fbd on embotech:develop.

smerkli commented 5 years ago

Hello lloda, thanks for the contribution! Do you happen to have a simple test case where this occurs so we could add it to the tests?

Also note that by submitting contributions, you agree with the contributor license agreement (https://github.com/embotech/ecos/blob/develop/CONTRIBUTING.md).

lloda commented 5 years ago

I'll need a few days to isolate a case. No problem re: the license agreement.

lloda commented 5 years ago

The smallest test I can produce is something like 500k of data with the arguments for

pwork* w = ECOS_setup(idxint n, idxint m, idxint p, idxint l, idxint ncones, idxint* q, idxint nex,
                   pfloat* Gpr, idxint* Gjc, idxint* Gir,
                   pfloat* Apr, idxint* Ajc, idxint* Air,
                   pfloat* c, pfloat* h, pfloat* b);

The arguments q, Gpr, Gjc, Gir, Apr, Ajc, Air, c, h, and b are in this HDF5 file:

ecos-nan.h5.zip

The other arguments are

n = 9
m = 3028
p = 1
l = 0
ncones = 836
nex = 0

also

w->stgs->feastol = 1e-12;
w->stgs->reltol = 1e-12;
w->stgs->maxit = 100;

int ec = ECOS_solve(w);

The output of this (with the patch) is

ECOS 2.0.7 - (C) embotech GmbH, Zurich Switzerland, 2012-15. Web: www.embotech.com/ECOS

It     pcost       dcost      gap   pres   dres    k/t    mu     step   sigma     IR    |   BT
 0  +0.000e+00  -1.204e+01  +9e+02  1e+00  2e+01  1e+00  1e+00    ---    ---    1  1  - |  -  -
 1  +9.764e-03  -8.617e-01  +8e+02  1e-01  2e+00  8e-01  1e+00  0.2010  7e-01   1  1  1 |  0  0
 2  +6.061e-02  -1.149e-01  +2e+02  1e-02  6e-01  3e-02  3e-01  0.7908  8e-02   1  1  1 |  0  0
 3  +1.945e-02  -9.473e-03  +7e+01  2e-03  1e-01  7e-03  1e-01  0.7090  1e-01   1  1  1 |  0  0
 4  +6.805e-03  -3.267e-03  +3e+01  6e-04  5e-02  3e-03  5e-02  0.5480  4e-02   1  1  1 |  0  0
 5  +2.587e-03  -1.010e-03  +1e+01  2e-04  2e-02  5e-04  2e-02  0.8346  2e-01   1  1  1 |  0  0
 6  +1.514e-03  +6.949e-04  +3e+00  4e-05  4e-03  1e-04  4e-03  0.7820  2e-02   1  1  1 |  0  0
 7  +1.447e-03  +1.240e-03  +8e-01  8e-06  6e-04  2e-06  9e-04  0.9890  2e-01   1  1  1 |  0  0
 8  +1.425e-03  +1.361e-03  +2e-01  3e-06  1e-04  6e-07  3e-04  0.7710  1e-01   1  1  1 |  0  0
 9  +1.427e-03  +1.365e-03  +2e-01  2e-06  9e-05  5e-07  3e-04  0.1312  8e-01   1  1  1 |  0  0
10  +1.433e-03  +1.411e-03  +8e-02  9e-07  2e-05  5e-08  9e-05  0.9482  3e-01   1  1  1 |  0  0
11  +1.432e-03  +1.430e-03  +8e-03  9e-08  2e-06  5e-09  9e-06  0.9041  5e-03   1  1  1 |  0  0
12  +1.432e-03  +1.431e-03  +2e-03  2e-08  3e-07  1e-09  2e-06  0.8288  6e-02   1  1  2 |  0  0
13  +1.432e-03  +1.432e-03  +4e-04  5e-09  7e-08  3e-10  5e-07  0.7899  4e-02   1  1  2 |  0  0
14  +1.432e-03  +1.432e-03  +4e-05  5e-10  5e-09  3e-11  5e-08  0.9544  3e-02   1  1  1 |  0  0
15  +1.432e-03  +1.432e-03  +5e-06  8e-10  7e-10  4e-12  6e-09  0.8963  7e-03   2  1  1 |  0  0
16  +1.432e-03  +1.432e-03  +2e-06  1e-09  3e-10  2e-12  2e-09  0.6520  8e-02   1  2  2 |  0  0
17  +1.432e-03  +1.432e-03  +1e-05  2e-10  2e-10  7e-13  1e-08  0.9448  8e-01   2  2  1 |  0  0
18  +1.432e-03  +1.432e-03  +2e-06  4e-10  1e-11  3e-13  2e-09  0.9890  5e-02   2  1  1 |  0  0
19  +1.432e-03  +1.432e-03  +7e-08  3e-10  2e-12  4e-14  9e-11  0.9233  1e-02   1  2  1 |  0  0
20  +1.432e-03  +1.432e-03  +2e-08  2e-11  1e-13  3e-15  2e-11  0.9890  5e-02   2  2  1 |  0  0
21   +nan   -nan  -nan  nan  nan  -nan  -nan  0.9890  1e-04   0  0  0 |  0  0
Reached NAN dead end, recovering best iterate (20) and stopping.

Without the patch, the row of NaNs repeats 80 times until maxit is reached, then the same iterate (20) is returned.
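The count of 80 follows from the settings above, and can be checked with a toy simulation. This is only a sketch under stated assumptions: `simulate` and `sim_result` are hypothetical names, the "metric" is a stand-in rather than anything ECOS computes, and the loop is assumed to run from iteration 0 to maxit inclusive, which is what makes NaN rows 21 through 100 add up to 80.

```c
#include <math.h>

/* Hypothetical result record for this toy simulation (not an ECOS type). */
typedef struct {
    int nan_rows;   /* number of iterations whose metric was NaN */
    int best_iter;  /* last iteration with a finite metric, or -1 */
} sim_result;

/* Toy stand-in for a solver loop. The metric is finite before nan_from
 * and NaN from nan_from onwards. If bail_on_nan is nonzero, the loop
 * stops at the first NaN; otherwise it keeps going until maxit. */
sim_result simulate(int maxit, int nan_from, int bail_on_nan)
{
    sim_result r = {0, -1};
    for (int iter = 0; iter <= maxit; iter++) {
        double gap = (iter < nan_from) ? 1.0 / (iter + 1) : NAN;
        if (isnan(gap)) {
            r.nan_rows++;
            if (bail_on_nan) {
                break;      /* dead end: stop instead of spinning */
            }
            continue;       /* unpatched behaviour: keep iterating */
        }
        r.best_iter = iter; /* remember the last usable iterate */
    }
    return r;
}
```

With maxit = 100 and the first NaN at iteration 21, `simulate(100, 21, 0)` counts 80 NaN rows while `simulate(100, 21, 1)` counts one, and both report iterate 20 as the last usable one.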

I'm sorry this isn't a self-contained test; I had to fish it out of the bowels of a large program. I can write a C stub that loads the HDF5 file and calls ECOS as above, but I figured that wouldn't be of much help.

smerkli commented 5 years ago

Thanks for isolating the test case. This will be useful for figuring out why NaNs appear in the first place (they should not). In the meantime, I'll try to reproduce things locally and merge the PR once I've done that.

smerkli commented 5 years ago

Looks good to me, could reproduce the fix on my machine. Thanks again!