Factorization failed for sparse SDP and a NameError.

Viech commented 4 years ago

The following script:

#!/usr/bin/python

import pickle
from sys import version

import cvxopt
import smcp

with open("picos_issue_201.pickle", "rb") as f:
    c, G, h, dims = pickle.load(f)

print("Python", version.splitlines()[0])
print("CVXOPT", cvxopt.__version__)
print("SMCP", smcp.__version__)
print("c", repr(c))
print("G", repr(G))
print("h", repr(h))
print("dims", dims)

# Works.
print("="*80)
cvxopt.solvers.conelp(c, G, h, dims)

# Factorization fails.
print("="*80)
smcp.solvers.conelp(c, G, h, dims)

# NameError.
print("="*80)
smcp.solvers.conelp(c, G, h, dims, kktsolver="ldl")

Produces the output:

Python 3.8.1 (default, Jan 22 2020, 06:38:00) 
CVXOPT 1.2.4
SMCP 0.4.6
c <164x1 matrix, tc='d'>
G <341x164 sparse matrix, tc='d', nnz=605>
h <341x1 matrix, tc='d'>
dims {'l': 53, 'q': [], 's': [12, 12]}
================================================================================
     pcost       dcost       gap    pres   dres   k/t
 0: -1.6569e-17  2.2204e-16  2e+02  9e+00  2e+00  1e+00
 1:  6.5233e-01  1.3550e+00  9e+00  8e-01  2e-01  8e-01
 2:  3.0787e+00  3.6953e+00  8e+00  4e-01  9e-02  7e-01
 3:  4.4507e+00  4.5155e+00  1e+00  6e-02  1e-02  7e-02
 4:  4.6577e+00  4.6609e+00  6e-02  3e-03  6e-04  3e-03
 5:  4.6672e+00  4.6673e+00  2e-03  7e-05  2e-05  4e-05
 6:  4.6675e+00  4.6675e+00  3e-05  1e-06  3e-07  6e-07
 7:  4.6675e+00  4.6675e+00  1e-06  5e-08  1e-08  2e-08
Optimal solution found.
================================================================================
0.4.6                Extended self-dual embedding, primal scaling (Cholesky)
----------------------------------------------------------------------------
SDP var. size:       77 
Constraints:         164 (7|157)
Aggregate sparsity:  Chordal        NNZ(tril(V)) =     209
----------------------------------------------------------------------------
 it  pcost       dcost      gap     pres    dres    k/t     step    cputime
  0  0.0000e+00  0.0000e+00 7.7e+01 1.2e+00 6.3e+00 1.0e+00             0.0
  1 -4.3589e-01 -3.1220e-01 3.9e+01 5.2e-01 2.8e+00 5.7e-01 7.0e-01     0.0
  2 -9.6490e-01 -8.7575e-01 2.4e+01 3.1e-01 1.7e+00 3.6e-01 4.9e-01     0.0
  3 -1.5746e+00 -1.5117e+00 1.4e+01 1.7e-01 9.1e-01 2.1e-01 7.0e-01     0.1
  4 -2.5677e+00 -2.5056e+00 9.0e+00 9.0e-02 4.9e-01 1.4e-01 1.0e+00     0.1
  5 -3.6033e+00 -3.5493e+00 5.2e+00 4.4e-02 2.4e-01 9.2e-02 7.0e-01     0.1
  6 -4.2693e+00 -4.2416e+00 2.1e+00 1.6e-02 8.7e-02 4.1e-02 7.0e-01     0.1
  7 -4.5307e+00 -4.5200e+00 7.4e-01 5.5e-03 3.0e-02 1.5e-02 7.0e-01     0.2
  8 -4.6231e+00 -4.6195e+00 2.5e-01 1.8e-03 9.8e-03 5.1e-03 7.0e-01     0.2
  9 -4.6530e+00 -4.6519e+00 8.2e-02 5.9e-04 3.2e-03 1.7e-03 7.0e-01     0.2
 10 -4.6628e+00 -4.6624e+00 2.7e-02 1.9e-04 1.0e-03 5.3e-04 7.0e-01     0.2
 11 -4.6660e+00 -4.6659e+00 8.6e-03 6.2e-05 3.4e-04 1.7e-04 7.0e-01     0.2
 12 -4.6670e+00 -4.6670e+00 2.8e-03 2.0e-05 1.1e-04 5.3e-05 7.0e-01     0.3
 13 -4.6673e+00 -4.6673e+00 9.1e-04 6.6e-06 3.6e-05 1.7e-05 7.0e-01     0.3
 14 -4.6674e+00 -4.6674e+00 2.9e-04 2.1e-06 1.2e-05 5.4e-06 7.0e-01     0.3
 15 -4.6675e+00 -4.6675e+00 9.6e-05 7.0e-07 3.8e-06 1.7e-06 7.0e-01     0.3
 16 -4.6675e+00 -4.6675e+00 3.1e-05 2.3e-07 1.2e-06 5.4e-07 7.0e-01     0.3
 17 -4.6675e+00 -4.6675e+00 1.0e-05 7.4e-08 4.0e-07 1.7e-07 7.0e-01     0.4
 18 -4.6675e+00 -4.6675e+00 3.3e-06 2.4e-08 1.3e-07 5.5e-08 7.0e-01     0.4
*** Factorization failed
   Primal objective:                -4.66748431e+00
   Dual objective:                  -4.66748427e+00
   Gap:                              3.26966149e-06
   Relative gap:                     7.00519010e-07
   Primal infeasibility:             2.39690612e-08
   Dual infeasibility:               1.30294140e-07
   Iterations:                       18
   CPU time:                         0.39
   CPU time per iteration:           0.02
   Real time:                        0.39
   Real time per iteration:          0.02

   DIMACS:  2.75e-08 0.00e+00 9.21e-08 0.00e+00 -3.35e-09 3.16e-07
================================================================================
Traceback (most recent call last):
  File "./picos_issue_201.py", line 31, in <module>
    smcp.solvers.conelp(c, G, h, dims, kktsolver="ldl")
  File "/usr/lib/python3.8/site-packages/smcp/solvers.py", line 2368, in conelp
    sol = P.solve_esd(kktsolver=kktsolver)
  File "/usr/lib/python3.8/site-packages/smcp/base.py", line 260, in solve_esd
    return solvers.chordalsolver_esd(self.A,self.b,kktsolver=kktsolver,\
  File "/usr/lib/python3.8/site-packages/smcp/solvers.py", line 2086, in chordalsolver_esd
    print_head()
  File "/usr/lib/python3.8/site-packages/smcp/solvers.py", line 2012, in print_head
    print("%-20s Extended self-dual embedding, %s scaling (%s)" % (__version__,scaling,SolvStr))
NameError: free variable 'SolvStr' referenced before assignment in enclosing scope

I will attach the script and the pickled problem data in a comment, as I do not seem to be able to attach when editing the issue that I accidentially submitted with an empty description.

Viech commented 4 years ago

Apparently I cannot attach this to a comment either even if I put it in an archive. So here are links:

http://dl.viech.name/picos_issue_201.py http://dl.viech.name/picos_issue_201.pickle

% md5sum picos_issue_201.*     
e915abb637117e44a14bbc221f4a8af1  picos_issue_201.pickle
e4d614a1446e84b0ca26778dc951f320  picos_issue_201.py

Please note the security warning in https://docs.python.org/3/library/pickle.html. If you do not want to load pickled data, feel free to instead checkout the dualize2 branch of https://gitlab.com/picos-api/picos and run ./test.py -c power_trace -o dualize=True -s smcp.

On PIOCS end, we track this in https://gitlab.com/picos-api/picos/issues/201.

martinandersen commented 4 years ago

This looks like a problem where the default tolerances are too strict for the default KKT solver.

The NameError arises because kktsolver='ldl' is not a valid choice ('chol' (default) and 'qr' are currently the only valid choices). The following commit "fixes" this issue by throwing a ValueError exception: https://github.com/cvxopt/smcp/commit/86fb5f07d1f67a71efec6dc691caf73a3c852442

Viech commented 4 years ago

I tried with kktsolver="qr" but there is a different issue:

Terminated (small step size detected).
   Primal objective:                -4.66745235e+00
   Dual objective:                  -4.66745235e+00
   Gap:                              7.10227696e-11
   Relative gap:                     1.52166030e-11
   Primal infeasibility:             2.32939129e+03
   Dual infeasibility:               6.86684721e-01
   Iterations:                       76
   CPU time:                         6.76
   CPU time per iteration:           0.09
   Real time:                        6.77
   Real time per iteration:          0.09

martinandersen commented 4 years ago

I get the following output if I change 'ldl' to 'qr':

Optimal solution found.
   Primal objective:                -4.66748488e+00
   Dual objective:                  -4.66748488e+00
   Gap:                              3.56810782e-08
   Relative gap:                     7.64460498e-09
   Primal infeasibility:             7.57010386e-09
   Dual infeasibility:               3.36551174e-09
   Iterations:                       21
   CPU time:                         6.27
   CPU time per iteration:           0.30
   Real time:                        3.17
   Real time per iteration:          0.15

Which version of SMCP and Chompack are you using?

Viech commented 4 years ago

Arch Linux 5.4.15
Python 3.8.1
SMCP 0.4.6
Chompack 2.3.2

Viech commented 4 years ago

Same issue when updating to Chompack 2.3.3.

martinandersen commented 4 years ago

My guess is that it is a numerically challenging problem for SMCP with the default tolerances. SMCP is not as mature/robust as general purpose conic solvers such as CVXOPT's conelp() and MOSEK. The difference may be explained by different BLAS libraries (leading to different round off errors). What happens if you relax the feasibility tolerance?

smcp.solvers.options['feastol'] = 1e-7   # default is 1e-8

Viech commented 4 years ago

My BLAS is 3.9.0. With 1e-7, it works with "qr" but still not with "chol". With 1e-6 it works for both factorizations.

So I guess this is nothing you can easily fix and I should update our testbench to skip some known failures by default. I leave it to you to close or leave the issue open.

Viech commented 4 years ago

Just for the sake of completeness, I noticed that failures like this can happen or not based on the order of variables passed to SMCP. So in the long run, maybe some clever reordering of the constraint matrix can improve stability.

cvxopt / smcp

Factorization failed for sparse SDP and a NameError. #11