ClarabelSolver crashes on this sums of squares instance

What happened?

When I upgraded the drake version in my underactuated notes, one of the examples (the cart-pole example from here) which works fine with Mosek and solves slowly with CDSP now crashes with Clarabel.

Here is the code to reproduce it:

import numpy as np
from pydrake.all import (
    ClarabelSolver,
    CommonSolverOption,
    Expression,
    MathematicalProgram,
    Polynomial,
    SolverOptions,
    Variables,
)
from scipy.integrate import quad

nz = 5
nq = 2
nx = 2 * nq
nu = 1

mc = 10
mp = 1
l = 0.5
g = 9.81

# Map from original state to augmented state.
# Uses sympy to be able to do symbolic integration later on.
# x = (x, theta, xdot, thetadot)
# z = (x, s, c, xdot, thetadot)
x2z = lambda x: np.array([x[0], np.sin(x[1]), np.cos(x[1]), x[2], x[3]])

def T(z, dtype=Expression):
    assert len(z) == nz
    T = np.zeros([nz, nx], dtype=dtype)
    T[0, 0] = 1
    T[1, 1] = z[2]
    T[2, 1] = -z[1]
    T[3, 2] = 1
    T[4, 3] = 1
    return T

def f(z, u, T):
    assert len(z) == nz
    s = z[1]
    c = z[2]
    qdot = z[-nq:]
    denominator = mc + mp * s**2
    f_val = np.zeros(nx, dtype=Expression)
    f_val[:nq] = qdot * denominator
    f_val[2] = (u + mp * s * (l * qdot[1] ** 2 + g * c))[0]
    f_val[3] = ((-u * c - mp * l * qdot[1] ** 2 * c * s - (mc + mp) * g * s) / l)[0]
    return T @ f_val, denominator

def f2(z, T, dtype=Expression):
    assert len(z) == nz
    s = z[1]
    c = z[2]
    f2_val = np.zeros([nx, nu], dtype=dtype)
    f2_val[2, :] = 1 / (mc + mp * s**2)
    f2_val[3, :] = -c / (mc + mp * s**2) / l
    return T @ f2_val

# State limits (region of state space where we approximate the value function).
# State limits (region of state space where we approximate the value function).
d_theta_scale = 1
d_theta = d_theta_scale * np.pi
x_max = np.array([2, np.pi + d_theta, 6, 6])
x_min = np.array([-2, np.pi - d_theta, -6, -6])
u_max = np.array([100])
if d_theta < np.pi / 2:
    z_max = np.array([x_max[0], np.sin(x_min[1]), np.cos(x_min[1]), x_max[2], x_max[3]])
    z_min = np.array([x_min[0], np.sin(x_max[1]), -1, x_min[2], x_min[3]])
else:
    z_max = np.array([x_max[0], 1, np.cos(x_min[1]), x_max[2], x_max[3]])
    z_min = np.array([x_min[0], -1, -1, x_min[2], x_min[3]])
assert (z_min < z_max).all()
x_max_list = list(x_max)
x_max_list[1] = d_theta_scale

d_theta_int = 0.7 * np.pi
x_max_int = np.array([1.5, np.pi + d_theta_int, 4, 4])
x_min_int = np.array([-1.5, np.pi - d_theta_int, -4, -4])
if d_theta_int < np.pi / 2:
    z_max_int = np.array(
        [
            x_max_int[0],
            np.sin(x_min_int[1]),
            np.cos(x_min_int[1]),
            x_max_int[2],
            x_max_int[3],
        ]
    )
    z_min_int = np.array(
        [
            x_min_int[0],
            np.sin(x_max_int[1]),
            -1,
            x_min_int[2],
            x_min_int[3],
        ]
    )
else:
    z_max_int = np.array(
        [x_max_int[0], 1, np.cos(x_min_int[1]), x_max_int[2], x_max_int[3]]
    )
    z_min_int = np.array([x_min_int[0], -1, -1, x_min_int[2], x_min_int[3]])
assert (z_min_int < z_max_int).all()

# Equilibrium point in both the system coordinates.
x0 = np.array([0, np.pi, 0, 0])
z0 = x2z(x0)
z0[np.abs(z0) <= 1e-6] = 0

# Quadratic running cost in augmented state.
Q_diag = [200, 2e3, 2e3, 1e3, 1e3]
Q = np.diag(Q_diag)
R = np.diag([1])

def l_cost(z, u):
    return (z - z0).dot(Q).dot(z - z0) + u.dot(R).dot(u)

Rinv = np.linalg.inv(R)

def calc_u_opt(dJdz, f2, Rinv):
    u_star = -0.5 * Rinv.dot(f2.T).dot(dJdz.T)
    return u_star

def cartpole_sos_lower_bound(deg):
    # Set up optimization.
    prog = MathematicalProgram()
    z = prog.NewIndeterminates(nz, "z")
    u = prog.NewIndeterminates(nu, "u")
    J = prog.NewFreePolynomial(Variables(z), deg)
    J_expr = J.ToExpression()

    # Maximize volume beneath the value function.
    obj = J.Integrate(z[0], z_min_int[0], z_max_int[0])
    for i in range(3, nz):
        obj = obj.Integrate(z[i], z_min_int[i], z_max_int[i])
    cost = 0
    for monomial, coeff in obj.monomial_to_coefficient_map().items():
        s1_deg = monomial.degree(z[1])
        c1_deg = monomial.degree(z[2])
        monomial_int1 = quad(
            lambda x: np.sin(x) ** s1_deg * np.cos(x) ** c1_deg, 0, 2 * np.pi
        )[0]
        if np.abs(monomial_int1) <= 1e-5:
            monomial_int1 = 0
        cost += monomial_int1 * coeff
    poly = Polynomial(cost)
    cost_coeff = [c.Evaluate() for c in poly.monomial_to_coefficient_map().values()]
    # Make the numerics better
    prog.AddLinearCost(-cost / np.max(np.abs(cost_coeff)))

    # Enforce Bellman inequality.
    T_val = T(z)
    f_val, denominator = f(z, u, T_val)
    J_dot = J_expr.Jacobian(z).dot(f_val)
    LHS = J_dot + l_cost(z, u) * denominator

    lam_deg = Polynomial(LHS).TotalDegree()
    # S procedure for s^2 + c^2 = 1.
    lam = prog.NewFreePolynomial(Variables(z), lam_deg).ToExpression()
    S_procedure = lam * (z[1] ** 2 + z[2] ** 2 - 1)
    S_Jdot = 0
    for i in np.arange(nz):
        lam = prog.NewSosPolynomial(Variables(z), int(np.ceil(lam_deg / 2) * 2))[
            0
        ].ToExpression()
        S_Jdot += lam * (z[i] - z_max[i]) * (z[i] - z_min[i])

    # Enforce Input constraint
    u_min = -u_max
    for i in range(nu):
        lam = prog.NewSosPolynomial(Variables(z), int(np.ceil(deg / 2) * 2))[
            0
        ].ToExpression()
        S_Jdot += lam * (u[i] - u_max[i]) * (u[i] - u_min[i])

    prog.AddSosConstraint(LHS + S_procedure + S_Jdot)

    # Enforce that value function is PD
    S_J = 0
    lam_r = prog.NewFreePolynomial(Variables(z), deg).ToExpression()
    S_r = lam_r * (z[1] ** 2 + z[2] ** 2 - 1)
    for i in np.arange(nz):
        lam = prog.NewSosPolynomial(Variables(z), int(np.ceil(deg / 2) * 2))[
            0
        ].ToExpression()
        S_J += lam * (z[i] - z_max[i]) * (z[i] - z_min[i])
    # Enforce that value function is PD
    prog.AddSosConstraint(J_expr + S_J + S_r)

    # J(z0) = 0.
    J0 = J_expr.EvaluatePartial(dict(zip(z, z0)))
    prog.AddLinearConstraint(J0 == 0)

    # Solve and retrieve result.
    options = SolverOptions()
    options.SetOption(CommonSolverOption.kPrintToConsole, 1)
    prog.SetSolverOptions(options)
    solver = ClarabelSolver()
    result = solver.Solve(prog)
    assert result.is_success()
    J_star = Polynomial(result.GetSolution(J_expr)).RemoveTermsWithSmallCoefficients(
        1e-6
    )

    # Solve for the optimal feedback in augmented coordinates.
    Rinv = np.linalg.inv(R)
    T_val = T(z)
    f2_val = f2(z, T_val)
    dJdz = J_star.ToExpression().Jacobian(z)
    -0.5 * Rinv.dot(f2_val.T).dot(dJdz.T)

    return J_star, z

J_star, z = cartpole_sos_lower_bound(deg=4)

Note: By changing to deg=2 in the last line, I can get it to run. But it's shockingly slow. (Many minutes per iteration). When deg=4, it crashes before ever showing the clarabel console output; so I'm not sure if it's actually getting to clarabel or not.

I intend to guard this example to make it "mosek only", which is fine for now. In practice, it's pretty horrible without mosek.

Version

No response

What operating system are you using?

macOS 13 (Ventura)

What installation option are you using?

compiled from source code using Bazel

Relevant log output

when `deg=4`:

russt@TRI-FVFG13URQ6LT underactuated % python3 clarabel_crash.py  
/opt/homebrew/lib/python3.11/site-packages/scipy/__init__.py:155: UserWarning: A NumPy version >=1.18.5 and <1.26.0 is required for this version of SciPy (detected version 1.26.1
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
zsh: killed     python3 clarabel_crash.py

when deg=2 (after 10+ minutes):

russt@TRI-FVFG13URQ6LT underactuated % python3 clarabel_crash.py
/opt/homebrew/lib/python3.11/site-packages/scipy/__init__.py:155: UserWarning: A NumPy version >=1.18.5 and <1.26.0 is required for this version of SciPy (detected version 1.26.1
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
-------------------------------------------------------------
           Clarabel.rs v0.6.0  -  Clever Acronym              

                   (c) Paul Goulart                          
                University of Oxford, 2022                   
-------------------------------------------------------------

problem:
  variables     = 17619
  constraints   = 18802
  nnz(P)        = 0
  nnz(A)        = 43879
  cones (total) = 15
    :        Zero = 1,  numel = 1687
    : Nonnegative = 1,  numel = 0
    : PSDTriangle = 13,  numel = (1596,1596,1596,1596,...,231)

settings:
  linear algebra: direct / qdldl, precision: 64 bit
  max iter = 200, time limit = Inf,  max step = 0.990
  tol_feas = 1.0e-8, tol_gap_abs = 1.0e-8, tol_gap_rel = 1.0e-8,
  static reg : on, ϵ1 = 1.0e-8, ϵ2 = 4.9e-32
  dynamic reg: on, ϵ = 1.0e-13, δ = 2.0e-7
  iter refine: on, reltol = 1.0e-13, abstol = 1.0e-12,
               max iter = 10, stop ratio = 5.0
  equilibrate: on, min_scale = 1.0e-4, max_scale = 1.0e4
               max iter = 10

iter    pcost        dcost       gap       pres      dres      k/t        μ       step      
---------------------------------------------------------------------------------------------
  0  +0.0000e+00  -0.0000e+00  0.00e+00  7.42e-03  8.83e-03  1.00e+00  7.01e+01   ------

I ran the code with deg=4 on my computer, it can proceed to clarabel iterations. I got the print out below. But as you mentioned, it is very slow and consumes huge amount of RAM (> 100GB)

Specifically, this line takes about 3 minutes, and consumes about 90GB of RAM https://github.com/RobotLocomotion/drake/blob/659cf702dff93ea20ee87b18e77169be404568ef/solvers/clarabel_solver.cc#L442

And when executing this line, my RAM usage jumps to 120GB. https://github.com/RobotLocomotion/drake/blob/659cf702dff93ea20ee87b18e77169be404568ef/solvers/clarabel_solver.cc#L444

-------------------------------------------------------------
           Clarabel.rs v0.6.0  -  Clever Acronym

                   (c) Paul Goulart
                University of Oxford, 2022
-------------------------------------------------------------

problem:
  variables     = 81927
  constraints   = 84772
  nnz(P)        = 0
  nnz(A)        = 207177
  cones (total) = 15
    :        Zero = 1,  numel = 4384
    : Nonnegative = 1,  numel = 0
    : PSDTriangle = 13,  numel = (8001,8001,8001,8001,...,1596)

settings:
  linear algebra: direct / qdldl, precision: 64 bit
  max iter = 200, time limit = Inf,  max step = 0.990
  tol_feas = 1.0e-8, tol_gap_abs = 1.0e-8, tol_gap_rel = 1.0e-8,
  static reg : on, ϵ1 = 1.0e-8, ϵ2 = 4.9e-32
  dynamic reg: on, ϵ = 1.0e-13, δ = 2.0e-7
  iter refine: on, reltol = 1.0e-13, abstol = 1.0e-12,
               max iter = 10, stop ratio = 5.0
  equilibrate: on, min_scale = 1.0e-4, max_scale = 1.0e4
               max iter = 10

iter    pcost        dcost       gap       pres      dres      k/t        μ       step
---------------------------------------------------------------------------------------------

I waited for 20 minutes and Clarabel hasn't finished the first iteration yet.

I did have some success with Clarabel on mid-sized SOS problems. (The PSD matrix with < 30 rows). This problem (deg=4) has pretty large PSD matrices, here is my print out

INFO:drake:PSD cone size 126
INFO:drake:PSD cone size 126
INFO:drake:PSD cone size 126
INFO:drake:PSD cone size 126
INFO:drake:PSD cone size 126
INFO:drake:PSD cone size 21
INFO:drake:PSD cone size 273
INFO:drake:PSD cone size 21
INFO:drake:PSD cone size 21
INFO:drake:PSD cone size 21
INFO:drake:PSD cone size 21
INFO:drake:PSD cone size 21
INFO:drake:PSD cone size 56

With a psd matrix having 273 rows, it probably is too large for Clarabel.

Thanks @hongkai-dai . Can you think of any way of capturing this complexity in our ChooseBestSolver logic? Otherwise, I guess it's just a "won't fix"?

cc @goulart-paul , in case it's of interest?

I can capture the complexity in ChooseBestSolver, but I wonder what is the criteria to say that Clarabel will consume too much memory. So far as I see, it depends on both the number of PSD constraints, and the size of each PSD constraint, so it is not a single number as the criteria. Maybe the Clarabel authors have a better sense on determining what problem is too large for the solver?

The issue comes down to the dimension of the PSD cones being passed and how the per-iteration cost of an interior point solve scales w.r.t. that number.

We store only the upper triangular part of a PSD matrix, so if you have a PSD cone in $\mathbb{R}^{n\times n}$ then each cone will have $M = n(n+1)/2$ elements. This number $M$ is what is reported in the output header of the solver, e.g. where it says something like:

  cones (total) = 15
    :        Zero = 1,  numel = 4384
    : Nonnegative = 1,  numel = 0
    : PSDTriangle = 13,  numel = (8001,8001,8001,8001,...,1596)

$M = 8001$ there implies that $n = 126$ for the corresponding cone.

When we form the Hessian of the barrier function for such a cone, we get a dense block of size $M \times M$ in the KKT system that we factor at each each iteration. The actual storage for that block is again $M (M+1)/2$; for a large cone that is a lot of memory. Factoring a system with very large blocks like that will be very slow because:

we don't presently do anything to exploit sparsity on the PSD constraints. If we add a chordal decomposition method to the solver it would be a lot faster for PSD constraints that have a lot of zeros.
the method we use for solving the KKT system is based on direct LDL factorisation, which is not the best choice given the presence of really huge blocks. Some block elimination or QR-based method might be better.
the LDL solver we currently have implemented is a simplicial one, so is quite a bit slower then a supernodal method for large problems or problems with a lot of dense columns.

The first item above already exists in our COSMO solver in Julia, so we intend to port it also to Clarabel this year. We have an interface to a pure Rust supernodal LDL solver in prototype now and are just waiting for the interface to that to stabilise.

In terms of predicted memory use : it is not straightforward to compute since it is would require computation of the KKT elimination tree at least, but memory use scales roughly like $n^4$ for $n \times n$ PSD cones.

We could maybe provide some facility for predicting memory total use based on problem input, but would have to think a bit about the best way of doing this without actually directly allocating a lot of memory.

If you are able to do so, it might be helpful to see an example problem. Can you export to something like AMPL format, or some python pickle file? I would be interested to see if chordal decomposition and supernodal options are likely to help you here. I'd also like to put it through Mosek to see if we can work out what is different there.

Thanks @goulart-paul for the detailed explanation.

I think I can roughly compute ∑ᵢ nᵢ⁴ with nᵢ being the number of rows in the i'th psd constraint, and set a tolerance for this summation. If the problem size exceed this tolerance, I can either throw a warning to the user, or select an alternative solver at the moment.

It makes a lot of sense if we could speed up the computation (and save the memory) by exploiting the sparsity in the PSD constraint. I believe for our SOS problem we would have abundant sparsity. I look forward to seeing this feature ported to Clarabel!

For the problem data, I currently cannot export it to AMPL format. But I can save the problem

min c'x
s.t A*x + s = b
      s in K

I can dump the A, b, and c matrices into pickle files. and store the type of the cone K and size of each cone. Would that be OK with you? Do you have a preferred format for saving the cone K?

RobotLocomotion / drake