Do not use PSD or LMI constraint for small matrices

hongkai-dai commented 4 months ago

When we impose the constraint X is psd, we should re-consider the formulation if X is small size

If X is a 1 x 1 matrix (a scalar), then it is better to impose a linear constraint.
If X is a 2 x 2 matrix, then X is psd is equivalent to [X(0, 0), X(1, 1), X(2, 2)] is in the rotated Lorentz cone. I think the solver would perform better (at least not worse) with a rotated Lorentz cone rather than a PSD constraint. (The caveat is that CSDP doesn't support rotated Lorentz cone natively, so we need to convert the rotated Lorentz cone to PSD in CSDP, so the performance of reformulating a rotated Lorentz cone is the same as using PSD with CSDP).

So I think we shouldn't impose X is psd when the size of X is either 1 x 1 or 2 x 2. The question is, where should we change the code? Here are a few options

In the constructor of PositiveSemidefiniteConstraint and LinearMatrixInequalityConstraint, throws an error if the size of the matrix is 1 x 1 or 2 x 2.
When calling MathematicalProgram::AddPositiveSemidefiniteConstraint() or MathematicalProgram::AddLinearMatrixInequalityConstraint(), depending on the size of the matrix, we might add a linear inequality constraint (1 x 1 matrix), a rotated Lorentz cone constraint (2 x 2 matrix), or an PSD/LMI constraint (larger matrix).
In each of the solver backend (MosekSolver, ClarabelSolver, etc), when it parses the PositiveSemidefiniteConstraint (or LinearMatrixInequalityConstraint), first check the size of the matrix; if the matrix size is 1 x 1 or 2 x 2, then tell the solver to add a linear inequality constraint or rotated Lorentz cone constraint.

Each option has its pros and cons:

Option 1 means that the user needs to modify their code if they constructed these constraints with small-size matrices.
Option 2 means that if the user writes Binding<PositiveSemidefiniteConstraint> con = prog.AddPositiveSemidefiniteConstraint(...), then the return argument might be invalid (but it doesn't affect python code, or if the user uses auto keyword for the return type).
Option 3 doesn't require the user to change the code, but it is a lot of work for us (we need to change each solver backend code).

I personally prefer option 2. @RussTedrake @AlexandreAmice WDYT?

jwnimmer-tri commented 4 months ago

... but it is a lot of work for us (we need to change each solver backend code).

Is it actually a lot of work?

Imagine that we had a helper function that converts to a preferred form:

// When some constraint can be converted to a simpler formulation, removes it from
// its original list of bindings and adds the equivalent to the new list of bindings.
// @param[in,out] positive_semidefinite_constraints
// @param[in,out] linear_matrix_inequality_constraints
// @param[in,out] linear_equality_constraints
// ...
void RespellTrivialPsdLmiToPreferredForms(
  std::vector<Binding<PositiveSemidefiniteConstraint>>* positive_semidefinite_constraints,
  std::vector<Binding<LinearMatrixInequalityConstraint>>* linear_matrix_inequality_constraints,
  std::vector<Binding<LinearEqualityConstraint>>* linear_equality_constraints,
  ...
);

Now each solver back-end can just call that function as a pre-processing step, before it starts compiling the program into the backend-specific data structures. We only need to implement the pre-processing logic once, for all solvers.

hongkai-dai commented 4 months ago

Thanks Jeremy!

I shouldn't say option 3 is a lot of work. "Some extra work" would be a more appropriate description, including

Now when we call MathematicalProgram::AddPositiveSemidefiniteConstraint(), we don't necessarily activate ProgramAttribute::kPositiveSemidefiniteConstraint. Instead we might activate ProgramAttribute::kLinearConstraint or ProgramAttribute::kLorentzConeConstraint, but the prog.linear_constraints() and program.rotated_lorentz_cone_constraint() might be empty. This could be confusing to the users.
Inside each solver, we will also compute the dual solution for each constraint. Now after calling RespellTrivialPsdLmiToPreferredForm, we have split prog.positive_semidefinite_constraints() into the four vectors positive_semidefinite_constraints, linear_matrix_inequality_constraints, linear_equality_constraints and rotated_lorentz_cone_constraints. We need to figure out the mapping from the original prog.positive_semidefinite_constraints() to these four vector of constraints, and also figure out how to map their dual variables. This is not trivial work.
Even after calling RespellTrivialPsdLmiToPreferredForm, we still need to change the current code inside each solver backend to parse the new constraints (many of our solvers don't take a vector of constraints as the input, for example in https://github.com/RobotLocomotion/drake/blob/af0551e35fbe97bcb9f0e983945002ecbe4e15f0/solvers/aggregate_costs_constraints.h#L188-L193 )

All these are doable, but I don't think it is small amount of work.

On the other hand, I find option 2 conceptually easier, that when we call MathematicalProgram::AddPositiveSemidefiniteConstraint(), we add it to prog.linear_constraints(), prog.rotated_lorentz_cone_constraint() or prog.positive_semidefinite_constraints() depending on the size of the matrix. So when we look at a MathematicalProgram object with non-empty positive_semidefinite_constraints(), it really means that we need a PSD solver (instead of a linear or lorentz cone solver). The caveat is that the return type of MathematicalProgram::AddPositiveSemidefiniteConstraint() will be

std::variant<Binding<PositiveSemidefiniteConstraint>, Binding<RotatedLorentzConeConstraint>, Binding<LinearConstraint>> MathematicalProgram::AddPositiveSemidefiniteConstraint(...)

so it might break the user's code.

jwnimmer-tri commented 4 months ago

Thanks, I agree with that summary. That's a better outline of the choices.

AlexandreAmice commented 4 months ago

I think option 2 is certainly the easiest way to get this change implemented. However, I think the option where we silently rewrite the constraint for the user in a better form is conceptually much more appealing.

When it comes to optimization I am generally in favor of making the interface as conceptually easy for the user while the parser does these types of optimization which are clearly always the better choice.

RobotLocomotion / drake

Do not use PSD or LMI constraint for small matrices #21663