Segmentation Fault in ContinuousAlgebraicRiccatiEquation

cohnt commented 1 year ago

What happened?

The following code produces a segmentation fault in ContinuousAlgebraicRiccatiEquation. I encountered this problem when trying to construct an LQR controller of a quadrotor, leashed to the origin by a spring. I think the problem has to do with the low rank of Q, since upping the number of nonzero diagonal entries past 3 leads to the system functioning fine.

import numpy as np
from pydrake.all import ContinuousAlgebraicRiccatiEquation

count = 3

A = np.array([
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [-12, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, -9, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
])
B = np.array([
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [7, -7, 7, -7],
    [-59, 0, 59, 0],
    [0, 99, 0, -99]
])
Q = np.diag([10] * count + [0] * (12 - count))
R = np.eye(4)

ContinuousAlgebraicRiccatiEquation(A, B, Q, R)

I've run the code through GDB, and it reports an error in Eigen::SVDBase<Eigen::JacobiSVD. I've included that log output below.

Version

No response

What operating system are you using?

Ubuntu 22.04

What installation option are you using?

compiled from source code using CMake

Relevant log output

(gdb) run test.py 
Starting program: /usr/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff45ff640 (LWP 68367)]
[New Thread 0x7ffff3dfe640 (LWP 68368)]
[New Thread 0x7fffeb5fd640 (LWP 68369)]
[New Thread 0x7fffe2dfc640 (LWP 68370)]
[New Thread 0x7fffda5fb640 (LWP 68371)]
[New Thread 0x7fffd9dfa640 (LWP 68372)]
[New Thread 0x7fffc15f9640 (LWP 68373)]
[New Thread 0x7fffc0df8640 (LWP 68374)]
[New Thread 0x7fffb85f7640 (LWP 68375)]
[New Thread 0x7fffafdf6640 (LWP 68376)]
[New Thread 0x7fff9f5f5640 (LWP 68377)]
[New Thread 0x7fff9edf4640 (LWP 68378)]
[New Thread 0x7fff965f3640 (LWP 68379)]
[New Thread 0x7fff8ddf2640 (LWP 68380)]
[New Thread 0x7fff855f1640 (LWP 68381)]
[New Thread 0x7fff7cdf0640 (LWP 68382)]
[New Thread 0x7fff6c5ef640 (LWP 68383)]
[New Thread 0x7fff63dee640 (LWP 68384)]
[New Thread 0x7fff5b5ed640 (LWP 68385)]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fff4f424e22 in void Eigen::SVDBase<Eigen::JacobiSVD<Eigen::Matrix<double, -1, -1, 0, -1, -1>, 2> >::_solve_impl<Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Matrix<double, -1, -1, 0, -1, -1> >(Eigen::Matrix<double, -1, -1, 0, -1, -1> const&, Eigen::Matrix<double, -1, -1, 0, -1, -1>&) const () from /home/tommy/opt/rlg/drake-build/install/lib/python3.10/site-packages/pydrake/common/../../../../libdrake.so

sherm1 commented 1 year ago

Not sure if you're the right person for this, @hongkai-dai. If not please reassign.

cohnt commented 1 year ago

Tagging @sageshoyu, since this affects him as well.

hongkai-dai commented 1 year ago

Sorry for the late reply.

I think your system is not stabilizable, hence it cannot find an LQR controller, and Riccati equation won't have a solution.

A system ẋ = Ax+Bu is stabilizable if either

The pair (A, B) is controllable, namely the controllability matrix [B AB ... Aⁿ⁻¹B] has full row rank. In you example this controllability matrix only has rank 4, so it is not controllable.
Re(λ(A)) < 0. In your case the real part of all eigen values of A is 0.

Since the system is not stabilizable, we can't synthesize an LQR controller (See this for a more detailed discussion)

Maybe it would help you if we throw a more meaningful error message instead of just seg fault?

jwnimmer-tri commented 1 year ago

Maybe it would help you if we throw a more meaningful error message instead of just seg fault?

Drake should never segfault. At worst, we should DRAKE_DEMAND to abort with a message, but more likely a nice std::logic_error would be the best choice here.

cohnt commented 1 year ago

I think there's a couple of factors at play here. There's a question of whether the system is stabilizable, there's the question of why modifying Q affects the behavior of the CARE solver, and there's the segfault itself.

Regarding the stabilizability of the system, this originally came from a nonlinear system, where a quadrotor is leashed to an anchor point with a hookian spring. The A and B matrices are obtained by linearizing about a point where the quadrotor is horizontal, and the spring is at its equilibrium state. I can provide the code that was used to obtain these A and B matrices if it would be helpful.

Hongkai, I believe the conditions you listed are sufficient conditions, but not necessary conditions. This lecture provides necessary and sufficient conditions. In particular, we can verify the system is stabilizable by condition (3) of slide 3:

import numpy as np
A = np.array([
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [-12, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, -9, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
])
B = np.array([
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [7, -7, 7, -7],
    [-59, 0, 59, 0],
    [0, 99, 0, -99]
])
evals, evecs = np.linalg.eig(A)
for l in evals:
    if np.real(l) >= 0:
        mat = np.hstack((l*np.eye(A.shape[0])-A, B))
        print(np.linalg.matrix_rank(mat))

Because each eigenvalue λ with positive real part admits a matrix [λI-A B] with full row rank, the system is stabilizable.

Regarding the modification of Q affecting the behavior of the CARE solver, I would expect that removing a cost from the LQR cost-to-go wouldn't take the problem from feasible to infeasible. If I set Q to a matrix of all zeros, then a controller with all zero signal should be optimal, but I get an error. I confirmed that matlab exhibits the exact same behavior -- 3 (or fewer) nonzero entries on the diagonal errors, and 4 works fine.

Regarding the segmentation fault, I would be in favor of a check that determines whether or not the A, B, Q, and R matrices are reasonable, and potentially an error if the CARE solver fails. But I think that this is actually a bug somewhere inside the CARE solver, especially given that there's a segmentation fault.

hongkai-dai commented 1 year ago

Sorry I made a mistake in my previous comment:

Your (A, B) is stabilizable, but (Q, A) is not observable, hence we cannot synthesize an LQR controller. Check page 23 of the slides. The observability matrix has rank 10, not 12 in this case

Gram = np.concatenate([Q @ np.linalg.matrix_power(A, i) for i in range(12)], axis=0)
print(np.linalg.matrix_rank(Gram)

If you change count=4, then (Q, A) is observable.

cohnt commented 1 year ago

Observability of (Q, A) is a sufficient, but not a necessary condition. As best I can tell, the necessary condition which this system fails is detectability of (Q, A) -- equivalently, stability of (A',Q'):

AT, QT = A.T, Q.T
evals, evecs = np.linalg.eig(AT)
for l in evals:
    if np.real(l) >= 0:
        mat = np.hstack((l*np.eye(AT.shape[0])-AT, QT))
        print(np.linalg.matrix_rank(mat))

We can see from this code that (A',Q') is not stabilizable when count=3, but is stabilizable when count 4. Assuming I'm understanding the relevant control theory correctly, this serves as a proof that there is no LQR solution, and we should expect some sort of error?

The only other question I have is whether these conditions are sufficient for the LQR solver to succeed. It wouldn't be helpful to focus on these conditions if the solver requires something stronger. I don't know of any examples of systems that are detectable but not observable.

Besides handling the segmentation fault (and providing a warning), it would be helpful to update the Drake documentation of LQR to specify these necessary and sufficient conditions. Right now, it just says that Q>=0 and R>0, which is what lead me to encounter this bug.

cohnt commented 1 year ago

@hongkai-dai I've inadvertently come up with another linear system that causes the CARE equation to segfault. It's a rather larg system, but I've included it in the following gist: https://gist.github.com/cohnt/8cdd84ec804bc9f3fb4d4fb1eed72e9e

As you can see from the code, the system is not stabilizable, which is probably why the solver fails. But I think the new test should be detecting it? I build it on commit hash 8ddd6672b1, which is after the merge of #19276

hongkai-dai commented 1 year ago

The fix we had in #19276 detects a sufficient condition for stabilizable and detectable. It isn't an necessary condition. I will work on a separate PR to add the sufficient and necessary condition.

cohnt commented 1 year ago

The fix we had in #19276 detects a sufficient condition for stabilizable and detectable. It isn't an necessary condition. I will work on a separate PR to add the sufficient and necessary condition.

Awesome! And I'll try and get this example down to something a little simpler, so it makes more sense as a test case.

UPDATE: I got it down to 25x25. You can find it here.

RobotLocomotion / drake