RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.34k stars 1.26k forks source link

Segmentation Fault in MakeFiniteHorizonLinearQuadraticRegulator #19419

Closed cohnt closed 1 year ago

cohnt commented 1 year ago

What happened?

When constructing a finite horizon LQR controller, it's possible for the program to segfault. Unfortunately, the system is very complex, so it's hard to distill it into a concise MWE. I'm very sorry about this!

I've made the code publicly available in this repository, and the segmentation fault can be replicated by checking out the branch track-down-segfault-here, and running python3 naive_trajopt.py from the src directory. Note that this requires a Drake version that supports SNOPT, and also depends on the underactuated pip package.

I think the relevant piece of code is the following, from my file src/stabliziation.py, beginning on line 128:

def finite_horizon_lqr_stabilize_to_trajectory(system_diagram, nominal_state_trajectory, nominal_control_trajectory, Q, R):
    context = system_diagram.CreateDefaultContext()
    sg = system_diagram.GetSubsystemByName("scene_graph")
    DisableCollisionChecking(sg, context)

    t0 = 0
    tf = nominal_state_trajectory.end_time()

    options = FiniteHorizonLinearQuadraticRegulatorOptions()
    options.Qf = Q.copy()
    options.x0 = options.xd = nominal_state_trajectory
    options.u0 = options.ud = nominal_control_trajectory

    return MakeFiniteHorizonLinearQuadraticRegulator(system_diagram, context, t0, tf, Q, R, options)

Note that changing the last couple of lines to

    options.x0 = nominal_state_trajectory
    options.u0 = nominal_control_trajectory

fixes the segmentation fault. So presumably, that's connected to the error? But this shouldn't be a problem, as the default behavior of FiniteHorizonLinearQuadraticRegulatorOptions is to set xd and ud to x0 and u0, respectively, if they haven't been specified.

When I run the file with gdb, I get the following output:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffc2af4754 in drake::trajectories::PiecewiseTrajectory<double>::get_segment_index(double const&) const () from /home/tommy/opt/rlg/drake-build/install/lib/python3.10/site-packages/pydrake/common/../../../../libdrake.so

Backtracing gives the following output:

#0  0x00007fffc2af4754 in drake::trajectories::PiecewiseTrajectory<double>::get_segment_index(double const&) const ()
   from /home/tommy/opt/rlg/drake-build/install/lib/python3.10/site-packages/pydrake/common/../../../../libdrake.so
#1  0x00007fffc2b118bc in drake::trajectories::PiecewisePolynomial<double>::DoEvalDerivative(double const&, int) const ()
   from /home/tommy/opt/rlg/drake-build/install/lib/python3.10/site-packages/pydrake/common/../../../../libdrake.so
#2  0x00007fffc2fd2e76 in drake::systems::controllers::FiniteHorizonLinearQuadraticRegulator(drake::systems::System<double> const&, drake::systems::Context<double> const&, double, double, Eigen::Ref<Eigen::Matrix<double, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const&, Eigen::Ref<Eigen::Matrix<double, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const&, drake::systems::controllers::FiniteHorizonLinearQuadraticRegulatorOptions const&) ()
   from /home/tommy/opt/rlg/drake-build/install/lib/python3.10/site-packages/pydrake/common/../../../../libdrake.so
#3  0x00007fffc2fd5192 in drake::systems::controllers::MakeFiniteHorizonLinearQuadraticRegulator(drake::systems::System<double> const&, drake::systems::Context<double> const&, double, double, Eigen::Ref<Eigen::Matrix<double, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const&, Eigen::Ref<Eigen::Matrix<double, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const&, drake::systems::controllers::FiniteHorizonLinearQuadraticRegulatorOptions const&) ()
   from /home/tommy/opt/rlg/drake-build/install/lib/python3.10/site-packages/pydrake/common/../../../../libdrake.so
#4  0x00007fff0a28cfa2 in ?? ()
   from /home/tommy/opt/rlg/drake-build/install/lib/python3.10/site-packages/pydrake/systems/controllers.cpython-310-x86_64-linux-gnu.so
#5  0x00007fff0a269992 in ?? ()
   from /home/tommy/opt/rlg/drake-build/install/lib/python3.10/site-packages/pydrake/systems/controllers.cpython-310-x86_64-linux-gnu.so
#6  0x00005555556b099e in cfunction_call (
    func=<built-in method MakeFiniteHorizonLinearQuadraticRegulator of PyCapsule object at remote 0x7fff0fb19020>, args=<optimized out>, 
    kwargs=<optimized out>) at ../Objects/methodobject.c:543
#7  0x00005555556a74ab in _PyObject_MakeTpCall (tstate=0x555555b5cb70, 
    callable=<built-in method MakeFiniteHorizonLinearQuadraticRegulator of PyCapsule object at remote 0x7fff0fb19020>, 
    args=<optimized out>, nargs=<optimized out>, keywords=0x0) at ../Objects/call.c:215
#8  0x000055555569f815 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x55557db2c7e8, callable=<optimized out>, 
    tstate=<optimized out>) at ../Include/cpython/abstract.h:112
#9  _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x55557db2c7e8, callable=<optimized out>, 
    tstate=<optimized out>) at ../Include/cpython/abstract.h:99
#10 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x55557db2c7e8, callable=<optimized out>)
    at ../Include/cpython/abstract.h:123
#11 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffffffd150, tstate=<optimized out>)
    at ../Python/ceval.c:5891
#12 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:4213
#13 0x00005555556b11ec in _PyEval_EvalFrame (throwflag=0, 
    f=Frame 0x55557db2c630, for file /home/tommy/Documents/homework/6.8210/project/repo/src/stabilization.py, line 161, in finite_horizon_lqr_stabilize_to_trajectory (system_diagram=<pydrake.systems.framework.Diagram at remote 0x7fff07de4c30>, nominal_state_trajectory=<pydrake.trajectories.PiecewisePolynomial at remote 0x7fff06e4aff0>, nominal_control_trajectory=<pydrake.trajectories.PiecewisePolynomial at remote 0x7fff06ef3e70>, Q=<numpy.ndarray at remote 0x7fff06d67cf0>, R=<numpy.ndarray at remote 0x7fff06d67690>, context=<pydrake.systems.framework.Context at remote 0x7fff077ffa30>, sg=<pydrake.geometry.SceneGraph at remote 0x7fff06d0cef0>, t0=0, tf=<float at remote 0x7fff07d90d50>, options=<pydrake.systems.controllers.FiniteHorizonLinearQuadraticRegulatorOptions at remote 0x7fff06ea7cb0>), tstate=0x555555b5cb70)
    at ../Include/internal/pycore_ceval.h:46
#14 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x7fff07e547a0, 
    tstate=0x555555b5cb70) at ../Python/ceval.c:5065
#15 _PyFunction_Vectorcall (func=<function at remote 0x7fff07e54790>, stack=<optimized out>, nargsf=<optimized out>, 
    kwnames=<optimized out>) at ../Objects/call.c:342
#16 0x0000555555699785 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff752a5a8, 
    callable=<function at remote 0x7fff07e54790>, tstate=0x555555b5cb70) at ../Include/cpython/abstract.h:114
#17 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffff752a5a8, callable=<function at remote 0x7fff07e54790>)
    at ../Include/cpython/abstract.h:123
#18 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, trace_info=0x7fffffffd320, tstate=<optimized out>)
    at ../Python/ceval.c:5891
#19 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:4213
#20 0x0000555555695ed6 in _PyEval_EvalFrame (throwflag=0, 
    f=Frame 0x7ffff752a440, for file /home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py, line 124, in <module> (), 
    tstate=0x555555b5cb70) at ../Include/internal/pycore_ceval.h:46
#21 _PyEval_Vector (tstate=0x555555b5cb70, con=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=<optimized out>) at ../Python/ceval.c:5065
#22 0x000055555578c366 in PyEval_EvalCode (co=<code at remote 0x7ffff7449210>, 
    globals={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py') at remote 0x7ffff73ddbd0>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7ffff75905e0>, '__file__': '/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', '__cached__': None, 'sys': <module at remote 0x7ffff757e250>, 'np': <module at remote 0x7ffff72d3420>, 'StartMeshcat': <function at remote 0x7fff1196b640>, 'Simulator': <pybind11_type(__init__=<instancemethod at remote 0x7fff0fb09570>, __doc__="A class for advancing the state of hybrid dynamic systems, represented\nby ``System<T>`` objects, forward in time. Starting with an initial\nContext for a given System, Simulator advances time and produces a\nseries of Context values that forms a trajectory satisfying the\nsystem's dynamic equations to a specified accuracy. Only the Context\nis modified by a Simulator; the Syste...(truncated), locals=<optimized out>) at ../Python/ceval.c:1134
#23 0x00005555557b9108 in run_eval_code_obj (tstate=0x555555b5cb70, co=0x7ffff7449210, 
    globals={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py') at remote 0x7ffff73ddbd0>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7ffff75905e0>, '__file__': '/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', '__cached__': None, 'sys': <module at remote 0x7ffff757e250>, 'np': <module at remote 0x7ffff72d3420>, 'StartMeshcat': <function at remote 0x7fff1196b640>, 'Simulator': <pybind11_type(__init__=<instancemethod at remote 0x7fff0fb09570>, __doc__="A class for advancing the state of hybri--Type <RET> for more, q to quit, c to continue without paging--
d dynamic systems, represented\nby ``System<T>`` objects, forward in time. Starting with an initial\nContext for a given System, Simulator advances time and produces a\nseries of Context values that forms a trajectory satisfying the\nsystem's dynamic equations to a specified accuracy. Only the Context\nis modified by a Simulator; the Syste...(truncated), 
    locals={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py') at remote 0x7ffff73ddbd0>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7ffff75905e0>, '__file__': '/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', '__cached__': None, 'sys': <module at remote 0x7ffff757e250>, 'np': <module at remote 0x7ffff72d3420>, 'StartMeshcat': <function at remote 0x7fff1196b640>, 'Simulator': <pybind11_type(__init__=<instancemethod at remote 0x7fff0fb09570>, __doc__="A class for advancing the state of hybrid dynamic systems, represented\nby ``System<T>`` objects, forward in time. Starting with an initial\nContext for a given System, Simulator advances time and produces a\nseries of Context values that forms a trajectory satisfying the\nsystem's dynamic equations to a specified accuracy. Only the Context\nis modified by a Simulator; the Syste...(truncated)) at ../Python/pythonrun.c:1291
#24 0x00005555557b1f5b in run_mod (mod=<optimized out>, filename=<optimized out>, 
    globals={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py') at remote 0x7ffff73ddbd0>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7ffff75905e0>, '__file__': '/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', '__cached__': None, 'sys': <module at remote 0x7ffff757e250>, 'np': <module at remote 0x7ffff72d3420>, 'StartMeshcat': <function at remote 0x7fff1196b640>, 'Simulator': <pybind11_type(__init__=<instancemethod at remote 0x7fff0fb09570>, __doc__="A class for advancing the state of hybrid dynamic systems, represented\nby ``System<T>`` objects, forward in time. Starting with an initial\nContext for a given System, Simulator advances time and produces a\nseries of Context values that forms a trajectory satisfying the\nsystem's dynamic equations to a specified accuracy. Only the Context\nis modified by a Simulator; the Syste...(truncated), 
    locals={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py') at remote 0x7ffff73ddbd0>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7ffff75905e0>, '__file__': '/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', '__cached__': None, 'sys': <module at remote 0x7ffff757e250>, 'np': <module at remote 0x7ffff72d3420>, 'StartMeshcat': <function at remote 0x7fff1196b640>, 'Simulator': <pybind11_type(__init__=<instancemethod at remote 0x7fff0fb09570>, __doc__="A class for advancing the state of hybrid dynamic systems, represented\nby ``System<T>`` objects, forward in time. Starting with an initial\nContext for a given System, Simulator advances time and produces a\nseries of Context values that forms a trajectory satisfying the\nsystem's dynamic equations to a specified accuracy. Only the Context\nis modified by a Simulator; the Syste...(truncated), flags=<optimized out>, arena=<optimized out>)
    at ../Python/pythonrun.c:1312
#25 0x00005555557b8e55 in pyrun_file (fp=fp@entry=0x555555b5ef40, 
    filename=filename@entry='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', start=start@entry=257, 
    globals=globals@entry={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py') at remote 0x7ffff73ddbd0>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7ffff75905e0>, '__file__': '/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', '__cached__': None, 'sys': <module at remote 0x7ffff757e250>, 'np': <module at remote 0x7ffff72d3420>, 'StartMeshcat': <function at remote 0x7fff1196b640>, 'Simulator': <pybind11_type(__init__=<instancemethod at remote 0x7fff0fb09570>, __doc__="A class for advancing the state of hybrid dynamic systems, represented\nby ``System<T>`` objects, forward in time. Starting with an initial\nContext for a given System, Simulator advances time and produces a\nseries of Context values that forms a trajectory satisfying the\nsystem's dynamic equations to a specified accuracy. Only the Context\nis modified by a Simulator; the Syste...(truncated), 
    locals=locals@entry={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py') at remote 0x7ffff73ddbd0>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7ffff75905e0>, '__file__': '/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', '__cached__': None, 'sys': <module at remote 0x7ffff757e250>, 'np': <module at remote 0x7ffff72d3420>, 'StartMeshcat': <function at remote 0x7fff1196b640>, 'Simulator': <pybind11_type(__init__=<instancemethod at remote 0x7fff0fb09570>, __doc__="A class for advancing the state of hybrid dynamic systems, represented\nby ``System<T>`` objects, forward in time. Starting with an initial\nContext for a given System, Simulator advances time and produces a\nseries of Context values that forms a trajectory satisfying the\nsystem's dynamic equations to a specified accuracy. Only the Context\nis modified by a Simulator; the Syste...(truncated), closeit=closeit@entry=1, flags=0x7fffffffd5c8)
    at ../Python/pythonrun.c:1208
#26 0x00005555557b8338 in _PyRun_SimpleFileObject (fp=0x555555b5ef40, 
    filename='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', closeit=1, flags=0x7fffffffd5c8)
    at ../Python/pythonrun.c:456
#27 0x00005555557b8033 in _PyRun_AnyFileObject (fp=0x555555b5ef40, 
    filename='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', closeit=1, flags=0x7fffffffd5c8)
    at ../Python/pythonrun.c:90
#28 0x00005555557a92de in pymain_run_file_obj (skip_source_first_line=0, 
    filename='/home/tommy/Documents/homework/6.8210/project/repo/src/naive_trajopt.py', 
    program_name='/home/tommy/Documents/homework/6.8210/env_underactuated/bin/python') at ../Modules/main.c:353
#29 pymain_run_file (config=0x555555b409b0) at ../Modules/main.c:372
#30 pymain_run_python (exitcode=0x7fffffffd5c4) at ../Modules/main.c:587
#31 Py_RunMain () at ../Modules/main.c:666
#32 0x000055555577f32d in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at ../Modules/main.c:720
#33 0x00007ffff7c29d90 in __libc_start_call_main (main=main@entry=0x55555577f2f0 <main>, argc=argc@entry=2, 
    argv=argv@entry=0x7fffffffd7d8) at ../sysdeps/nptl/libc_start_call_main.h:58
#34 0x00007ffff7c29e40 in __libc_start_main_impl (main=0x55555577f2f0 <main>, argc=2, argv=0x7fffffffd7d8, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd7c8) at ../csu/libc-start.c:392
#35 0x000055555577f225 in _start ()

Version

No response

What operating system are you using?

No response

What installation option are you using?

No response

Relevant log output

No response

jwnimmer-tri commented 1 year ago

Thanks for this, I'll try to see what I can find.

What OS were you running on when you experienced the fault? It looks like Python 3.10, so I'm guessing Ubuntu 22?

cohnt commented 1 year ago

Yes, Python 3.10 with Ubuntu 22.04 (technically Tuxedo OS, but they're basically the same thing). Thanks for taking a look, and let me know if there's any other information you need!

hongkai-dai commented 1 year ago

@jwnimmer-tri have you made progress on this issue?

jwnimmer-tri commented 1 year ago

No, I haven't had time to circle back yet.

cohnt commented 1 year ago

I was able to distill this into a simpler example, by modifying some of Russ' code in underactuated. You can find my modifications here -- all I've done is change

    options = FiniteHorizonLinearQuadraticRegulatorOptions()
    options.x0 = x_trajectory
    options.u0 = u_trajectory

to

    options = FiniteHorizonLinearQuadraticRegulatorOptions()
    options.x0 = options.xd = x_trajectory
    options.u0 = options.ud = u_trajectory

So the issue is definitely occurring when manually setting the desired and nominal trajectories, but I don't have an intuition as to why.

hongkai-dai commented 1 year ago

I will look into this later.

hongkai-dai commented 1 year ago

@cohnt does the segfault occur for pendulum or the cart-pole? I tried to set options.xd and options.ud for some simple example in C++ and the code runs fine. I think now I need to exactly re-write the pendulum/cart-pole example in C++ to reproduce this seg fault.

cohnt commented 1 year ago

It occurs for the acrobot, in the "Trajectory stabilization with (finite-horizon) LQR" block.