Open EricCousineau-TRI opened 3 years ago
See example trying to get a stack trace in #14380
Commit: b99ff10d6b
Command:
cd drake
bazel run -c dbg //:install -- ${PWD}/build
python_path=$(ls -d ~+/build/lib/python*/site-packages)
env -i PYTHONPATH=${python_path} /usr/bin/python3 -c 'import pydrake.common._module_py._testing as m; m.trigger_segfault()'
This triggers a segfault. coredumpctl gdb doesn't seem to turn up anything useful (it says there's no stack trace).
Running it under GDB also doesn't seem to yield useful symbols:
$ env -i PYTHONPATH=${python_path} gdb -ex run -ex backtrace --args /usr/bin/python3 -c 'import pydrake.common._module_py._testing as m; m.trigger_segfault()'
...
0x00007ffff21c09d7 in ?? () from ${PWD}/build/lib/python3.6/site-packages/pydrake/common/_module_py.so
#0 0x00007ffff21c09d7 in ?? () from ${PWD}/build/lib/python3.6/site-packages/pydrake/common/_module_py.so
#1 0x00007ffff21c662c in ?? () from ${PWD}/build/lib/python3.6/site-packages/pydrake/common/_module_py.so
#2 0x00007ffff21c5705 in ?? () from ${PWD}/build/lib/python3.6/site-packages/pydrake/common/_module_py.so
#3 0x00007ffff21c4683 in ?? () from ${PWD}/build/lib/python3.6/site-packages/pydrake/common/_module_py.so
#4 0x00007ffff21c46e0 in ?? () from ${PWD}/build/lib/python3.6/site-packages/pydrake/common/_module_py.so
#5 0x00007ffff21d4b0e in ?? () from ${PWD}/build/lib/python3.6/site-packages/pydrake/common/_module_py.so
#6 0x000000000050a4a5 in _PyCFunction_FastCallDict (kwargs=<optimized out>, nargs=<optimized out>, args=<optimized out>,
func_obj=<built-in method trigger_segfault of PyCapsule object at remote 0x7ffff242d4e0>) at ../Objects/methodobject.c:231
#7 _PyCFunction_FastCallKeywords (kwnames=<optimized out>, nargs=<optimized out>, stack=<optimized out>, func=<optimized out>) at ../Objects/methodobject.c:294
#8 call_function.lto_priv () at ../Python/ceval.c:4851
...
Using python3-dbg doesn't seem to turn up anything more fruitful in terms of Drake. (Python, however, shows it all.)
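A quick way to tell whether a backtrace is affected is to count the frames that gdb prints as ?? (no symbol found). Below is a minimal sketch, assuming the frame layout shown in the output above; the name `unresolved_frames` and the regular expression are illustrative and not part of Drake or gdb.

```python
import re
from typing import List

# Matches lines such as "#0 0x00007ffff21c09d7 in ?? () from ..." or
# "#8 call_function.lto_priv () at ...". The address part is optional.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+?) \(")


def unresolved_frames(backtrace: str) -> List[int]:
    """Return the frame numbers whose function name gdb printed as '??'."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(2) == "??":
            frames.append(int(match.group(1)))
    return frames
```

If every frame that lives in _module_py.so shows up in the result, the library was most likely installed without debug symbols.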
Naively trying out CMake per our docs:
cd drake
mkdir build_cmake && cd build_cmake
cmake .. -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=${PWD}/install
EDIT: Success! That seems to do it:
$ python_path=$(ls -d ~+/build_cmake/install/lib/python*/site-packages)
$ env -i PYTHONPATH=${python_path} gdb -ex run -ex backtrace --args /usr/bin/python3 -c 'import pydrake.common._module_py._testing as m; m.trigger_segfault()'
...
0x00007ffff21c09d7 in drake::pydrake::(anonymous namespace)::testing::<lambda()>::operator()(void) const (__closure=0x10775b8) at bindings/pydrake/common/module_py.cc:91
91 *value = 0xbadf00d;
#0 0x00007ffff21c09d7 in drake::pydrake::(anonymous namespace)::testing::<lambda()>::operator()(void) const (__closure=0x10775b8) at bindings/pydrake/common/module_py.cc:91
#1 0x00007ffff21c662c in pybind11::detail::argument_loader<>::call_impl<void, drake::pydrake::(anonymous namespace)::testing::def_testing(pybind11::module)::<lambda()>&, pybind11::detail::void_type>(drake::pydrake::(anonymous namespace)::testing::<lambda()> &, std::index_sequence, pybind11::detail::void_type &&) (this=0x7fffffffe416, f=...)
at external/pybind11/include/pybind11/cast.h:2302
#2 0x00007ffff21c5705 in pybind11::detail::argument_loader<>::call<void, pybind11::detail::void_type, drake::pydrake::(anonymous namespace)::testing::def_testing(pybind11::module)::<lambda()>&>(drake::pydrake::(anonymous namespace)::testing::<lambda()> &) (this=0x7fffffffe416, f=...) at external/pybind11/include/pybind11/cast.h:2279
#3 0x00007ffff21c4683 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::operator()(pybind11::detail::function_call &) const (__closure=0x0, call=...)
at external/pybind11/include/pybind11/pybind11.h:180
#4 0x00007ffff21c46e0 in pybind11::cpp_function::<lambda(pybind11::detail::function_call&)>::_FUN(pybind11::detail::function_call &) ()
at external/pybind11/include/pybind11/pybind11.h:158
#5 0x00007ffff21d4b0e in pybind11::cpp_function::dispatcher (self=<PyCapsule at remote 0x7ffff242d4e0>, args_in=(), kwargs_in=0x0) at external/pybind11/include/pybind11/pybind11.h:654
#6 0x000000000050a4a5 in _PyCFunction_FastCallDict (kwargs=<optimized out>, nargs=<optimized out>, args=<optimized out>,
func_obj=<built-in method trigger_segfault of PyCapsule object at remote 0x7ffff242d4e0>) at ../Objects/methodobject.c:231
#7 _PyCFunction_FastCallKeywords (kwnames=<optimized out>, nargs=<optimized out>, stack=<optimized out>, func=<optimized out>) at ../Objects/methodobject.c:294
#8 call_function.lto_priv () at ../Python/ceval.c:4851
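For repeated runs, the gdb invocation used above can be assembled in a small helper. This is only a sketch: `gdb_python_command` is a hypothetical name, and the site-packages path in the usage lines is a placeholder to replace with the real install location.

```python
import shlex


def gdb_python_command(python_path, snippet, python="/usr/bin/python3"):
    """Build the argv that runs a Python one-liner under gdb in a clean environment."""
    return [
        "env", "-i", f"PYTHONPATH={python_path}",
        "gdb", "-ex", "run", "-ex", "backtrace",
        "--args", python, "-c", snippet,
    ]


# Placeholder path; substitute the site-packages directory of your install.
cmd = gdb_python_command(
    "/path/to/site-packages",
    "import pydrake.common._module_py._testing as m; m.trigger_segfault()",
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Returning a list rather than a string keeps the Python snippet as a single argument, so it can also be passed directly to `subprocess.run`.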
From my comment in Slack:
The killer combo with debugging is:
- Use @traced and reexecute_if_unbuffered
- Use gdb backtrace
- Also tell Drake to set its spdlog level: https://drake.mit.edu/pydrake/pydrake.common.html#pydrake.common.set_log_level
from pydrake.common import set_log_level
set_log_level("trace")
Then it should be obvious how things play out.
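The idea behind the tracing step can be sketched with the standard library alone. This is not Drake's @traced implementation; `traced_sketch` is a hypothetical stand-in that prints every Python line before it executes and flushes immediately, so the last printed line survives a crash in native code.

```python
import functools
import sys


def traced_sketch(func, out=None):
    """Print each Python line of `func` (and its Python callees) before it runs."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stream = out if out is not None else sys.stderr

        def tracer(frame, event, arg):
            if event == "line":
                code = frame.f_code
                # Flush on every line so the output is not lost on a segfault.
                print(f"{code.co_filename}:{frame.f_lineno} in {code.co_name}",
                      file=stream, flush=True)
            return tracer

        previous = sys.gettrace()
        sys.settrace(tracer)
        try:
            return func(*args, **kwargs)
        finally:
            # Restore whatever trace function was active before the call.
            sys.settrace(previous)

    return wrapper
```

Decorating the Python function that calls into the binding narrows the crash down to one Python line, which is then the place to inspect with gdb backtrace.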
I (or someone else ;) should PR that to Drake docs.
Efff... I was the one who sprinkled that sauce in :facepalm: https://github.com/RobotLocomotion/drake/commit/ea1b198a7cf52e737dde20b43ae95bf7d9ed4844 (#10103)
TL;DR:
Use bazel run //:install -- --no_strip for devs.
@EricCousineau-TRI Is there any way to intercept the Bazel strip option instead? That seems a lot more intuitive.
Yup, good idea! Perhaps there's a way to extract that from cc toolchain information. Unfortunately, though, it's unlikely I will have time to focus on it anytime soon.
Just to check, is this something that impacted your workflow?
I lost a day trying to build libdrake with debug symbols before seeing this --no_strip option on the installer 😥.
I can file an improvement ticket?
Ah, sorry to hear that, and yup, that'd be best!
From my comment in Slack:
The killer combo with debugging is:
- Use @traced and reexecute_if_unbuffered
- Use gdb backtrace
- Also tell Drake to set its spdlog level: https://drake.mit.edu/pydrake/pydrake.common.html#pydrake.common.set_log_level
from pydrake.common import set_log_level
set_log_level("trace")
Then it should be obvious how things play out.
This doesn't work anymore. I get "No stack" in gdb.
I did the following:
bazel run -j 10 --config=mosek --config=gurobi //:install -- ~/drake-build --no_strip
gdb -ex run -ex backtrace --args python3 gcs_test6_fast4.py <my script args>
Is it because set_log_level is no longer present in current Drake? What is the equivalent of set_log_level("trace") now?
@sloretz is trying to debug some stuff here: https://github.com/sloretz/drake_ros2_demos/pull/3
I thought #14356 would've fixed it, but possibly not.
He's trying to debug, but can't easily find the point in Drake C++ / pybind11 code where things go awry.
For now, he's trying out the @traced macro from here: https://drake.mit.edu/python_bindings.html#debugging-with-the-python-bindings
I am assuming that our cmake -DCMAKE_BUILD_TYPE=Debug command isn't doing anything more special than -c dbg for Bazel; but we shall see...