RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.27k stars 1.26k forks source link

ignition: Should remove all static storage duration (unless trivially destructible) #15601

Closed EricCousineau-TRI closed 2 years ago

EricCousineau-TRI commented 3 years ago

@SeanCurtis-TRI ran into a nasty segfault in pydrake that is most likely due to static storage duration for objects w/ non-trivial destructors: https://gist.github.com/SeanCurtis-TRI/8122cb2ad2b78f99f91ab35d19ff931c Slack thread: https://drakedevelopers.slack.com/archives/C3YB3EV5W/p1628696745011100

To observe the phenomenon, the following branch will manifest the problem: https://github.com/SeanCurtis-TRI/drake/tree/TEMP_showcase_angry_ignition Simply execute bazel test //bindings/pydrake:py/math_test to see the failure.

Suggested steps:

\cc @jwnimmer-tri

jwnimmer-tri commented 3 years ago

Suggested steps:

  • Patch how Drake consumes ignition to not use any non-trivially destructible (or constructible) objects with static storage duration
  • Upstream those patches (hopefully only within the ignition ecosystem)

I'd advocate for a slightly different approach. For anything that is actively hurting us (with segfaults, or etc.), then we can have a Drake-local patch file that is upstreamed later. But for all of the other global variable coding-style violations that are not hurting us, I don't see why we'd make Drake-specific patches for those? It seems like PRing those directly to upstream and waiting for a release should be sufficient.

I see that ignition & sdformat have adopted the Google Style Guide, which means that statics like this are forbidden by rule. So the other thing to work on is to help remind the code reviewers for those projects of that rule, so that future PRs don't introduce regressions on this front.

EricCousineau-TRI commented 3 years ago

Sounds great! Updated to that effect.

azeey commented 3 years ago

When I ran the command, I actually get a stack trace that shows the issue is in libyaml-cpp, but I went ahead removed all non-trivially destructible objects from ignition-math anyway. However, that didn't solve the problem. I found that libyaml-cpp also uses a global std::string array, which can easily be converted to an array of const char *, so I did that, but that didn't work either. Digging a little deeper, I found out that before the segfault happens, python fails to import a pydrake module here: https://github.com/RobotLocomotion/drake/blob/e862011fa0959aa4305e2c0c24b590a69e1a5907/common/test_utilities/drake_py_unittest_main.py#L18

This fails because of a missing symbol drake::geometry::SceneGraph<drake::symbolic::Expression>::model_inspector() const. My guess is the segfault also occurs because of the missing symbol when loading the test module via SourceFileLoader.load_module():

https://github.com/RobotLocomotion/drake/blob/e862011fa0959aa4305e2c0c24b590a69e1a5907/common/test_utilities/drake_py_unittest_main.py#L67

I believe the issue is that the code in @SeanCurtis-TRI's branch is instantiating drake::multibody::ContactResultsToLcmSystem with the DRAKE_DEFINE_CLASS_TEMPLATE_INSTANTIATIONS_ON_DEFAULT_SCALARS macro which instantiates drake::multibody::ContactResultsToLcmSystem<drake::symbolic::Expression>, which then tries to instantiate drake::geometry::SceneGraph<drake::symbolic::Expression> and fails. This diff fixes the segfault for me:

@@ -293,8 +293,8 @@ systems::lcm::LcmPublisherSystem* ConnectContactResultsToDrakeVisualizer(
       builder, multibody_plant, nullptr, contact_results_port, lcm);
 }

+template class ContactResultsToLcmSystem<double>;
+template class ContactResultsToLcmSystem<AutoDiffXd>;
+
 }  // namespace multibody
 }  // namespace drake
-
-DRAKE_DEFINE_CLASS_TEMPLATE_INSTANTIATIONS_ON_DEFAULT_SCALARS(
-    class drake::multibody::ContactResultsToLcmSystem)
jwnimmer-tri commented 3 years ago

Good find! That sounds very plausible.

SeanCurtis-TRI commented 3 years ago

That's not just plausible, that's incredibly compelling. Thanks for the sleuthing, @azeey! It makes perfect sense. I'll have to consider the proper Drake-side resolution.

EricCousineau-TRI commented 3 years ago

Nice indeed! Given discussion, lowering priority, and updating checkboxes to focus solely on upstream. That being said, it would still be nice to remove non-trivially destructible globals, at least from ignition.

At that point, the symptoms (via stack trace) might have been slightly less confusing.

jwnimmer-tri commented 2 years ago

I would like to close this issue, in lieu of an upstream ticket in ignition.

@azeey have you already fixed all of the instances of static initialization and destruction in ignition? If yes, let's close this Drake issue. If not, could I ask you open open an ignition issue to that effect, backlink it here, and then close this Drake issue?

azeey commented 2 years ago

No, I haven't yet. I'll create an issue in ign-math. There's already one for sdformat: https://github.com/ignitionrobotics/sdformat/issues/552

jwnimmer-tri commented 2 years ago

If I'm reading it correctly, an issue already exists -- https://github.com/ignitionrobotics/ign-math/issues/269.