Closed EricCousineau-TRI closed 2 years ago
Suggested steps:
- Patch how Drake consumes ignition to not use any non-trivially destructible (or constructible) objects with static storage duration
- Upstream those patches (hopefully only within the ignition ecosystem)
I'd advocate for a slightly different approach. For anything that is actively hurting us (with segfaults, or etc.), then we can have a Drake-local patch file that is upstreamed later. But for all of the other global variable coding-style violations that are not hurting us, I don't see why we'd make Drake-specific patches for those? It seems like PRing those directly to upstream and waiting for a release should be sufficient.
I see that ignition & sdformat have adopted the Google Style Guide, which means that statics like this are forbidden by rule. So the other thing to work on is to help remind the code reviewers for those projects of that rule, so that future PRs don't introduce regressions on this front.
Sounds great! Updated to that effect.
When I ran the command, I actually get a stack trace that shows the issue is in libyaml-cpp
, but I went ahead removed all non-trivially destructible objects from ignition-math anyway. However, that didn't solve the problem. I found that libyaml-cpp
also uses a global std::string
array, which can easily be converted to an array of const char *
, so I did that, but that didn't work either. Digging a little deeper, I found out that before the segfault happens, python fails to import a pydrake module here:
https://github.com/RobotLocomotion/drake/blob/e862011fa0959aa4305e2c0c24b590a69e1a5907/common/test_utilities/drake_py_unittest_main.py#L18
This fails because of a missing symbol drake::geometry::SceneGraph<drake::symbolic::Expression>::model_inspector() const
. My guess is the segfault also occurs because of the missing symbol when loading the test module via SourceFileLoader.load_module()
:
I believe the issue is that the code in @SeanCurtis-TRI's branch is instantiating drake::multibody::ContactResultsToLcmSystem
with the DRAKE_DEFINE_CLASS_TEMPLATE_INSTANTIATIONS_ON_DEFAULT_SCALARS
macro which instantiates drake::multibody::ContactResultsToLcmSystem<drake::symbolic::Expression>
, which then tries to instantiate drake::geometry::SceneGraph<drake::symbolic::Expression>
and fails. This diff fixes the segfault for me:
@@ -293,8 +293,8 @@ systems::lcm::LcmPublisherSystem* ConnectContactResultsToDrakeVisualizer(
builder, multibody_plant, nullptr, contact_results_port, lcm);
}
+template class ContactResultsToLcmSystem<double>;
+template class ContactResultsToLcmSystem<AutoDiffXd>;
+
} // namespace multibody
} // namespace drake
-
-DRAKE_DEFINE_CLASS_TEMPLATE_INSTANTIATIONS_ON_DEFAULT_SCALARS(
- class drake::multibody::ContactResultsToLcmSystem)
Good find! That sounds very plausible.
That's not just plausible, that's incredibly compelling. Thanks for the sleuthing, @azeey! It makes perfect sense. I'll have to consider the proper Drake-side resolution.
Nice indeed! Given discussion, lowering priority, and updating checkboxes to focus solely on upstream.
That being said, it would still be nice to remove non-trivially destructible globals, at least from ignition
.
At that point, the symptoms (via stack trace) might have been slightly less confusing.
I would like to close this issue, in lieu of an upstream ticket in ignition
.
@azeey have you already fixed all of the instances of static initialization and destruction in ignition
? If yes, let's close this Drake issue. If not, could I ask you open open an ignition
issue to that effect, backlink it here, and then close this Drake issue?
No, I haven't yet. I'll create an issue in ign-math. There's already one for sdformat: https://github.com/ignitionrobotics/sdformat/issues/552
If I'm reading it correctly, an issue already exists -- https://github.com/ignitionrobotics/ign-math/issues/269.
@SeanCurtis-TRI ran into a nasty segfault in pydrake that is most likely due to static storage duration for objects w/ non-trivial destructors: https://gist.github.com/SeanCurtis-TRI/8122cb2ad2b78f99f91ab35d19ff931c Slack thread: https://drakedevelopers.slack.com/archives/C3YB3EV5W/p1628696745011100
To observe the phenomenon, the following branch will manifest the problem: https://github.com/SeanCurtis-TRI/drake/tree/TEMP_showcase_angry_ignition Simply execute
bazel test //bindings/pydrake:py/math_test
to see the failure.Suggested steps:
ignition
ecosystem)\cc @jwnimmer-tri