Open cohnt opened 6 months ago
Here's a slightly more minimal reproduction:
import numpy as np
from pydrake.all import (
StartMeshcat,
DiagramBuilder,
AddMultibodyPlantSceneGraph,
Parser,
MeshcatVisualizer,
)
directives = """
# Model directive for iiwa with sphere collision and welded gripper
directives:
# Add iiwa
- add_model:
name: iiwa
file: package://drake/manipulation/models/iiwa_description/urdf/iiwa14_spheres_collision.urdf
- add_weld:
parent: world
child: iiwa::base
# Add schunk
- add_model:
name: wsg
file: package://drake/manipulation/models/wsg_50_description/sdf/schunk_wsg_50_welded_fingers.sdf
- add_frame:
name: iiwa::wsg_attach
X_PF:
base_frame: iiwa::iiwa_link_7
translation: [0, 0, 0.114]
rotation: !Rpy { deg: [90.0, 0.0, 0.0 ]}
- add_weld:
parent: iiwa::wsg_attach
child: wsg::body
"""
def build_env(meshcat, directives):
builder = DiagramBuilder()
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=0.0)
parser = Parser(plant)
parser.AddModelsFromString(directives, ".dmd.yaml")
plant.Finalize()
meshcat_visual = MeshcatVisualizer.AddToBuilder(
builder, scene_graph, meshcat)
meshcat_visual.Delete()
diagram = builder.Build()
return diagram, plant, scene_graph
meshcat = StartMeshcat()
diagram, plant, scene_graph = build_env(meshcat, directives)
diagram_context = diagram.CreateDefaultContext()
plant_context = plant.GetMyContextFromRoot(diagram_context)
q = plant.GetPositions(plant_context)
q[0] = np.inf
for i in range(3):
plant.SetPositions(plant_context, q)
diagram.ForcedPublish(diagram_context)
Yes, that example is more minimal than mine. (Perhaps "minimal" working example is a bit of a misnomer!)
Note also that q[0] = np.nan
also causes a segfault.
I made a quick c++ reproduction (which can run in meshcat_visualizer_test.cc
):
GTEST_TEST(MeshcatVisualizerTest, Issue21219) {
std::string directives = R"""(
directives:
- add_model:
name: iiwa
file: package://drake/manipulation/models/iiwa_description/urdf/iiwa14_spheres_collision.urdf
- add_weld:
parent: world
child: iiwa::base
- add_model:
name: wsg
file: package://drake/manipulation/models/wsg_50_description/sdf/schunk_wsg_50_welded_fingers.sdf
- add_frame:
name: iiwa::wsg_attach
X_PF:
base_frame: iiwa::iiwa_link_7
translation: [0, 0, 0.114]
rotation: !Rpy { deg: [90.0, 0.0, 0.0 ]}
- add_weld:
parent: iiwa::wsg_attach
child: wsg::body
)""";
auto meshcat = std::make_shared<Meshcat>();
systems::DiagramBuilder<double> builder;
auto [plant, scene_graph] = AddMultibodyPlantSceneGraph(&builder, 0.0);
multibody::Parser parser(&plant);
parser.AddModelsFromString(directives, ".dmd.yaml");
plant.Finalize();
MeshcatVisualizer<double>::AddToBuilder(&builder, scene_graph, meshcat);
auto diagram = builder.Build();
auto diagram_context = diagram->CreateDefaultContext();
systems::Context<double>* plant_context = &plant.GetMyMutableContextFromRoot(diagram_context.get());
Eigen::VectorXd q = plant.GetPositions(*plant_context);
q[0] = std::numeric_limits<double>::infinity();
for (int i=0; i<3; ++i) {
plant.SetPositions(plant_context, q);
diagram->ForcedPublish(*diagram_context);
}
}
It throws in RotationMatrix on this line, which seems like it's the intended behavior. The stack trace is coming through the Multibody::CalcPositionKinematicsCache()
.
Assuming that we think it's reasonable for the kinematics code to throw when a joint angle is inf or nan (not sure what it could do better?), then the main question is: why do we segfault instead of throwing the exception?
In c++, when I run in debug mode, i see:
[ RUN ] MeshcatVisualizerTest.Issue21219
[2024-03-29 17:27:29.970] [console] [info] Meshcat listening for connections at http://localhost:7001
unknown file: Failure
C++ exception with description "Error: Rotation matrix contains an element that is infinity or NaN." thrown in the test body.
but in release mode, i see
[ RUN ] MeshcatVisualizerTest.Issue21219
[2024-03-29 17:27:42.135] [console] [info] Meshcat listening for connections at http://localhost:7001
Segmentation fault (core dumped)
I guess this is probably because of some compiler optimization?
Is that something we can/should fix? Maybe @jwnimmer-tri knows?
It could be that the throw is inside a DRAKE_ASSERT_VOID
guard and is only triggered in debug and the segfault happens later on.
Good call @xuchenhan-tri . That's exactly what's happening. I'll look for the next offender.
Ok. The segfault is in fcl::DynamicAABBTreeCollisionManager
, called from SceneGraph<T>::CalcPoseUpdate
. Maybe we should actually throw in the kinematics call, even in release mode?
Definitely seems better than segfaulting, but we should probably be careful about where the check/throw happens given how low level rotation matrix is.
FYI #19263 is tangentially related. In that case, calculating using a HalfSpace ended up introducing inf
values, which then also segfaulted.
Because RotationMatrix throws (in debug builds) for non-finite values, that means any q
that can influence into a rotation must always be finite. Given that, I think it makes sense to say that all of the MultibodyPlant's q
vector must always be finite. Therefore, we should throw (even in release builds) when it's not.
Ideally the MbP kinematics would do that, but in case we think those are evaluated too often to pay for the checks, we could imagine that the SG pose update (in prep for a collision check) could do the check.
The DRAKE_ASSERT_VOID
that made this a debug-only throw is here. Do I understand your comment as saying that we should not change this line to a release-build throw; e.g. directly in the RotationMatrix class? We could potentially move SetUnchecked()
to the public API.
Do I understand your comment as saying that we should not change this line to a release-build throw; ...
Yes, that's right. IIRC the dynamics people are violently against promoting that one to a release-build check.
Right! That's a very expensive check.
+@rpoyner-tri for disposition (and resolution?) since our normal component owner is ooo.
We could imagine that the systems framework could offer an option, when a system declares its discrete state, the declaration could mark it as "must be finite" (or "must not be nan" in some cases) and then the framework handles all of the validity checking on behalf of that system. I imagine we want many more systems than just MbP to fail-fast when operating on non-finite values.
Likewise declaring an input (or output) numeric port could declare finite (or non-nan), and the framework / diagram could enforce that. For instance, if an actuation input port was nan.
What happened?
I encountered this bug, and was able to reduce it to this MWE.
Version
No response
What operating system are you using?
No response
What installation option are you using?
No response
Relevant log output
No response