Closed maggi3wang closed 5 years ago
Wow, that looks bad!
With my daily on-call hat on, I've assigned a notional team and owner. Please feel free to reassign if there are better parties.
I suspect the first action should be to bisect down to the commit where this first went wrong.
\CC @SeanCurtis-TRI FYI in case you think this might be a problem wider than just the Manipulation Station example.
I'll probably take a peek myself tomorrow. Thanks for flagging this.
@maggi3wang The two images show slightly different configurations of geometry. Are they supposed to be the same? (I.e., they both seem to be wrong, but should be wrong in the same way or in different ways?)
@curds01 the one in meshcat was moved around before screenshotted. Here's what it looks like without any maneuvering:
(so I believe the configurations are the same)
I've confirmed that I see the same behavior on my mac; and that it looks fine on ubuntu.
Fine on Ubuntu and bad on mac puts me out of the running. Good luck, @RussTedrake (And sorry for the @curds01 post -- I'd responded from home last night and forgotten who I was logged in as.)
@gizatt and I found the issue.
In drake/geometry/scene_graph.cc
, MakePoseBundle()
iterates through an unordered map, g_state.frames_
, to create the dynamic_frames
vector. CalcPoseBundle()
iterates through the same unordered map, but assumes that the g_state.frames_
is in the same order as in MakePoseBundle()
.
If you replace (line 417):
for (FrameId f_id : dynamic_frames) {
// TODO(#11888): Remove GetAsIsometry3() when PoseBundle supports
// RigidTransform.
output->set_pose(i, g_state.get_pose_in_world(f_id).GetAsIsometry3());
// TODO(SeanCurtis-TRI): Handle velocity.
++i;
}
with
for (FrameId f_id : dynamic_frames) {
SourceId s_id = g_state.get_source_id(f_id);
const std::string& src_name = g_state.get_source_name(s_id);
const std::string& frm_name = g_state.get_frame_name(f_id);
std::string name = src_name + "::" + frm_name;
for (int i = 0; i < static_cast<int>(dynamic_frames.size()); i++) {
std::string output_str = output->get_name(i);
if (output_str.std::string::compare(name) == 0) {
output->set_pose(i, g_state.get_pose_in_world(f_id).GetAsIsometry3());
}
}
}
which matches the string name created from MakePoseBundle()
to the output
(PoseBundle
) string name, we get a nice-looking visualization:
This is certainly not optimized (switching to an ordered C++ map
would probably be the real fix, but this is what's causing the issue. The compiler difference between Linux and Mac probably explains the difference in unordered map hashing behavior.
That can't be the cause.
While there may be an ordering difference in unordered_map
between linux and mac, there won't be a difference each time the data is iterated through. Even on mac.
What would have to cause this problem is that the frame_
member would have to have changed between call to MakePoseBundle
and calls to CalcPoseBundle
. And the only way for that to have happened is for a frame to have been added between the two calls.
Ah, good point -- I had that theory earlier but couldn't find a culprit frame addition (which I assume must be in ManipulationStation
?) on my first pass. I don't really understand the timing of those two calls -- does MakePoseBundle
get called only once (to create some kind of default clone-able container), or every time a new output is calculated?
What you want to look for is the call to ConnectDrakeVisualizer()
. See if there's an added frame after that. It might be buried in camera registration.
The other two landmarks to look for is the call to DispatchLoadMessage()
and the context allocation. It's definitely the case that frame registration after context allocation will cause things to explode.
I ran a bisect that showed that Updates geometry::Identifier hash method (#11872)
(commit 861cd550e79f9a952cb529bc8f7c94cbbcc1e40b
) is where this issue first occurred. Not sure why, though—any ideas?
@maggi3wang Thank you!
Per the above posts, the problem seems to relate to an unordered_map
use. The hash method in #11872 changes how items added to the unordered_map are assigned to buckets, and thus their iteration order. So while #11872 is probably a necessary condition to reproduce this bug, the root cause logic error would be in some other code.
Looking at the code that Greg identified, I can say that SceneGraph<T>::CalcPoseBundle
is defective currently. It assumes that PoseBundle<T>* output
is in some semi-constructed state when the method begins. That is defective with in the systems framework. It is the responsibility of CalcPoseBundle
ensure *output
has the correct value upon return, no matter its incoming value. It is an @param[out]
only, not @param[in,out]
.
I don't know if that defect is the cause of this issue, but maybe it's somewhere to begin.
I agree with @jwnimmer-tri's characterization of the defect.
I've created a PR to address the issue (#11978) . @maggi3wang please confirm that this alleviates woes on the mac.
As a side note: I've got a branch where I'm changing the visualization mechanism to not rely on PoseBundle
at all. So, when that gets up and running, this whole mechanism will be deprecated.
@SeanCurtis-TRI #11979 worked on the Mac. Thanks!
Huzzah! Thanks for all the sleuthing, @maggi3wang.
Running
bazel run //examples/manipulation_station:end_effector_teleop_sliders --config gurobi
leads to a seemingly broken visualization.Here is the view in meshcat:
Replicated using the
drake_visualizer
tool:Off the latest master. Running on MacOS (Mojave 10.14.5).