RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.35k stars 1.27k forks source link

Visualization is Broken in Manipulation Station #11963

Closed maggi3wang closed 5 years ago

maggi3wang commented 5 years ago

Running bazel run //examples/manipulation_station:end_effector_teleop_sliders --config gurobi leads to a seemingly broken visualization.

Here is the view in meshcat:

Screen Shot 2019-08-26 at 3 34 53 PM

Replicated using the drake_visualizer tool:

Screen Shot 2019-08-26 at 3 39 37 PM

Off the latest master. Running on MacOS (Mojave 10.14.5).

jwnimmer-tri commented 5 years ago

Wow, that looks bad!

With my daily on-call hat on, I've assigned a notional team and owner. Please feel free to reassign if there are better parties.

I suspect the first action should be to bisect down to the commit where this first went wrong.

\CC @SeanCurtis-TRI FYI in case you think this might be a problem wider than just the Manipulation Station example.

curds01 commented 5 years ago

I'll probably take a peek myself tomorrow. Thanks for flagging this.

@maggi3wang The two images show slightly different configurations of geometry. Are they supposed to be the same? (I.e., they both seem to be wrong, but should be wrong in the same way or in different ways?)

maggi3wang commented 5 years ago

@curds01 the one in meshcat was moved around before screenshotted. Here's what it looks like without any maneuvering:

Screen Shot 2019-08-27 at 1 08 29 AM

(so I believe the configurations are the same)

RussTedrake commented 5 years ago

I've confirmed that I see the same behavior on my mac; and that it looks fine on ubuntu.

SeanCurtis-TRI commented 5 years ago

Fine on Ubuntu and bad on mac puts me out of the running. Good luck, @RussTedrake (And sorry for the @curds01 post -- I'd responded from home last night and forgotten who I was logged in as.)

maggi3wang commented 5 years ago

@gizatt and I found the issue.

In drake/geometry/scene_graph.cc, MakePoseBundle() iterates through an unordered map, g_state.frames_, to create the dynamic_frames vector. CalcPoseBundle() iterates through the same unordered map, but assumes that the g_state.frames_ is in the same order as in MakePoseBundle().

If you replace (line 417):

  for (FrameId f_id : dynamic_frames) {
    // TODO(#11888): Remove GetAsIsometry3() when PoseBundle supports
     // RigidTransform.
    output->set_pose(i, g_state.get_pose_in_world(f_id).GetAsIsometry3());
    // TODO(SeanCurtis-TRI): Handle velocity.
    ++i;
  }

with

  for (FrameId f_id : dynamic_frames) {
    SourceId s_id = g_state.get_source_id(f_id);
    const std::string& src_name = g_state.get_source_name(s_id);
    const std::string& frm_name = g_state.get_frame_name(f_id);
    std::string name = src_name + "::" + frm_name;

    for (int i = 0; i < static_cast<int>(dynamic_frames.size()); i++) {
      std::string output_str = output->get_name(i);
      if (output_str.std::string::compare(name) == 0) {
        output->set_pose(i, g_state.get_pose_in_world(f_id).GetAsIsometry3());
      }
    }
  }

which matches the string name created from MakePoseBundle() to the output (PoseBundle) string name, we get a nice-looking visualization:

Screen Shot 2019-08-27 at 6 26 50 PM

This is certainly not optimized (switching to an ordered C++ map would probably be the real fix, but this is what's causing the issue. The compiler difference between Linux and Mac probably explains the difference in unordered map hashing behavior.

SeanCurtis-TRI commented 5 years ago

That can't be the cause.

While there may be an ordering difference in unordered_map between linux and mac, there won't be a difference each time the data is iterated through. Even on mac.

What would have to cause this problem is that the frame_ member would have to have changed between call to MakePoseBundle and calls to CalcPoseBundle. And the only way for that to have happened is for a frame to have been added between the two calls.

gizatt commented 5 years ago

Ah, good point -- I had that theory earlier but couldn't find a culprit frame addition (which I assume must be in ManipulationStation?) on my first pass. I don't really understand the timing of those two calls -- does MakePoseBundle get called only once (to create some kind of default clone-able container), or every time a new output is calculated?

SeanCurtis-TRI commented 5 years ago

What you want to look for is the call to ConnectDrakeVisualizer(). See if there's an added frame after that. It might be buried in camera registration.

SeanCurtis-TRI commented 5 years ago

The other two landmarks to look for is the call to DispatchLoadMessage() and the context allocation. It's definitely the case that frame registration after context allocation will cause things to explode.

maggi3wang commented 5 years ago

I ran a bisect that showed that Updates geometry::Identifier hash method (#11872) (commit 861cd550e79f9a952cb529bc8f7c94cbbcc1e40b) is where this issue first occurred. Not sure why, though—any ideas?

jwnimmer-tri commented 5 years ago

@maggi3wang Thank you!

Per the above posts, the problem seems to relate to an unordered_map use. The hash method in #11872 changes how items added to the unordered_map are assigned to buckets, and thus their iteration order. So while #11872 is probably a necessary condition to reproduce this bug, the root cause logic error would be in some other code.

Looking at the code that Greg identified, I can say that SceneGraph<T>::CalcPoseBundle is defective currently. It assumes that PoseBundle<T>* output is in some semi-constructed state when the method begins. That is defective with in the systems framework. It is the responsibility of CalcPoseBundle ensure *output has the correct value upon return, no matter its incoming value. It is an @param[out] only, not @param[in,out].

I don't know if that defect is the cause of this issue, but maybe it's somewhere to begin.

SeanCurtis-TRI commented 5 years ago

I agree with @jwnimmer-tri's characterization of the defect.

I've created a PR to address the issue (#11978) . @maggi3wang please confirm that this alleviates woes on the mac.

As a side note: I've got a branch where I'm changing the visualization mechanism to not rely on PoseBundle at all. So, when that gets up and running, this whole mechanism will be deprecated.

maggi3wang commented 5 years ago

@SeanCurtis-TRI #11979 worked on the Mac. Thanks!

SeanCurtis-TRI commented 5 years ago

Huzzah! Thanks for all the sleuthing, @maggi3wang.