RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.24k stars 1.25k forks source link

SceneGraph reports different signed distance result for the same configuration #10286

Closed hongkai-dai closed 5 years ago

hongkai-dai commented 5 years ago

I attempted to compute the signed distance between two spheres, using a diagram containing an MBP and a SceneGraph. Calling ComputeSignedDistancePairwiseClosestPoints() repeatedly gives me different result for the same configuration. To reproduce the problem, the code is in https://github.com/hongkai-dai/drake/blob/inconsistent_signed_distance/multibody/inverse_kinematics/test/collision_distance_demo.cc The output of this code is

distance -0.282678665097460
distance -0.282678662686879
distance -0.282678665097460
distance -0.282678662686879
distance -0.282678665097460
distance -0.282678662686879
distance -0.282678665097460
distance -0.282678662686879

Inside this code, MBP computes the pose of each body, store that in the context, and then pass it to SceneGraph, SceneGraph then calls FCL's broadphase algorithm, which eventually calls FCL's narrowphase EPA implementation.

I also created a test in FCL, for the two spheres with the same configuration. I then compute the signed distance between these two spheres using narrowphase EPA implementation. The code is in https://github.com/hongkai-dai/fcl/blob/98f81641753816323dadfd8f1ff865a40d5db28e/test/test_fcl_signed_distance.cpp#L195-L202. This produces the same result for the same configuration

distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879

So the problem occurs between MBP computes the link poses and FCL broadphase algorithm.

cc @sherm1 @SeanCurtis-TRI

sherm1 commented 5 years ago

Yikes -- gets the right answer 50% of the time!

SeanCurtis-TRI commented 5 years ago

Could be that FCL is always wrong, and 50% of the time SceneGraph is able to improve the answer. :) That's probable, right?

hongkai-dai commented 5 years ago

Some updates: when calling the narrowphase distance function, the order of the two geometries o1 and o2 are permuted across two consecutive calls. Here is some output

tf1:
   1    0    0 0.11
   0    1    0 0.21
   0    0    1 0.31
   0    0    0    1
tf2:
  1   0   0 0.1
  0   1   0 0.2
  0   0   1 0.3
  0   0   0   1
distance -0.282678665097460

tf1:
  1   0   0 0.1
  0   1   0 0.2
  0   0   1 0.3
  0   0   0   1
tf2:
   1    0    0 0.11
   0    1    0 0.21
   0    0    1 0.31
   0    0    0    1
distance -0.282678662686879

Note that tf1 in the first call becomes tf2 in the second call, and vice versa.

hongkai-dai commented 5 years ago

More updates: The swapping happens in SceneGraph, before we call https://github.com/RobotLocomotion/drake/blob/df86de15c0ba63a616dd347b263d8e2cc083875d/geometry/proximity_engine.cc#L323

I add print out for fcl_object_A and fcl_object_B. The result is swapped across two calls.

fcl_object_A transform:0.11 0.21 0.31
fcl_object_B transform:0.1 0.2 0.3
distance -0.282678665097460
fcl_object_A transform:0.1 0.2 0.3
fcl_object_B transform:0.11 0.21 0.31
distance -0.282678662686879

What really surprised me is that I am calling the same SceneGraph, with the same Context, but just calling ComputeSignedDistancePairwiseClosestPoints() repeatedly, as in https://github.com/hongkai-dai/drake/blob/inconsistent_signed_distance/multibody/inverse_kinematics/test/collision_distance_demo.cc I thought each geometry object has an ID, and we assign object A/ B based on their ID. Should I expect the ID to change?

SeanCurtis-TRI commented 5 years ago

That's most likely attributable to the broadphase algorithm. SceneGraph doesn't order them, per se. It simply processes the pair as they are reported (and then orders them on the output to a fixed ordering.)

What's surprising is that for a two geometries with fixed poses, that the broadphase algorithm would not be deterministic...

tri-ltyyu commented 5 years ago

Even though the impact is uncertain, the consequence could be large, hence prioritizing.

DamrongGuoy commented 5 years ago

I confirm that I cloned drake from @hongkai-dai and disabled any push from me for safety:

damrongguoy@puget-183882-03:~/GitHub/hongkai-dai/drake (master)$ git remote -v
origin  git@github.com:hongkai-dai/drake.git (fetch)
origin  no_push (push)
upstream    git@github.com:RobotLocomotion/drake.git (fetch)
upstream    no_push (push)

I'll check collision_distance_demo.cc in the branch inconsistant_signed_distance.

damrongguoy@puget-183882-03:~/GitHub/hongkai-dai/drake (master)$ git checkout inconsistent_signed_distance
Branch 'inconsistent_signed_distance' set up to track remote branch 'inconsistent_signed_distance' from 'origin'.
Switched to a new branch 'inconsistent_signed_distance'
damrongguoy@puget-183882-03:~/GitHub/hongkai-dai/drake (inconsistent_signed_distance)$ 

The install_prereqs.sh asked me to downgrade bazel form 0.21 to 0.19.2.

I'm building now.

DamrongGuoy commented 5 years ago

I confirm that I can reproduce the reported problem by running Hongkai's code.

(inconsistent_signed_distance)$ bazel run  //multibody/inverse_kinematics:collision_distance_demo
INFO: Analysed target //multibody/inverse_kinematics:collision_distance_demo (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //multibody/inverse_kinematics:collision_distance_demo up-to-date:
  bazel-bin/multibody/inverse_kinematics/collision_distance_demo
INFO: Elapsed time: 0.463s, Critical Path: 0.00s, Remote (0.00% of the time): [queue: 0.00%, setup: 0.00%, process: 0.00%]
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
distance -0.282678665097460
distance -0.282678662686879
distance -0.282678665097460
distance -0.282678662686879
distance -0.282678665097460
distance -0.282678662686879
distance -0.282678665097460
distance -0.282678662686879

I'll start CLion and debug.

DamrongGuoy commented 5 years ago

Checkpoint 1. I saw the search-tree dynamic_tree_ changes each time we reach the following line in ComputeSignedDistancePairwiseClosestPoints().

920.  dynamic_tree_.distance(&distance_data, DistanceCallback);

The change is that the two children swapped. For example, the two children in the first call are pointers ending with f30 and ed0:-

image

Then, in the second call, they switched to ed0 and f30:-

image

I'm guessing some kinds of "update" to the tree caused the switch. Perhaps the tree is re-balanced somewhere?

===============================================================================

Checkpoint 2. I also found that before and after this line in proximity_engine.cc UpdateWorldPoses(X_WG, indices), the dynamic_tree_ switched its two children:

    dynamic_tree_.update();

Just for verification, I did dynamic_tree_.update() twice. I saw the two children switched back and the print out is consistent now:

Start 8 repeated ComputeSignedDistancePairwiseClosestPoints():-
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
Start 8 repeated ComputeSignedDistancePairwiseClosestPoints():-
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
Start 8 repeated ComputeSignedDistancePairwiseClosestPoints():-
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
    distance -0.282678662686879
DamrongGuoy commented 5 years ago

Checkpoint 3. Calling dynamictree.update() twice is just a hack. I'll investigate more into fcl::DynamicAABBTreeCollisionManager<S>::update().

sherm1 commented 5 years ago

Wow! Great find @DamrongGuoy! That certainly sounds like the guilty party.

DamrongGuoy commented 5 years ago

It has been a week since I last updated. Thanks to @SeanCurtis-TRI and @hongkai-dai I was able to set up a unit test in FCL to reproduce the same problem. Every time we call fcl::DynamicAABBTreeCollisionManager<S>::update() the tree {root {A, B}} becomes {root {B,A}} even though there was no change in the poses of A and B.

I am checking the code to maintain the tree. For some reasons, it involves incrementally adjusting the tree including removing and inserting the same node again and again.

Since the code has minimal documentation, I am adding comments to explain what I think the code is doing. At the least, we will improve FCL in that sense.

If I cannot solve the problem in FCL by Wednesday, I will switch to the work around in Drake that always calls narrow-phase-distance(A,B) regardless of whether the broadphase gives us A,B or B,A as mentioned previously by @SeanCurtis-TRI. The consistent order of A,B in calling narrow-phase-distance should give the consistent result for drake::ComputeSignedDistancePairwiseClosestPoints().