Closed hongkai-dai closed 5 years ago
Yikes -- gets the right answer 50% of the time!
Could be that FCL is always wrong, and 50% of the time SceneGraph is able to improve the answer. :) That's probable, right?
Some updates: when calling the narrowphase distance
function, the order of the two geometries o1
and o2
are permuted across two consecutive calls. Here is some output
tf1:
1 0 0 0.11
0 1 0 0.21
0 0 1 0.31
0 0 0 1
tf2:
1 0 0 0.1
0 1 0 0.2
0 0 1 0.3
0 0 0 1
distance -0.282678665097460
tf1:
1 0 0 0.1
0 1 0 0.2
0 0 1 0.3
0 0 0 1
tf2:
1 0 0 0.11
0 1 0 0.21
0 0 1 0.31
0 0 0 1
distance -0.282678662686879
Note that tf1
in the first call becomes tf2
in the second call, and vice versa.
More updates: The swapping happens in SceneGraph
, before we call https://github.com/RobotLocomotion/drake/blob/df86de15c0ba63a616dd347b263d8e2cc083875d/geometry/proximity_engine.cc#L323
I add print out for fcl_object_A
and fcl_object_B
. The result is swapped across two calls.
fcl_object_A transform:0.11 0.21 0.31
fcl_object_B transform:0.1 0.2 0.3
distance -0.282678665097460
fcl_object_A transform:0.1 0.2 0.3
fcl_object_B transform:0.11 0.21 0.31
distance -0.282678662686879
What really surprised me is that I am calling the same SceneGraph
, with the same Context
, but just calling ComputeSignedDistancePairwiseClosestPoints()
repeatedly, as in https://github.com/hongkai-dai/drake/blob/inconsistent_signed_distance/multibody/inverse_kinematics/test/collision_distance_demo.cc I thought each geometry object has an ID, and we assign object A/ B based on their ID. Should I expect the ID to change?
That's most likely attributable to the broadphase algorithm. SceneGraph doesn't order them, per se. It simply processes the pair as they are reported (and then orders them on the output to a fixed ordering.)
What's surprising is that for a two geometries with fixed poses, that the broadphase algorithm would not be deterministic...
Even though the impact is uncertain, the consequence could be large, hence prioritizing.
I confirm that I cloned drake
from @hongkai-dai and disabled any push from me for safety:
damrongguoy@puget-183882-03:~/GitHub/hongkai-dai/drake (master)$ git remote -v
origin git@github.com:hongkai-dai/drake.git (fetch)
origin no_push (push)
upstream git@github.com:RobotLocomotion/drake.git (fetch)
upstream no_push (push)
I'll check collision_distance_demo.cc
in the branch inconsistant_signed_distance.
damrongguoy@puget-183882-03:~/GitHub/hongkai-dai/drake (master)$ git checkout inconsistent_signed_distance
Branch 'inconsistent_signed_distance' set up to track remote branch 'inconsistent_signed_distance' from 'origin'.
Switched to a new branch 'inconsistent_signed_distance'
damrongguoy@puget-183882-03:~/GitHub/hongkai-dai/drake (inconsistent_signed_distance)$
The install_prereqs.sh
asked me to downgrade bazel form 0.21 to 0.19.2.
I'm building now.
I confirm that I can reproduce the reported problem by running Hongkai's code.
(inconsistent_signed_distance)$ bazel run //multibody/inverse_kinematics:collision_distance_demo
INFO: Analysed target //multibody/inverse_kinematics:collision_distance_demo (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //multibody/inverse_kinematics:collision_distance_demo up-to-date:
bazel-bin/multibody/inverse_kinematics/collision_distance_demo
INFO: Elapsed time: 0.463s, Critical Path: 0.00s, Remote (0.00% of the time): [queue: 0.00%, setup: 0.00%, process: 0.00%]
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
distance -0.282678665097460
distance -0.282678662686879
distance -0.282678665097460
distance -0.282678662686879
distance -0.282678665097460
distance -0.282678662686879
distance -0.282678665097460
distance -0.282678662686879
I'll start CLion and debug.
Checkpoint 1. I saw the search-tree dynamic_tree_
changes each time we reach the following line in ComputeSignedDistancePairwiseClosestPoints()
.
920. dynamic_tree_.distance(&distance_data, DistanceCallback);
The change is that the two children swapped. For example, the two children in the first call are pointers ending with f30 and ed0:-
Then, in the second call, they switched to ed0 and f30:-
I'm guessing some kinds of "update" to the tree caused the switch. Perhaps the tree is re-balanced somewhere?
===============================================================================
Checkpoint 2. I also found that before and after this line in proximity_engine.cc
UpdateWorldPoses(X_WG, indices)
, the dynamic_tree_
switched its two children:
dynamic_tree_.update();
Just for verification, I did dynamic_tree_.update()
twice. I saw the two children switched back and the print out is consistent now:
Start 8 repeated ComputeSignedDistancePairwiseClosestPoints():-
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
Start 8 repeated ComputeSignedDistancePairwiseClosestPoints():-
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
Start 8 repeated ComputeSignedDistancePairwiseClosestPoints():-
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
distance -0.282678662686879
Checkpoint 3. Calling dynamictree.update() twice is just a hack. I'll investigate more into fcl::DynamicAABBTreeCollisionManager<S>::update()
.
Wow! Great find @DamrongGuoy! That certainly sounds like the guilty party.
It has been a week since I last updated. Thanks to @SeanCurtis-TRI and @hongkai-dai I was able to set up a unit test in FCL to reproduce the same problem. Every time we call fcl::DynamicAABBTreeCollisionManager<S>::update()
the tree {root {A, B}} becomes {root {B,A}} even though there was no change in the poses of A and B.
I am checking the code to maintain the tree. For some reasons, it involves incrementally adjusting the tree including removing and inserting the same node again and again.
Since the code has minimal documentation, I am adding comments to explain what I think the code is doing. At the least, we will improve FCL in that sense.
If I cannot solve the problem in FCL by Wednesday, I will switch to the work around in Drake that always calls narrow-phase-distance(A,B)
regardless of whether the broadphase
gives us A,B
or B,A
as mentioned previously by @SeanCurtis-TRI. The consistent order of A,B
in calling narrow-phase-distance
should give the consistent result for drake::ComputeSignedDistancePairwiseClosestPoints()
.
I attempted to compute the signed distance between two spheres, using a diagram containing an MBP and a SceneGraph. Calling
ComputeSignedDistancePairwiseClosestPoints()
repeatedly gives me different result for the same configuration. To reproduce the problem, the code is in https://github.com/hongkai-dai/drake/blob/inconsistent_signed_distance/multibody/inverse_kinematics/test/collision_distance_demo.cc The output of this code isInside this code, MBP computes the pose of each body, store that in the context, and then pass it to SceneGraph, SceneGraph then calls FCL's broadphase algorithm, which eventually calls FCL's narrowphase EPA implementation.
I also created a test in FCL, for the two spheres with the same configuration. I then compute the signed distance between these two spheres using narrowphase EPA implementation. The code is in https://github.com/hongkai-dai/fcl/blob/98f81641753816323dadfd8f1ff865a40d5db28e/test/test_fcl_signed_distance.cpp#L195-L202. This produces the same result for the same configuration
So the problem occurs between MBP computes the link poses and FCL broadphase algorithm.
cc @sherm1 @SeanCurtis-TRI