Thanks for posting the issue. :)
Some details from the original (unavailable) Slack message (emphasis mine):
> I loaded a [redacted] model into the simulation. It's a simple model that has three bodies and two joints. The non-trivial part is that it has about 400 collision mesh files (OBJs). These OBJs were generated by an automatic convex decomposition process that breaks the [redacted] model into small pieces. The problem is: after loading the [redacted] model, the simulation slows down significantly, to a real time factor of 0.22. Without the [model], the simulation runs at a real time factor of 2.0.
>
> I used valgrind to profile the [redacted] simulation. It turns out that 57.3% of the total time is spent in `void fcl::detail::supportConvex<double>()`.
It is worth noting that `fcl::detail::supportConvex` is a function in FCL that is part of the generic iterative intersection algorithm (Minkowski Portal Refinement) for contact between two convex shapes. Given a direction vector, it returns the point on a shape that is farthest in that direction. It is known that FCL's `supportConvex` does a linear search over all vertices for the `Convex` shape type.
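For illustration, here is a minimal sketch of what such a linear-search support function looks like. This is not FCL's actual code; the `Vec3` and `SupportLinear` names are hypothetical.

```cpp
#include <vector>

// Hypothetical minimal 3-vector; stands in for FCL's vector type.
struct Vec3 {
  double x{}, y{}, z{};
  double dot(const Vec3& v) const { return x * v.x + y * v.y + z * v.z; }
};

// Linear-search support function: scan every vertex and keep the one with the
// largest projection onto the query direction. Cost is O(V) per call, and the
// iterative MPR/GJK loop calls it many times per contact pair.
Vec3 SupportLinear(const std::vector<Vec3>& vertices, const Vec3& direction) {
  // Assumes a non-empty vertex list.
  Vec3 best = vertices.front();
  double best_dot = best.dot(direction);
  for (const Vec3& v : vertices) {
    const double d = v.dot(direction);
    if (d > best_dot) {
      best_dot = d;
      best = v;
    }
  }
  return best;
}
```

Every call touches every vertex, so a 16,000-vertex mesh pays 16,000 dot products per support query.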
The simulation used a discrete MBP with a time step of 2.5 ms.
For the simulation to run at a real time factor of 2.0, the full computation (geometry + dynamics) had to execute in no more than 1.25 ms. This is a generous ceiling; due to other overheads in the simulation, the real budget is probably smaller. But this number will be significant (see below). On the other hand, with that fixed time step, a real time factor of 0.22 means that it takes something along the lines of 2.5 / 0.22 ≈ 11.4 ms to get from one discrete update to the next.
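Spelling out the arithmetic, with $\Delta t$ the discrete step and $r$ the real time factor:

$$
t_{\text{wall}} = \frac{\Delta t}{r}:\qquad
\frac{2.5\ \text{ms}}{2.0} = 1.25\ \text{ms (budget at } r = 2.0\text{)},\qquad
\frac{2.5\ \text{ms}}{0.22} \approx 11.4\ \text{ms (observed at } r = 0.22\text{)}.
$$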
### `supportConvex`
Given the above information, I made the assumption that the OBJs were high resolution (frequently a feature of automatic generation). Thus, repeated linear searches for a support vertex would scale linearly with the number of vertices. Analyzing the meshes provided the following distribution:

| Vertex count | Number of meshes |
|---|---|
| 10 | 1 |
| 18 | 2 |
| 120 | 340 |
| 138 | 1 |
| 208 | 2 |
| 338 | 1 |
| 1442 | 1 |
| 16007 | 49 |
| 17449 | 4 |
Table 1: Taxonomy of convex meshes included with the [redacted] model based on mesh complexity.
Roughly 1/8 of the meshes have more than 16,000 vertices, and all but three of the rest have at least 100. This doesn't prove the hypothesis, but it does suggest it's viable.
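For reference, that fraction comes from the bottom two rows of Table 1 (53 of the 401 meshes):

$$
\frac{49 + 4}{401} \approx 0.13 \approx \frac{1}{8}.
$$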
It turns out that the convex decomposition produced garbage data. When this particular analysis was performed, it was assumed that the meshes were meaningful; only subsequently, when trying to reproduce the reported results, were the details of the meshes investigated. What was observed made it clear that this is very much a case of horrible input. It does raise the question: can Drake handle garbage input like this better?
### `supportConvex` for `Convex` shapes

I modified the `Convex` class to include knowledge of edge adjacency; given a vertex, it is known which vertices are connected to it by an edge. The `supportConvex` method then becomes a matter of walking edges in the specified direction. For small meshes (few vertices), this is more expensive; for large meshes, it is much cheaper.
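A minimal sketch of the edge-walking idea, reusing the `Vec3` type from the sketch above; the `ConvexMesh` and `SupportEdgeWalk` names are hypothetical and this is not the actual branch code.

```cpp
#include <vector>

// Hypothetical mesh representation: vertex positions plus, for each vertex,
// the indices of its edge-adjacent neighbors (the adjacency added to Convex).
struct ConvexMesh {
  std::vector<Vec3> vertices;
  // neighbors[i]: indices of vertices sharing an edge with vertex i.
  std::vector<std::vector<int>> neighbors;
};

// Edge-walking (hill-climbing) support function: repeatedly step to a
// neighbor with a larger projection onto the query direction. On a
// watertight convex mesh, a local maximum is also the global maximum, so the
// walk terminates at the true support vertex while visiting only a small
// fraction of the vertices.
Vec3 SupportEdgeWalk(const ConvexMesh& mesh, const Vec3& direction) {
  int best = 0;
  double best_dot = mesh.vertices[best].dot(direction);
  bool improved = true;
  while (improved) {
    improved = false;
    for (int n : mesh.neighbors[best]) {
      const double d = mesh.vertices[n].dot(direction);
      if (d > best_dot) {
        best_dot = d;
        best = n;
        improved = true;
        break;  // Greedily step to the first improving neighbor.
      }
    }
  }
  return mesh.vertices[best];
}
```

Note that the walk is only guaranteed to find the true support vertex if the mesh is convex and its adjacency graph is complete (i.e., the mesh is watertight).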
I set up a simple experiment with multiple contacts between tessellated spheres of graduated mesh complexity to gauge the impact of the "edge-walking" scheme. The figure below shows the results.
Figure 1: Impact of edge-walking over linear search based on mesh complexity.
As expected, with fewer than 32 vertices, the overhead of walking the edges slightly penalizes the algorithm. However, as the vertex count increases, the overhead vanishes and edge walking provides a distinct benefit; at 1,000 vertices, it takes half the time. While the data doesn't extend to meshes with 10,000+ vertices, the trends don't suggest that this optimization can make up for the roughly 10X performance loss observed above (the real time factor dropping from 2.0 to 0.22).
The two horizontal lines provide reference baselines. `Sphere primitive` (solid black line) is the cost of replacing the sphere meshes with actual spheres; computing contact between two spheres is trivial, and this represents the cheapest possible query. `Sphere GJK` (dashed black line) uses mathematical spheres in the general convexity algorithm (GJK). The `supportConvex` method for `Sphere` is O(1), and the difference between `Sphere primitive` and `Sphere GJK` more or less captures the cost of the GJK algorithm itself (i.e., it represents a floor for how fast the general convexity algorithm can run).
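For contrast with the linear search above, a sketch of an O(1) sphere support function (again using the hypothetical `Vec3` type; not FCL's actual code):

```cpp
#include <cmath>

// O(1) support function for a mathematical sphere: the farthest point in a
// given direction is center + radius * normalized(direction). No per-vertex
// work at all, which is why Sphere GJK isolates the cost of the GJK
// iteration itself.
Vec3 SupportSphere(const Vec3& center, double radius, const Vec3& direction) {
  const double norm = std::sqrt(direction.dot(direction));
  if (norm == 0.0) {
    // Degenerate direction: any surface point is a valid support point.
    return {center.x + radius, center.y, center.z};
  }
  const double s = radius / norm;
  return {center.x + direction.x * s,
          center.y + direction.y * s,
          center.z + direction.z * s};
}
```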
I created a simplified scenario approximating the one reported: the model with the 400 convex meshes was loaded, and then a single `Box` was placed in shallow contact with the model. (In the real scenario, it was three boxes in near-contact with the model.) I decorated the code with high-resolution timers (see the sketch after Table 2) and assessed the cost of evaluating the point-pair contact query. I performed 50 collision queries on the scenario in each of three configurations:

1. Convex meshes with the linear-search `supportConvex` in FCL's master.
2. Convex meshes with the edge-walking `supportConvex` in my branch.
3. The meshes replaced with box primitives (which don't invoke `supportConvex` at all). This serves as the baseline.

| Scenario | PoseUpdate (s) | Filtered (s) | False Positive (s) | True Positive (s) | Broadphase (s) | Total (s) |
|---|---|---|---|---|---|---|
| Convex meshes, FCL master | 0.004957 | 0.1243 | 0.5371 | 0.1706 | 0.07759 | 0.9145 |
| Convex meshes, FCL faster | 0.005738 | 0.1294 | 0.5807 | 0.03612 | 0.08532 | 0.8374 |
| Box collision | 0.003004 | 0.1189 | 0.1503 | 0.0005785 | 0.06969 | 0.3425 |
Table 2: The cost (in seconds) of performing 50 collision queries in the test scenario, broken down into its constituent components.
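For concreteness, a minimal sketch of the kind of high-resolution timing decoration described above, assuming `std::chrono`; the class and phase names are illustrative, and the actual instrumentation may differ.

```cpp
#include <chrono>
#include <cstdio>

// Minimal RAII timer: prints the elapsed wall-clock time for a labeled phase
// when it goes out of scope.
class ScopedTimer {
 public:
  explicit ScopedTimer(const char* label) : label_(label) {}
  ~ScopedTimer() {
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start_;
    std::printf("%s: %.6f s\n", label_, elapsed.count());
  }

 private:
  const char* label_;
  std::chrono::steady_clock::time_point start_{
      std::chrono::steady_clock::now()};
};

void RunOneQuery() {
  { ScopedTimer t("PoseUpdate"); /* update collision object poses */ }
  { ScopedTimer t("Broadphase"); /* BVH culling of distant pairs */ }
  { ScopedTimer t("Narrowphase"); /* filtered / false / true positives */ }
}
```

In practice, the per-phase times would be accumulated across all 50 queries rather than printed per call.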
The constituent components of the query break down as follows: **PoseUpdate** is the cost of updating the world poses of the collision geometries; **Broadphase** is the bounding-volume culling that produces candidate pairs; **Filtered** covers candidate pairs discarded by collision filtering; **False Positive** covers candidate pairs whose narrowphase test finds no actual contact; and **True Positive** covers pairs in actual contact.
The three rows above show absolute times. The pie charts below show percentages (one pie chart for each row).
- `PoseUpdate` is a largely negligible cost for now (less than 1% of compute time across all experiments).
- In processing actual contact (`True Positive`), the introduction of the edge-walking algorithm improved performance by a factor of 4-5. This is largely attributable to the fact that the box was in contact with the highest-resolution meshes (the body).
- `False Positive` is the single largest computation (half or more of the total).
Figure 2: Percentages of compute time for convex meshes with the linear-search `supportConvex` function.
Figure 3: Percentages of compute time for convex meshes with the edge-walking `supportConvex` function.
Figure 4: Percentages of compute time for box collision objects.
Wow -- awesome study and report Sean, thanks. Is it fair to conclude that for this to run in real time we have to fix the crappy input meshes?
Thanks for the detailed report. It's amazing and I learned a lot from it.
I am surprised that there are so many collision pairs to work with in the Filtered and False Positive stages. Do those collision pairs come from checking the whole fridge, or just the face next to the counter? More generally, my question is whether the Filtered stage will filter out the vertices that are far away from the other object.
The Filtered and False Positive numbers aren't as surprising as you might think.
For 50 queries, we tallied 1,570,700 pairs; for a single query, that is 31,414 pairs. If we assume the broadphase got trashed by all of the overlapping, redundant geometries and we did a simple O(N^2) pairing of the collision geometries, all we would need is 250 geometries to reach that number (i.e., if N = 250, there are N(N + 1)/2 ≈ 31,375 pairings).
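Spelling out the arithmetic:

$$
\frac{1{,}570{,}700}{50} = 31{,}414, \qquad
\left.\frac{N(N+1)}{2}\right|_{N=250} = \frac{250 \cdot 251}{2} = 31{,}375 \approx 31{,}414.
$$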
Addendum to the test case.
After adding convex-mesh validation code to FCL (a prerequisite for valid edge walking), it was found that at least some of the meshes used in the evaluation weren't watertight; there were cracks in the meshes. This could have two potential effects, both of which may contribute to a reduction in the performance of the edge-walking optimization in this specific scenario.
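As an illustration of what such validation can look like, here is a sketch of one common watertightness check (not necessarily FCL's actual code): every undirected edge of a closed mesh must be shared by exactly two faces.

```cpp
#include <array>
#include <map>
#include <utility>
#include <vector>

using Triangle = std::array<int, 3>;  // Three vertex indices per face.

// Returns true if every undirected edge is shared by exactly two faces -- a
// necessary condition for a closed (watertight) mesh. A "crack" manifests as
// an edge with only one incident face.
bool IsWatertight(const std::vector<Triangle>& faces) {
  std::map<std::pair<int, int>, int> edge_count;
  for (const Triangle& f : faces) {
    for (int i = 0; i < 3; ++i) {
      int a = f[i];
      int b = f[(i + 1) % 3];
      if (a > b) std::swap(a, b);  // Canonicalize the undirected edge.
      ++edge_count[{a, b}];
    }
  }
  for (const auto& [edge, count] : edge_count) {
    if (count != 2) return false;
  }
  return true;
}
```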
I wonder what here might be actionable @SeanCurtis-TRI, especially now that we're doing convex hulls?
Is this ticket more about some specific kinds of benchmarks we could add and eventually speed up, or is it about helping flag for users that their input files are pants-on-fire crazy and they are holding it wrong?
Problem statement: Loading too many OBJ mesh files for collision geometry slows the simulation down significantly.
Users apply an automatic convex decomposition process to generate convex collision mesh files for an object, since Drake supports convex mesh OBJs. Automatic convex decomposition adds the power to programmatically generate new items quickly, with very detailed collision geometry. However, for an object with a non-trivial contour, this process can generate hundreds of OBJs.