RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Loading too many objs slows down the simulation significantly #13125

Closed. huihuaTRI closed this issue 4 days ago.

huihuaTRI commented 4 years ago

Problem statement: Loading too many obj mesh files for collision geometry will slow down the simulation significantly.

Users rely on an automatic convex decomposition process to generate convex collision mesh files for an object, since Drake supports convex mesh objs. This process makes it possible to programmatically create new items quickly with very detailed collision geometry. However, for an object with a non-trivial contour, it can produce hundreds of objs.

SeanCurtis-TRI commented 4 years ago

Thanks for posting the issue. :)

SeanCurtis-TRI commented 4 years ago

Recap of original problem characterization

Some details from the original (unavailable) slack message (emphasis mine):

I loaded a [redacted] model into the simulation. It's a simple model that has three bodies and two joints. The non-trivial part is that it has about 400 collision mesh files (objs). These objs are generated from an auto convex decomposition process that breaks the [redacted] model into small pieces. The problem is: after loading the [redacted] model, the simulation slows down significantly to a 0.22 real time factor. Without the [model], the simulation can run at a 2.0 real time factor.

I used valgrind to profile the [redacted] simulation. It turns out that 57.3% of the total time is spent in void fcl::detail::supportConvex<double>().

It is worth noting that fcl::detail::supportConvex is a function in FCL that is part of the generic iterative intersection algorithm (Minkowski portal refinement) for contact between two convex shapes. Given a direction vector, it returns the point on a shape that is farthest in that direction. FCL's supportConvex is known to do a linear search over all vertices of the Convex shape type.
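For intuition, here is a minimal sketch (not FCL's actual implementation) of what such a linear-search support function looks like; the function name and use of Eigen here are illustrative only:

```c++
#include <vector>

#include <Eigen/Dense>

// Illustrative sketch only (not FCL's code): a linear-search support function
// scans every vertex and returns the one farthest along the query direction.
// Each call costs O(V), and MPR/GJK invoke it repeatedly per contact pair.
Eigen::Vector3d SupportLinear(const std::vector<Eigen::Vector3d>& vertices,
                              const Eigen::Vector3d& dir) {
  Eigen::Vector3d best = vertices.front();
  double best_dot = best.dot(dir);
  for (const Eigen::Vector3d& v : vertices) {
    const double d = v.dot(dir);
    if (d > best_dot) {
      best_dot = d;
      best = v;
    }
  }
  return best;
}
```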

The simulation used a discrete MBP with a timestep of 2.5 ms.

For the simulation to run with a real time factor of 2X, the full computation (geometry + dynamics) needed to execute completely in no more than 1.25 ms. This is a generous ceiling; due to other overheads in the simulation, the true budget is probably smaller. But this number will be significant (see below). On the other hand, with that fixed time step, a real time factor of 0.22 means that it takes roughly 2.5 / 0.22 ≈ 11.4 ms to get from one discrete update to the next.

Initial hypotheses and results

Hypothesis: high resolution meshes causing excessive cost in supportConvex

Given the above information, I made the assumption that the objs were high resolution (frequently a feature of automatic generation). Thus, repeated linear searches for a support vertex would scale linearly with the number of vertices. Analyzing the meshes provided the following distribution:

| Vertex count | Number of meshes |
|---:|---:|
| 10 | 1 |
| 18 | 2 |
| 120 | 340 |
| 138 | 1 |
| 208 | 2 |
| 338 | 1 |
| 1442 | 1 |
| 16007 | 49 |
| 17449 | 4 |

Table 1: Taxonomy of convex meshes included with the [redacted] model based on mesh complexity.

Roughly 1/8 of the meshes have more than 16,000 vertices, and nearly all of the rest have at least 100. This doesn't prove the hypothesis, but it does suggest it's viable.

Garbage meshes

It turns out that the convex decomposition produced garbage data. When this particular analysis was performed, it was assumed that the meshes were meaningful; only later, while trying to reproduce the reported results, were the details of the meshes investigated. The following was observed:

This is very much a case of horrible input. It does raise the question: can Drake handle garbage input like this better?

Implement a straightforward optimization to FCL's supportConvex for Convex shapes

I modified the Convex class to include knowledge of edge adjacency; given a vertex, it is known which vertices are connected to it by an edge. The supportConvex method then becomes a matter of walking edges in the specified direction. For small meshes (few vertices), this is slightly more expensive; for large meshes it is much cheaper.
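As a rough illustration of the idea (a sketch under assumed data structures, not the actual FCL patch), the support query becomes a hill climb over vertex adjacency:

```c++
#include <vector>

#include <Eigen/Dense>

// Illustrative data structure: vertices plus per-vertex edge adjacency.
struct AdjacentConvexMesh {
  std::vector<Eigen::Vector3d> vertices;
  // neighbors[i] lists the indices of vertices sharing an edge with vertex i.
  std::vector<std::vector<int>> neighbors;
};

// Sketch of "edge-walking": starting from some vertex, repeatedly hop to the
// neighboring vertex with the largest projection onto `dir` until no neighbor
// improves. On a large, watertight convex mesh this visits far fewer than V
// vertices per query.
int SupportEdgeWalk(const AdjacentConvexMesh& mesh, const Eigen::Vector3d& dir,
                    int start = 0) {
  int best = start;
  double best_dot = mesh.vertices[best].dot(dir);
  bool improved = true;
  while (improved) {
    improved = false;
    for (int n : mesh.neighbors[best]) {
      const double d = mesh.vertices[n].dot(dir);
      if (d > best_dot) {
        best_dot = d;
        best = n;
        improved = true;
      }
    }
  }
  return best;
}
```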

I set up a simple experiment with multiple contacts between tessellated spheres of graduated mesh complexity to gauge the impact of the "edge-walking" scheme. The figure below shows the results.

Figure 1: Impact of edge-walking over linear search based on mesh complexity.

As expected, with < 32 vertices, the overhead of walking the edges slightly penalizes the algorithm. However, as the vertex count increases, the overhead vanishes and edge walking provides a distinct benefit; at 1,000 vertices it takes half the time. While the data doesn't extend to meshes with 10,000 vertices, the trends don't suggest that this optimization alone can make up for the roughly 9X loss of performance observed above (a real time factor drop from 2.0 to 0.22).

The two horizontal lines provide reference baselines. Sphere primitive (solid black line) is the cost of replacing the sphere meshes with actual spheres. Computing contact between two spheres is trivial and this represents the cheapest possible query. Sphere GJK (dashed black line) uses mathematical spheres in the general convexity algorithm (GJK). The supportConvex method for Sphere is O(1) and the difference between Sphere primitive and Sphere GJK more or less captures the cost of the GJK algorithm (i.e., it represents a floor for how fast the general convexity algorithm can run).
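For comparison, a sphere's support function is constant-time, which is why the Sphere GJK baseline isolates the cost of the GJK iteration itself. A minimal sketch (illustrative names, not FCL's API):

```c++
#include <Eigen/Dense>

// The support point of a sphere in direction `dir` is just the center pushed
// out by the radius along the normalized direction: an O(1) computation.
Eigen::Vector3d SupportSphere(const Eigen::Vector3d& center, double radius,
                              const Eigen::Vector3d& dir) {
  return center + radius * dir.normalized();
}
```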

Reproduce performance issues

I created a simplified scenario approximating the one reported. The model with the 400 convex meshes was loaded and then a single Box was placed in shallow contact with the model (in the real scenario, three boxes were in near-contact with the model). I decorated the code with high-resolution timers (see the sketch after Table 2) and assessed the cost of evaluating the point-pair contact query, performing 50 collision queries on the scenario in each of three configurations:

| Scenario | PoseUpdate (s) | Filtered (s) | False Positive (s) | True Positive (s) | Broadphase (s) | Total (s) |
|---|---:|---:|---:|---:|---:|---:|
| Convex meshes, FCL master | 0.004957 | 0.1243 | 0.5371 | 0.1706 | 0.07759 | 0.9145 |
| Convex meshes, FCL faster | 0.005738 | 0.1294 | 0.5807 | 0.03612 | 0.08532 | 0.8374 |
| Box collision | 0.003004 | 0.1189 | 0.1503 | 0.0005785 | 0.06969 | 0.3425 |

Table 2: The cost (in seconds) of performing 50 collision queries in the test scenario, broken down by its constituent components.
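For reference, the timing decoration amounts to wrapping each phase with a steady-clock stopwatch; the sketch below shows the general pattern (the phase names mirror the table columns, but the instrumentation points are illustrative, not Drake's actual code):

```c++
#include <chrono>

using Clock = std::chrono::steady_clock;

// Elapsed seconds since `start`, using a monotonic high-resolution clock.
double SecondsSince(const Clock::time_point& start) {
  return std::chrono::duration<double>(Clock::now() - start).count();
}

// Usage pattern around one phase of the query (repeated for broadphase,
// filtering, and the narrowphase true/false-positive buckets):
//   const auto t0 = Clock::now();
//   UpdateGeometryPoses();          // hypothetical phase being measured
//   pose_update_s += SecondsSince(t0);
```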

The constituent components of the query are broken down as follows:

The three rows above show absolute times. The pie charts below show percentages (one pie chart for each row).

Observations

Figure 2: Percentages of compute time for convex meshes with the linear search supportVertex function.

Figure 3: Percentages of compute time for convex meshes with the edge-walking supportVertex function.

Figure 4: Percentages of compute time for box collision objects.

Possible tasks

sherm1 commented 4 years ago

Wow -- awesome study and report Sean, thanks. Is it fair to conclude that for this to run in real time we have to fix the crappy input meshes?

huihuaTRI commented 4 years ago

Thanks for the detailed report. It's amazing and I learned a lot from it.

I am surprised that there are so many collision pairs to work with in the Filtered and False Positive stages. Do those collision pairs come from checking the whole fridge, or just the face next to the counter? More generally, my question is whether the Filtered stage will filter out the vertices that are far away from the other object.

SeanCurtis-TRI commented 4 years ago

The filtered and false positive counts aren't as surprising as you might think.

For 50 queries, we tallied 1,570,700 pairs. For a single query, that is 31,414 pairs. If we assume the broadphase got trashed by all of the overlapping, redundant geometries and we did a simple O(N^2) pairing of the collision geometries, we would only need about 250 geometries to reach that number (for N = 250, there are N(N + 1)/2 = 31,375 pairings).

SeanCurtis-TRI commented 4 years ago

Addendum to the test case.

After adding convex mesh validation code to FCL (a prerequisite for valid edge walking), it was found that at least some of the meshes used in the evaluation weren't watertight; there were cracks in the mesh. This would have two potential effects:

  1. Cracks in what should otherwise be a watertight topology can, in the best case, make the edge walk longer. The optimal path would ordinarily traverse the mesh surface across where a crack lies, but it can't because there is no adjacency across the crack; therefore, it takes a longer path around the crack.
  2. In the worst case, cracks can lead to wrong answers. Because the edge walking is, essentially, a gradient descent, a crack can create a local minimum that traps the walk, producing the wrong answer.

Both of those factors may contribute to a reduction in the performance of the edge walking optimization in this specific scenario.
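For context, one simple watertightness test (a sketch of the kind of validation involved, not the code that was added to FCL) is to require every undirected edge of the triangle mesh to be shared by exactly two faces; a crack shows up as an edge with any other count:

```c++
#include <array>
#include <map>
#include <utility>
#include <vector>

// Sketch only: in a closed, manifold triangle mesh, each undirected edge must
// appear in exactly two triangles. Any other count indicates a crack or other
// defect that would break edge-walking adjacency.
bool IsWatertight(const std::vector<std::array<int, 3>>& triangles) {
  std::map<std::pair<int, int>, int> edge_count;
  for (const auto& tri : triangles) {
    for (int i = 0; i < 3; ++i) {
      int a = tri[i];
      int b = tri[(i + 1) % 3];
      if (a > b) std::swap(a, b);
      ++edge_count[{a, b}];
    }
  }
  for (const auto& entry : edge_count) {
    if (entry.second != 2) return false;
  }
  return true;
}
```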

jwnimmer-tri commented 7 months ago

I wonder what here might be actionable @SeanCurtis-TRI, especially now that we're doing convex hulls?

Is this ticket more about some specific kinds of benchmarks we could add and eventually speed up, or is it about helping flag for users that their input files are pants-on-fire crazy and they are holding it wrong?