Closed edrumwri closed 8 months ago
Thanks for pointing these out, @edrumwri! Some questions:
The colorful lines of code (tree functions) are a distraction. Look at the method names after the bullet points (plant functions). Those plant function do not offer user-supplied storage, and the entry points that we actually care about. (In particular, GetKinematicPathToWorld is called internally by MbT, not by the user.)
Per discussion with @jwnimmer-tri over slack, the key point that I neglected to mention is the typical workflow for real-time programming where one first goes through a "warm up" period where heap can be allocated at will. Afterwards, a real-time phase is entered where no further heap allocations are allowed.
The caching system naturally supports this workflow systems if the cache allocates the memory once and the cache evaluation function doesn't allocate any new memory (@ggould-tri has pointed out concerns with using the caching system in a real-time context; we're still proceeding under the assumption that we can use caching, carefully, to avoid unwanted heap allocations).
* the other two seem like avoidable allocations, though possibly requiring an API change to pass in some necessary temp space. For prioritizing, have you been able to measure an effect from the heap allocation or is the cost speculative at this point?
I don't think API changes are necessary as long as the caching system is used for maintaining those std::vector objects. In any case, our primary concern isn't cost, but deterministic computation times, and memory allocation (among other OS services) is typically not deterministic.
Ah, yes for the first case the guilty code is here, where that output argument is allocated.
cc @mitiguy
Issue #13186 was filed on MBP doing "undesirable heap allocation" by @edrumwri who was also the requester of issue #12592 for the new moving-frame bias acceleration feature (related issue #13453). Heap allocation in CalcBiasSpatialAcceleration() caused concern -- is the new method useful for Evan's application without caching? From email correspondence with Evan (June 16, 2020): "I can wait for the heap allocation to be eliminated at a later date. We're unable to use Drake in a real-time application until all of the heap allocations in MBP have been eliminated: adding a new heap allocation does not make the problem any better, but neither does it make it considerably worse."
but neither does it make it considerably worse
Not sure I agree with that -- the changes in #13453 double the number of heap allocations in MBP.
Turns out there are many other heap allocations in the MultibodyPlant computations. We are tracking those now with the instrumented cassie_benchmark and work is in progress to remove them.
This issue does not seem specific enough to be actionable. There are hundreds of public functions on MbP (and more if you count the supporting public classes like Body, Joint, etc.). What list of functions in particular need to be heap-sensitive? Without a list of functions to tackle (i.e., to write LimitMalloc test cases for), there is no way to move forward here and we should close out the issue.
Is the implication of Sherm's comment above that the only relevant functions are the non-autodiff cassie tests? Which is to say, fixing the 3 heap ops in DoubleInverseDynamics
and 22 heap ops in ForwardDynamics
is sufficient to close this issue?
WDYT @edrumwri ?
I think we can discount the parts of the forward dynamics stepping scheme- there is little hope that methods that compute contact forces can be made real-time safe, at least in general, so it doesn't make sense to chase down heap allocs there. But pretty much everything else in MBP*- query functions, slicing functions (i.e., get parts of an array), forward kinematics, inverse dynamics- should be able to avoid heap allocations / be made to be real time safe.
I've viewed removing heap allocs as a win for both real time safety and performance. If there is not interest in maintaining no-heap-allocs on the MBP side (we've been blessed with few heap allocs in the systems framework), we can always move the MBP accesses to a non-real time thread on our end. But it would be a shame to have to do so.
*
I haven't been paying attention to the deformable body stuff that I know is coming/already here, so discount this statement as necessary.
If there is not interest in maintaining no-heap-allocs on the MbP side ...
For my part I think it's a useful enough goal. We just need to understand what's desired.
... but pretty much everything else in MbP -- query functions, slicing functions (i.e., get parts of an array), forward kinematics, inverse dynamics -- should be able to avoid heap allocations ...
That's at least a start for setting a goal here. I suppose the next actionable step would be for someone to try to turn that general idea into a list of specific function names to repair.
Here's an approximate tally of all functions in scope (based on the pybind coverage report).
11 drake/multibody/math/spatial_acceleration.h
11 drake/multibody/math/spatial_force.h
9 drake/multibody/math/spatial_momentum.h
30 drake/multibody/math/spatial_vector.h
11 drake/multibody/math/spatial_velocity.h
6 drake/multibody/plant/coulomb_friction.h
4 drake/multibody/plant/externally_applied_spatial_force.h
220 drake/multibody/plant/multibody_plant.h
14 drake/multibody/plant/propeller.h
10 drake/multibody/plant/wing.h
13 drake/multibody/tree/articulated_body_inertia.h
13 drake/multibody/tree/ball_rpy_joint.h
41 drake/multibody/tree/body.h
25 drake/multibody/tree/door_hinge.h
10 drake/multibody/tree/fixed_offset_frame.h
9 drake/multibody/tree/force_element.h
31 drake/multibody/tree/frame.h
2 drake/multibody/tree/frame_base.h
63 drake/multibody/tree/joint.h
21 drake/multibody/tree/joint_actuator.h
21 drake/multibody/tree/linear_bushing_roll_pitch_yaw.h
14 drake/multibody/tree/linear_spring_damper.h
19 drake/multibody/tree/planar_joint.h
22 drake/multibody/tree/prismatic_joint.h
23 drake/multibody/tree/revolute_joint.h
10 drake/multibody/tree/revolute_spring.h
29 drake/multibody/tree/rigid_body.h
40 drake/multibody/tree/rotational_inertia.h
18 drake/multibody/tree/screw_joint.h
23 drake/multibody/tree/spatial_inertia.h
11 drake/multibody/tree/uniform_gravity_field_element.h
30 drake/multibody/tree/unit_inertia.h
13 drake/multibody/tree/universal_joint.h
5 drake/multibody/tree/weld_joint.h
We need to cut down that list of ~800 functions into the subset which needs to be heapless. (Some of those 800 are probably obviously irrelevant, e.g., constructors and destructors, etc.)
If you want to propose a few dozen that are more important that the rest, we can look into fixing those first.
If you could give us a short list (say 3 to 5) methods you'd like us to work on first @edrumwri, that'd help a lot. We can re-evaluate after that.
Will do.
@edrumwri is this issue still relevant? My thought is to close it now and open more-specific heap issues as they crop up.
Still highly relevant. Our engineering cycles have been focused elsewhere, but I'll be able to get you a list of functions this quarter.
I think that our remaining heap allocations are limited to CalcInverseDynamics(). It looks like the heap allocations are due to some temporary storage in MultibodyTree::CalcInverseDynamics(), as well as it returning a VectorX, rather than allowing an output vector to be passed in.
Update on this: we hadn't exercised all code paths. We're also seeing heap allocations in some Jacobian functions as well.
The author (@mitiguy) is aware of two:
and:
We're seeing another here that is not just a heap allocation, but seemingly pretty inefficient as well (why wouldn't calculating the kinematic path to the world take constant time?): https://github.com/RobotLocomotion/drake/blob/b6d748509448ee842e05c0914e790fa0776015ae/multibody/tree/multibody_tree.cc#L2656
Another (seems pretty simple to address): https://github.com/RobotLocomotion/drake/blob/b6d748509448ee842e05c0914e790fa0776015ae/multibody/tree/multibody_tree.cc#L2166
We welcome any pull requests from users to fix these kind of problems that they care about. For now, the Drake team doesn't plan to make any changes ourselves here.
Heap allocations make computational run times nondeterministic and should be avoided, if possible to permit real-time applications. In some cases, the heap allocations identified below are likely reducing performance in non-real-time applications as well.
I expect this list will grow over time.