RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.36k stars 1.27k forks source link

Undesirable heap allocations in MultibodyPlant #13186

Closed edrumwri closed 8 months ago

edrumwri commented 4 years ago

Heap allocations make computational run times nondeterministic and should be avoided, if possible to permit real-time applications. In some cases, the heap allocations identified below are likely reducing performance in non-real-time applications as well.

I expect this list will grow over time.

sherm1 commented 4 years ago

Thanks for pointing these out, @edrumwri! Some questions:

jwnimmer-tri commented 4 years ago

The colorful lines of code (tree functions) are a distraction. Look at the method names after the bullet points (plant functions). Those plant function do not offer user-supplied storage, and the entry points that we actually care about. (In particular, GetKinematicPathToWorld is called internally by MbT, not by the user.)

edrumwri commented 4 years ago

Per discussion with @jwnimmer-tri over slack, the key point that I neglected to mention is the typical workflow for real-time programming where one first goes through a "warm up" period where heap can be allocated at will. Afterwards, a real-time phase is entered where no further heap allocations are allowed.

The caching system naturally supports this workflow systems if the cache allocates the memory once and the cache evaluation function doesn't allocate any new memory (@ggould-tri has pointed out concerns with using the caching system in a real-time context; we're still proceeding under the assumption that we can use caching, carefully, to avoid unwanted heap allocations).

* the other two seem like avoidable allocations, though possibly requiring an API change to pass in some necessary temp space. For prioritizing, have you been able to measure an effect from the heap allocation or is the cost speculative at this point?

I don't think API changes are necessary as long as the caching system is used for maintaining those std::vector objects. In any case, our primary concern isn't cost, but deterministic computation times, and memory allocation (among other OS services) is typically not deterministic.

sherm1 commented 4 years ago

Ah, yes for the first case the guilty code is here, where that output argument is allocated.

sherm1 commented 4 years ago

cc @mitiguy

mitiguy commented 4 years ago

Issue #13186 was filed on MBP doing "undesirable heap allocation" by @edrumwri who was also the requester of issue #12592 for the new moving-frame bias acceleration feature (related issue #13453). Heap allocation in CalcBiasSpatialAcceleration() caused concern -- is the new method useful for Evan's application without caching? From email correspondence with Evan (June 16, 2020): "I can wait for the heap allocation to be eliminated at a later date. We're unable to use Drake in a real-time application until all of the heap allocations in MBP have been eliminated: adding a new heap allocation does not make the problem any better, but neither does it make it considerably worse."

sherm1 commented 4 years ago

but neither does it make it considerably worse

Not sure I agree with that -- the changes in #13453 double the number of heap allocations in MBP.

sherm1 commented 4 years ago

Turns out there are many other heap allocations in the MultibodyPlant computations. We are tracking those now with the instrumented cassie_benchmark and work is in progress to remove them.

jwnimmer-tri commented 2 years ago

This issue does not seem specific enough to be actionable. There are hundreds of public functions on MbP (and more if you count the supporting public classes like Body, Joint, etc.). What list of functions in particular need to be heap-sensitive? Without a list of functions to tackle (i.e., to write LimitMalloc test cases for), there is no way to move forward here and we should close out the issue.

Is the implication of Sherm's comment above that the only relevant functions are the non-autodiff cassie tests? Which is to say, fixing the 3 heap ops in DoubleInverseDynamics and 22 heap ops in ForwardDynamics is sufficient to close this issue?

sherm1 commented 2 years ago

WDYT @edrumwri ?

edrumwri commented 2 years ago

I think we can discount the parts of the forward dynamics stepping scheme- there is little hope that methods that compute contact forces can be made real-time safe, at least in general, so it doesn't make sense to chase down heap allocs there. But pretty much everything else in MBP*- query functions, slicing functions (i.e., get parts of an array), forward kinematics, inverse dynamics- should be able to avoid heap allocations / be made to be real time safe.

I've viewed removing heap allocs as a win for both real time safety and performance. If there is not interest in maintaining no-heap-allocs on the MBP side (we've been blessed with few heap allocs in the systems framework), we can always move the MBP accesses to a non-real time thread on our end. But it would be a shame to have to do so.

* I haven't been paying attention to the deformable body stuff that I know is coming/already here, so discount this statement as necessary.

jwnimmer-tri commented 2 years ago

If there is not interest in maintaining no-heap-allocs on the MbP side ...

For my part I think it's a useful enough goal. We just need to understand what's desired.

... but pretty much everything else in MbP -- query functions, slicing functions (i.e., get parts of an array), forward kinematics, inverse dynamics -- should be able to avoid heap allocations ...

That's at least a start for setting a goal here. I suppose the next actionable step would be for someone to try to turn that general idea into a list of specific function names to repair.

Here's an approximate tally of all functions in scope (based on the pybind coverage report).

 11  drake/multibody/math/spatial_acceleration.h
 11  drake/multibody/math/spatial_force.h
  9  drake/multibody/math/spatial_momentum.h
 30  drake/multibody/math/spatial_vector.h
 11  drake/multibody/math/spatial_velocity.h
  6  drake/multibody/plant/coulomb_friction.h
  4  drake/multibody/plant/externally_applied_spatial_force.h
220  drake/multibody/plant/multibody_plant.h
 14  drake/multibody/plant/propeller.h
 10  drake/multibody/plant/wing.h
 13  drake/multibody/tree/articulated_body_inertia.h
 13  drake/multibody/tree/ball_rpy_joint.h
 41  drake/multibody/tree/body.h
 25  drake/multibody/tree/door_hinge.h
 10  drake/multibody/tree/fixed_offset_frame.h
  9  drake/multibody/tree/force_element.h
 31  drake/multibody/tree/frame.h
  2  drake/multibody/tree/frame_base.h
 63  drake/multibody/tree/joint.h
 21  drake/multibody/tree/joint_actuator.h
 21  drake/multibody/tree/linear_bushing_roll_pitch_yaw.h
 14  drake/multibody/tree/linear_spring_damper.h
 19  drake/multibody/tree/planar_joint.h
 22  drake/multibody/tree/prismatic_joint.h
 23  drake/multibody/tree/revolute_joint.h
 10  drake/multibody/tree/revolute_spring.h
 29  drake/multibody/tree/rigid_body.h
 40  drake/multibody/tree/rotational_inertia.h
 18  drake/multibody/tree/screw_joint.h
 23  drake/multibody/tree/spatial_inertia.h
 11  drake/multibody/tree/uniform_gravity_field_element.h
 30  drake/multibody/tree/unit_inertia.h
 13  drake/multibody/tree/universal_joint.h
  5  drake/multibody/tree/weld_joint.h

We need to cut down that list of ~800 functions into the subset which needs to be heapless. (Some of those 800 are probably obviously irrelevant, e.g., constructors and destructors, etc.)

If you want to propose a few dozen that are more important that the rest, we can look into fixing those first.

amcastro-tri commented 2 years ago

If you could give us a short list (say 3 to 5) methods you'd like us to work on first @edrumwri, that'd help a lot. We can re-evaluate after that.

edrumwri commented 2 years ago

Will do.

sherm1 commented 2 years ago

@edrumwri is this issue still relevant? My thought is to close it now and open more-specific heap issues as they crop up.

edrumwri commented 2 years ago

Still highly relevant. Our engineering cycles have been focused elsewhere, but I'll be able to get you a list of functions this quarter.

edrumwri commented 2 years ago

I think that our remaining heap allocations are limited to CalcInverseDynamics(). It looks like the heap allocations are due to some temporary storage in MultibodyTree::CalcInverseDynamics(), as well as it returning a VectorX, rather than allowing an output vector to be passed in.

edrumwri commented 1 year ago

Update on this: we hadn't exercised all code paths. We're also seeing heap allocations in some Jacobian functions as well.

The author (@mitiguy) is aware of two:

https://github.com/RobotLocomotion/drake/blob/b6d748509448ee842e05c0914e790fa0776015ae/multibody/tree/multibody_tree.cc#L2497

and:

https://github.com/RobotLocomotion/drake/blob/b6d748509448ee842e05c0914e790fa0776015ae/multibody/tree/multibody_tree.cc#L2183

We're seeing another here that is not just a heap allocation, but seemingly pretty inefficient as well (why wouldn't calculating the kinematic path to the world take constant time?): https://github.com/RobotLocomotion/drake/blob/b6d748509448ee842e05c0914e790fa0776015ae/multibody/tree/multibody_tree.cc#L2656

Another (seems pretty simple to address): https://github.com/RobotLocomotion/drake/blob/b6d748509448ee842e05c0914e790fa0776015ae/multibody/tree/multibody_tree.cc#L2166

Another: https://github.com/RobotLocomotion/drake/blob/b6d748509448ee842e05c0914e790fa0776015ae/multibody/tree/multibody_tree.cc#L2599

Last one: https://github.com/RobotLocomotion/drake/blob/b6d748509448ee842e05c0914e790fa0776015ae/multibody/tree/multibody_tree.cc#L1737

jwnimmer-tri commented 8 months ago

We welcome any pull requests from users to fix these kind of problems that they care about. For now, the Drake team doesn't plan to make any changes ourselves here.