Reuse ContactSolverResults for performance

xuchenhan-tri commented 1 year ago

This issue is initially raised in the DrakeDeveloper Slack: https://drakedevelopers.slack.com/archives/C0JNB2E3S/p1699888429416319

To facilitate description and discussion of the issue, I will define a few notations to begin with. At a given time t at which a discrete update is performed, we use xᵈ⁻(t) to denote the pre-update discrete state value and xᵈ⁺(t) to denote the post-update value. In #19225, we designed a mechanism to emulate discrete input port sampling in a discrete MultibodyPlant by allowing sampling force input ports only during discrete updates. We use uᵈ⁻(t) to denote the value of the input port forces before sampling (i.e., the value from the sampling done in the previous step), and uᵈ⁺(t) to denote the value of the input port forces from sampling at time t. The value of uᵈ⁺(t) and xᵈ⁺(t) are held constant over the discrete update period, and at the beginning of the next discrete update, we relabel them to (xᵈ⁻(t+1), uᵈ⁻(t+1)) for the computation of the next update. Since #19225, we consistently sample the input ports before performing the discrete update. In other words, the computation of the next discrete state xᵈ⁺(t) is a function of (xᵈ⁻(t), uᵈ⁺(t)).

With these notations in mind, I'll now describe the issue: In the discrete update for MultibodyPlant, the computation bottleneck usually occurs at computing the contact solver results (SAP or TAMSI), so we would like to cache the results to prevent repeated computation. Ideally, we would like to only perform one contact solver computation per discrete step. However, as the slack threads referenced above points out, the current design precludes the caching the contact solver result to some extent. In particular, the contact solver computation are sometimes carried out twice in a single discrete step.

To see that, consider the following example: In the discrete update, the contact solver computation is carried out using (xᵈ⁻(t), uᵈ⁺(t)) which is followed by updating xᵈ⁻(t) to xᵈ⁺(t). Then, immediately after the discrete update, a user of MultibodyPlant pulls on its contact results output port (which depends on the contact solver results). Since the discrete state x_d has changed from xᵈ⁻(t) to xᵈ⁺(t) and the contact solver results depend on x_d, the cached contact solver results is invalidated and re-computation is triggered with (xᵈ⁺(t), uᵈ⁺(t)). Similarly, this cached contact solver result is invalidated again at the next discrete update because a new input port value will be sampled right before for the computation of the contact solver results for the next update.

Notice that in the example above, pulling on MultibodyPlant output ports depending on contact solver results repeatedly does not trigger additional contact solver computation because the cached result which is a function of (xᵈ⁺(t), uᵈ⁺(t)) remains valid. Therefore, in a discrete step, we perform at least one contact solver computation and at most two.

Here are a couple of alternatives I have considered to prevent the additional contact solver computation:

In the discrete update, instead of computing xᵈ⁺(t) as a function of (xᵈ⁻(t), uᵈ⁺(t)), compute xᵈ⁺(t) as a function of (xᵈ⁻(t), uᵈ⁻(t)). That is, use the input port force value from the previous time step for the discrete update. This prevents the addtional contact solver computation because if any output port depending the contact solver results has been pulled on since the last discrete update, a computation of contact solver results would have been carried out using (xᵈ⁻(t), uᵈ⁻(t)) (or equivalently (xᵈ⁺(t-1), uᵈ⁺(t-1))), and that cache entry value is still valid and can be reused.
With extreme care, manually manage the contact solver cache entry so that when an output port depending on it is pulled after the update from xᵈ⁻(t) to xᵈ⁺(t), reuse the contact solver result computed with (xᵈ⁻(t), uᵈ⁺(t)), instead of recomputing it with (xᵈ⁺(t), uᵈ⁺(t)).

Here are some of the implications for each and the trade-offs:

Option 1 is a breaking change. It offsets the sampling time of input port forces for MultibodyPlant by one step. An immediate consequence is that one cannot modify the input port value and advance a single discrete step and observe any immediate response as is done here for example. However, note that this is already the behavior for output ports, i.e., modifying input port forces values to MultibodyPlant will not immediately trigger any response in MultibodyPlant output ports, because input port forces are only sampled in a discrete update.

Option 2 is non-breaking. On the other hand, the implementation of option 2 is likely more brittle than 1. Many MultibodyPlant computations depends on contact solver results as well as other ingredients. For example, computing contact results depends on both the contact solver results and the geometry query results. If we were to freeze the computation of contact solver results to depend on (xᵈ⁻(t), uᵈ⁺(t)), we need to carefully resolve the discrepancy between it and the geometry query results that depend on xᵈ⁺(t) instead of xᵈ⁻(t). The same applies to any current and future MultibodyPlant cache entries and output ports that depend on contact solver results.

@Xining-Du (OP of the slack threads) @sherm1 @amcastro-tri @jwnimmer-tri @calderpg-tri @EricCousineau-TRI for feedback/thoughts.

sherm1 commented 1 year ago

Great writeup Xuchen -- very clear, thanks!

Note that we desperately need some feedback from roboticists to determine what's acceptable here. (Running at half speed doesn't seem acceptable.) Option1 is by far the easiest to implement. Is there an Option 3 we're missing?

amcastro-tri commented 1 year ago

Great, thanks Xuchen! There is a related analysis in this older issue, #12786.

Not sure if the full anlysis applies here, but at least it did remind me we do have notation for this! and I think, I am finally able to write this formally. We won't like the conclusion though: we not only need to ZOH the inputs, but also the outputs! Obviously I don't suggest we implement this, but I am hoping this will help come up with at least a better temporary solution.

So this is the notation. When I have a discrete variable, we'll notate it with x[n]. The typical example in #12786 is to write the discrete state update as x[n+1]= f(x[n], u[n]). Now, we have to be careful when we are inside a hybrid system. Within a simulation, Simulator provides a hybrid system framework and thus all systems become hybrid, or at least that's my conclusion. Lets accept that for a bit to see where it takes us. Then what that means, is that I have to think about the continuous time line, with discrete updates at specific times in it. For a system with discrete update period Td, I'll define t[n] = n⋅Td. Then, this is how I would write the most general system with a discrete update, input u(t) and output y(t). Notice that I purposedly wrote u(t) and y(t) here because this discrete system is embeded in a bigger hybrid framework, it must be able to understand continuous inputs and outputs, whether those come themselves from other discrete systems. I'll define the system in three parts:

State:

Composed of the tripplet {xᵈ[n+1], uᵈ[n], yᵈ[n+1]}. Notes:

I uᵈ[n] denotes the discrete state, while the previously introduced u(t) the continuous input port signal.They are different.
Notice I used uᵈ[n] instead of uᵈ[n+1]. That's preference. I shift the index so that in the discrete udpate below I can write the nice commonly used form xᵈ[n+1]= f(xᵈ[n], uᵈ[n]).

Discrete Update:

The discrete update at t = t[n], udpates the state from n to n+1 as:

uᵈ[n] = u(t[n]) xᵈ[n+1] = f(xᵈ[n], uᵈ[n]) yᵈ[n+1] = g(xᵈ[n], uᵈ[n])

Output:

Notice that since this discrete system is within a hybrid scope (at least when Simulator::AdvanceTo() happens), the output must be continuous. And it is defined as:

y(t) = yᵈ[n+1], ∀ t[n] ≤ t < t[n+1]

What this framework suggest is that for this system to behave in the way we want, we need to ZOH inputs AND outputs. AFAICT, there is no ambiguity in this notation. Once again, I do not know how I'd fix the problem from this issue today. But this seems to suggest that contact results should somehow be ZOH'ed.

Thoughts?

sherm1 commented 1 year ago

That all makes sense to me, Alejandro, as long as we're using a Simulator. We also allow users to evaluate output ports (y(t,x,u)) at any time, even with no Simulator present. If the corresponding cache entry is invalid, evaluating y requires that a computation be initiated. Currently that's what is causing the double evaluation -- the update of xᵈ to xᵈ[n+1] invalidates the contact results cache entry.

sherm1 commented 1 year ago

I think a good solution would have the flavor that evaluating an output port would retrieve SAP's most-recently-computed result if there was one, but perform a fresh computation if nothing good is available. And while we're dreaming, it would be good if an explicit change to an input port (e.g. FixValue()) is immediately reflected in the next output port evaluation or SAP step. Not sure yet how to achieve this.

amcastro-tri commented 1 year ago

Just to be clear, I didn't want to think about cache entries yet, but about the formal mathematical model, which seems to indicate we need extra state for the output. As you indicate, maybe some sort of cache entry would emulate that additional state but I also don't know right now how we'd implement it in a way that is both safe and performant.

jwnimmer-tri commented 8 months ago

We'll plan to meet f2f next week (Jeremy, Xuchen, Sherm) and hash out a plan for this ticket.

From the prior (winter) f2f, I think we all agree that we must add State to reflect the port sampling (with an opt-out for optimization-based workflows need to eschew the extra state). I anticipate our discussion will be first to confirm/explain that idea, and then to figure out the nuts & bolts of exactly the dynamics equations we want to embody (which timings happen when), and then an outline for how to implement it.

(We may decide that for deprecation reasons the extra state comes as opt-in at the start, defaulting to opt-out a few months later. That's a question for later. First we decide exactly how we want the code to work long term, then after that we figure out the transition plan.)

sherm1 commented 8 months ago

Re-reading Xuchen's and Alejandro's cogent analyses above, here's a sketch of a proposal. Some notation:

s(t) continuous value of some quantity s
s[n] sampled value

x = { xᵈ, xᶜ }  // state has discrete and continuous state variables (discrete includes abstract here)
u = { uᵈ, uᶜ }  // inputs include discrete and continuous input ports
y = { yᵈ, yᶜ }  // discrete and continuous output ports 

xᵈ[n] ← value  // notation for nᵗʰ sampling of something
context = { t, x(t), t[k], xᵈ[k], uᵈ[k] }   // augmented with last (kᵗʰ) sample

Say we are at the start of step n at time t. We have (in the Context) values for t and x(t). We can evaluate input ports u(t). We have space allocated in the Context for samples of t, xᵈ, and uᵈ. The first thing to do in a step is to fill in those samples:

t[n]  ← t      // remember step start time
xᵈ[n] ← xᵈ(t)  // remember the state at step start
uᵈ[n] ← uᵈ(t)  // sample the discrete inputs

Next, perform discrete and continuous state updates:

xᵈ(t) ≡ xᵈ[n+1] ← f(t[n], xᵈ[n], uᵈ[n])   // advance discrete state (doesn't overwrite xᵈ[n])
xᶜ(t+h) ← xᶜ(t) + ∫ẋᶜ                     // integrate continuous state
t       ← t + h                           // advance time

// Publish end of step outputs *on request only*. That is, someone pulls on an output port.
// These aren't sampled; just evaluated.
yᵈ[n+1] = gᵈ(t[n], xᵈ[n], uᵈ[n])  // discrete output depends on start-of-step samples
yᶜ(t)   = gᶜ(t, x, u(t))          // continuous output depends on current values

Some thoughts

this is a possibly breaking change since the contact results are now always beginning-of-step (or last-sampled) results. However, we could provide a continuous contact results port too that would always be "live".
sampling time t[n] can be done by the Simulator at step start
I'm not as sure about x and u samples (expensive). There can be many different periods in a Diagram and many steps before samples need to be taken. Probably best to have handlers sample the state & inputs they care about.
state "sampling" should ideally be done by swapping rather than copying
I'm not seeing a need to designate output ports as discrete -- they are discrete if they don't declare any dependency on continuous sources.

Implementation

API to mark input ports as "discrete"; causes allocation of sample space uᵈ[k] in Context
Context always allocates space for samples of time and discrete state variables xᵈ[k]
API for handlers to use to perform sampling of state & inputs (ideally swaps for state)
Additional DependencyTrackers for samples t[k], xᵈ[k], uᵈ[k]
Clean up multibody plant to use the new dependencies and APIs, remove cache hacks

Anyway, some fodder for us to discuss f2f. Your thoughts and counterproposals welcome of course!

jwnimmer-tri commented 8 months ago

After we agree on what semantics the MbP should provide on its ports / discrete steps, my inclination was to implement that inside MbP directly with our existing "declare state" and "declare discrete update", without any changes to the framework. I would hope that our existing framework tools are sufficient to express the dynamics we want to compute (e.g., by sampling inputs as part of the discrete update calculation).

sherm1 commented 8 months ago

The framework is sufficient to emulate this locally except for the state-swapping. I don't think that will be too expensive in MbP's case so we can likely just copy. The downside of the emulation (as we've discussed before) is that the MbP state now has extra un-interesting variables that could mess up Russ's analysis. In the framework version we can avoid that. But ... I think Russ said he can live with the mucked-up state; the framework version could be a TODO if that turns out to be ugly.

jwnimmer-tri commented 8 months ago

The sampled ports are state. They must be in State, even in the framework version. We must be able to re-calculate starting only from State, not State + extra hidden samples.

Russ agreed that adding State is OK, so long as we have an opt-out of the MbP sampling.

sherm1 commented 8 months ago

Samples are "state" in the technical sense of the word. But so are parameters and they aren't part of the State object. Samples are a special kind of state and we could choose to segregate them from other state variables if there was a benefit to doing so. (And I think there is.)

sherm1 commented 8 months ago

Ah, there is a long-standing framework TODO that would prevent emulating this in MbP -- see #21162. Fixing that is a yak shave for this fix.

jwnimmer-tri commented 8 months ago

When we talked a couple weeks ago, @xuchenhan-tri and @sherm1 were going to arm wrestle for who moves it forward, and I think the next step then was to make sure we're solid on the new calculation sequence we want.

I'm curious to hear what progress has happened here since we last spoke, what the current status is, and who won the arm-wrestling match?

sherm1 commented 7 months ago

This is still on my plate but hasn't advanced yet after we figured out what to do. I'm neutral about whether @xuchenhan-tri or I do the implementation.

sherm1 commented 6 months ago

FWIW while I'm on vacation: The instrumented branch that I've been using to investigate this is here. Mostly it dumps a line to cout whenever a cache entry is found to be out of date and is recalculated. The model I've been running is examples/hydroelastics/model_dales_problem. This is a hacked-up version of spatula_slip_control; looking at the dot diagram is worthwhile to see what's connected to what.

jwnimmer-tri commented 6 months ago

To help me follow along, I put together a call graph with some colors to indicate dependency ticket declarations.

call

call.dot

jwnimmer-tri commented 5 months ago

I'm going to use this comment to help me track my PR train. I'll be editing it over time with new data.

Here's the master branch:

[x] #21623

Here's the commits spun out of it:

[x] #21531
[x] #21533
[x] #21559
[x] #21563
[x] #21589
[x] #21590
[x] #21595
[x] #21603
[x] #21604
[ ] #21605
[x] #21606
[x] #21608
[x] #21609
[x] #21611
[x] #21618
[x] #21619
[x] #21625
[x] #21638
[x] #21639
[x] #21644
[x] #21664
[x] #21680
[x] #21683
[x] #21685
[x] #21688
[x] #21689
[x] #21692
[x] #21694
[ ] #21735
[ ] #21737

RobotLocomotion / drake