RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.33k stars 1.26k forks source link

TAMSI results cache hack doesn't work correctly when caching disabled #12643

Closed sherm1 closed 2 days ago

sherm1 commented 4 years ago

@siyuanfeng-tri ran into a problem in a complicated Diagram where the behavior differed with caching enabled or disabled. I was able to trace this to the TAMSI results cache -- if I disabled caching for just that cache entry, leaving all other caching enabled, everything worked fine. In fact, I could re-enable even that cache entry as long as I did it after an AdvanceTo(0), which serves only to process the time-0 unrestricted and discrete updates.

There is a known-hack for this particular cache entry, well-documented in the code here. TL;DR: the cache entry declares dependency only on discrete state xd but in fact depends on much more.

Tracing in the debugger and discussing with @amcastro-tri , this is what is happening:

This violates the caching Prime Directive -- the answers should be unchanged regardless of whether caching is enabled.

sherm1 commented 4 years ago

After much discussion, Alejandro and I came to the conclusion that the current behavior with caching ON is correct, but with caching OFF the behavior is wrong (though in this case it seemed better). There are two independent problems here:

To say this another way, when running MBPlant in discrete mode we want its outputs y to be discrete with governing equations

y[n+1] = y(t[n], x[n], u[n])

which is what we get currently with caching on. (Note that in Siyuan's example u[0]=NaN, hence the unwanted behavior.) But with caching off, we are getting

y[n+1] = y(t, x[n], u(t))

which is an undesirable hybrid (mixed discrete/continuous) output. (In the example u(t) had been fixed up to be non-NaN so the behavior appeared to be better. )

We are debating how to address this and welcome any ideas. The most obvious fix is that MBPlant in discrete mode should sample its inputs into internal discrete variables, then have the TAMSI calculation depend on those variables rather than directly on the input ports. That would have the same effect as adding a zero order hold in front of the inputs. That would produce the desirable effect that the system would give the same results with caching on or off, which seems like an important invariant.

Thoughts?

sherm1 commented 4 years ago

@siyuanfeng-tri here is the fix I suggest:

MotionPrimitive::MotionPrimitive(const multibody::MultibodyPlant<double>* plant)
    : plant_(*plant) {

  // These are being directly copied to output ports.
  x_command_index_ = this->DeclareDiscreteState(
      systems::BasicVector<double>(plant->num_multibody_states()));
  motion_summary_index_ = this->DeclareAbstractState(
      AbstractValue::Make<MotionSummary>(MotionSummary()));

  //     ... declare ports ...

  // Ensure above state variables are valid at the start of the
  // first step (i.e., after Simulator::Initialize()).
  this->DeclareInitializationUnrestrictedUpdateEvent(
      &MotionPrimitive::UpdateEverything);

  // Then update these variables periodically.
  this->DeclarePeriodicUnrestrictedUpdateEvent(1e-3, 0,
      &MotionPrimitive::UpdateEverything);
}

systems::EventStatus MotionPrimitive::UpdateEverything(
    const systems::Context<double>& context,
    systems::State<double>* state) const {
  UpdateLogic(context, state);
  UpdateCommand(context, state);
  UpdateMotionSummary(context, state);
  drake::log()->info("MotionPrimitive::UpdateEverything");
  return systems::EventStatus::Succeeded();
}

Note: in addition to the initialization event (which was the fix), I switched from the Old School "override the dispatcher" style to what all the Cool Kids do now -- declare their own event handlers. Here UpdateEverything() is just a private method whereas the DoUnrestrictedUpdate() dispatcher was an overridden virtual.

sherm1 commented 4 years ago

I edited the title and description to reflect the investigation. This example revealed several independent issues that need to be addressed:

I'll follow up on these.

siyuanfeng-tri commented 4 years ago

Thanks Sherm! I should have remembered the initialization events.. I probably wrote those, or at least pushed really hard for having those in the early days.... I thought MBP's mode (discrete / continuous) is independent of what else in in the diagram, is this not true?? Is the discrete mode you are talking about in the second post coming from TAMSI hack? I am a bit confused here.

amcastro-tri commented 4 years ago

You are right @siyuanfeng-tri, whether MBP is discrete/continuous is given at construction, independent of anything else in the parent diagram. As @sherm1 mentions above, what we'd like to model is y[n+1] = y(t[n], x[n], u[n]). Notice the very important subtlety here, the discrete input port (not continuous!) u[n]. We didn't understand this very well before, and when I implemented the contact results output port in discrete mode I "emulated" the discrete behavior of u[n] entirely motivated on a run-time performance fix. See more details in the in source comment here. My "hack" was to make the contact results port (which in turn yes, it depends on the TAMSI cache entry) dependent on the discrete state xd. Therefore this discrete input u[n] "emulation" only works with cache turned on. Let me see if I can say it another way (sry, this is difficult to explain), the TAMSI cache entry with cache turned on, emulates a new discrete state (as if instead of having a ZOH for u[n], which would entail a new discrete state for u, we have a new state for the TAMSI computation). When caching is off, this "emulated" TAMSI state is gone and now instead you get the new (hybrid) system y[n+1] = y(t, x[n], u(t)) (notice t, u(t) instead of t[n], u[n]!).

Hopefully that helps?

amcastro-tri commented 4 years ago

I realize that the comment in the "hack" should instead say S = S(t[n], x[n], u[n]). We do have x[n], our MBP discrete state. We'd need a new state for u[n]. I believe that should solve the problem on the MBP side.

siyuanfeng-tri commented 4 years ago

makes sense.

jwnimmer-tri commented 2 days ago

Fixed in #21623, as I understand it.