RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.29k stars 1.26k forks source link

Stateless SceneGraph and cache-friendly layout #9501

Closed SeanCurtis-TRI closed 3 years ago

SeanCurtis-TRI commented 6 years ago

Description

SceneGraph currently has a single abstract state value. That value is an aggregate blob of data including the description of the scene graph and the various implementations of that scene graph in engines. The aggregate structure exists for various historical reasons:

  1. The topology of scene graph was intended to be changeable (in support of automotive). As such, the contents of the world would also be part of state, and
  2. It was written pre-caching so there were some dirty hacks included to enable functionality without exerting any effort to create a caching alternative.

However, there are applications (designing a controller) for which the presence of abstract state can be problematic. Furthermore, the ability to add/remove geometry during the course of a simulation is a speculative features -- one that is not even supported by the system framework. Therefore, we are paying a cost for a feature that we don't really have (and, for most early uses, don't need).

Proposal

When scrubbing the SceneGraph/GeometryState classes to make them cache compliant, remove the support for dynamic graph population. It is anticipated that this would reduce SceneGraph into a simpler system comprising of inputs and parameters and no internal state.

Things to pay attention to

  1. Make sure we don't make choices that preclude having dynamic geometry in the future.
  2. Consider use cases where people would like to exercise geometry operations outside of the system framework (currently, that can be achieved simply by instantiating and exercising the GeometryState -- although that should not be considered a long-term solution.

/cc @RussTedrake

jwnimmer-tri commented 6 years ago

For now, an easy change I think would be to declare abstract GeometryState to be either (1) an abstract cache entry value or (2) an abstract state value, depending on whether the user asks for the geometries to be dynamic. I think this is likely a very small tweak to the current code, if we want to do it quickly.

SeanCurtis-TRI commented 6 years ago

An excellent suggestion, @jwnimmer-tri.

sherm1 commented 6 years ago

I'm not sure it is reasonable to require SceneGraph to be stateless. Some necessary geometric operations require state, such as detecting pass-through. I don't think the existence of such state has to have any impact or visibility to downstream users of SG -- simply ignoring that would be the right thing to do.

sherm1 commented 6 years ago

Regarding cache use: please remember that cache is always a performance optimization, never memory. An easy test is that I can destroy the cache any time with no effect on results other than computation time.

SeanCurtis-TRI commented 6 years ago

When you say "destroy" the cache, could you elaborate?

I see two issues:

  1. The cache can be destroyed (and re-allocated). We've obviously got systems that will have values that exclusively live in cache. So, you can't mean making the caching code disappear. It seems as long as that cache entry always allocated from a SceneGraph-defined model, then that resolves this issue.
  2. The other aspect is turning caching off -- and certainly, it would work as well.

The other issue of eventually wanting other state that isn't purely continuous state is definitely a bigger issue.

sherm1 commented 6 years ago

By "destroy" the cache I mean wipe out any values that it contains. The const System plus the non-cache Context contents (time, parameters, state, input port values) should always be sufficient to recover the wiped-out value.

RussTedrake commented 6 years ago

@sherm1 - simply ignoring state is almost never the right thing to do!!

If the state effects dynamics/output, then algorithms should be able to (expected to?) reason about that state. If the state does not effect output, then it’s not state.

I agree that some contact models have state. Those should be declared as such, and algorithms should reason about the implications. If the implications are small, then a formal notion of approximation might be made (eg via doing design on a simpler, stateless version of the model that has constraints avoiding poorly modeled situations).

sherm1 commented 6 years ago

If you think of the abstract state as a kind of generalized "mode" (in the hybrid dynamics sense), aren't there cases where you would want to generate a controller that operates just on the current mode? I'm imagining that LQR wouldn't be capable of reasoning about modes but could generate a useful controller holding the mode constant.

How would you think about state that is used solely to determine whether a time step is acceptable (because of an unfortunate relationship between the start-of-step state and end-of-step state, such as a pass-through)? That would be meaningful during time-advancing simulation but would likely be ignorable for any instantaneous computation.

RussTedrake commented 6 years ago

Yes -- if there is a mode (in the hybrid dynamics sense), then we have e.g. HybridLQR that tries to do the right thing -- if only to perform LQR and guarantee performance in only one mode. But calling LQR on a hybrid system without telling it about the mode is not the recipe.

For your pass-through example... my question would be whether the mathematical model requires the state or whether it's just the numerical implementation for simulation. If the model requires it, then it's probably just as relevant for control as for simulation.

avalenzu commented 5 years ago

Just ran into this issue while trying use MbP + SG with systems::trajectory_optimization::DirectCollocation. I don't need to modify any geometry during the optimization, so the approach of having GeometryState be an abstract cache entry would be perfect for me here.

RussTedrake commented 4 years ago

Bringing this up again, as it continues to be a pain point for me. Here is another use case that I feel blocked on: I've got a simple plant with continuous state + scenegraph for visualization. I've run a simulation, logging the data (via SignalLogger or via DenseOutput), and now I want to effectively rewind the simulation to inspect an output at a previous time. This should be easy to do, as the only real state in the system is my continuous state. But can I simply call context.SetContinuousState() and be satisfied that "ignoring" the geometry state is an ok thing to do? I don't think there is anything in my contract with the Context that would suggest that's ok.

There are loads of things like this that we should be able to take advantage of by declaring everything in the context. eventually, we should log abstract state, too. But it's frustrating to have to add that complexity, and/or wait for the feature, to do something that should be simple!

sherm1 commented 4 years ago

If GeometryState isn't changing during the simulation, then just resetting the states that changed would be fine. We could make that rigorously testable by adding a serial number that gets bumped every time GeometryState (or more generally any abstract state) has its value changed. (We already have serial numbers for fixed input port values and cache entries.)

It doesn't make logical sense to store the geometry state only in a cache entry, regardless of whether it is changeable. (A cache entry must be recreate-able from state at any time.) Rather, if it really isn't going to change it should be part of the SceneGraph construction, i.e. stored in a SceneGraph data member that can't be changed during a simulation. An intermediate position would be to put the GeometryState in an abstract Parameter.

Logging the whole Context at each trajectory step would be the nicest solution IMO. That would be an ideal use case for copy-on-write so there would only be one copy of the gigantic GeometryState unless it changed.

RussTedrake commented 4 years ago

I am working towards logging the whole Context at each trajectory (with the momentum I've got from cleaning up DenseOutput). But this sentence offends my basic principles: "If GeometryState isn't changing during the simulation, then just resetting the states that changed would be fine." Of course that's true, but methods I write at the System level can't know that. If GeometryState isn't changing, then it should not be registered as a state parameter. If something is registered as state, then I have to assume that the System could be changing it at any time during a Simulation.

I think it could be a parameter, if we expect people to want to read/write from it from outside the SceneGraph. (I can't see that use case, but maybe you can?) Otherwise, why not just have it as a member variable of SceneGraph? Systems are const during simulation.

sherm1 commented 4 years ago

I agree that if Geometry "State" really can't change, then an immutable member variable is the perfect place for it. However if it makes sense to have it changeable, it is still possible to keep track of whether it has changed, and if not take advantage of that case (or write code that insists that no change has occurred).

There is some precedent for that: Discrete states (including abstract ones) can only change at isolated events. If you can prove that no event has occurred you can count on the discrete variables being unchanged. Integrators depend on that during a step and confidently mangle only the continuous variables. That's not a violation of any fundamental properties of state, but it does depend on ironclad knowledge that no discrete update can occur.

RussTedrake commented 4 years ago

I agree. Of course adding things that do not change into the state vector is inefficient (and many algorithms scale in performance with the number of state variables). But more importantly, we have many algorithms that are very clearly defined when there is vector-valued state, but don't know what to do about abstract state. Although I admit that we could write a check that every algorithm written for the vector-valued state must call 'bool DeclaredAbstractStateWillNeverChange()', it seems wrong to do so?

sherm1 commented 4 years ago

Maybe that could be justified for a particular application but I agree the right way to do this is to have a way to log the complete Context at every step. I think the only practical way to do that is to have a way to share the parts of the Context that don't change. CoW is my favorite way to do that because it preserves the semantics of copying while secretly optimizing invisibly.

In this particular case I guess the proposal is to stop allowing for geometry to change during a simulation. However, I thought that was a relatively high priority request even recently.

RussTedrake commented 4 years ago

I would be fine to have SceneGraph use state if/when it is dynamic over a simulation. But would like that it does not have state when everything is static.

SeanCurtis-TRI commented 4 years ago

Seems reasonable that the big blob of data that is currently abstract state could be optionally configured as state or parameter. And, down the road, I can even imagine aspects of the blob being optionally state vs parameter. I'm going to ruminate (and probably tinker) on this.

SeanCurtis-TRI commented 4 years ago

@RussTedrake on the subject of:

But can I simply call context.SetContinuousState() and be satisfied that "ignoring" the geometry state is an ok thing to do?

Presumably, geometry would ignore this for you, properly and automatically. As SceneGraph has no continuous state. Certainly, when you accumulate a diagram's continuous state, that state vector is unchanged whether I add a SceneGraph to the diagram or not. Or am I missing something?

RussTedrake commented 4 years ago

My point is that if I have a context with both continuous state and abstract state, then I cannot set only the continuous state and be assured of proper behavior. It would be analogous to me setting only half of the continuous state and hoping that the other half was not important.

And I currently have no way of setting the abstract geometry state (since I don’t have a way to record it) to what I know it should have been at say t=5.3, if I wanted to rewind and inspect my system from there.

Sorry if I’m being unclear here. Does that make sense?

sherm1 commented 4 years ago

That part is clear -- you want to be sure that the full state x=[xc xd xa] has been restored to the value it had before, yet you have recorded only xc and are worried that the other parts have changed. For dealing with GeometryState in particular (currently in xa) there are three options discussed above:

What's not clear to me is whether you are looking for a general solution or just need to deal with GeometryState for some particular purpose. None of the above provides a general solution since there are other contents of xa that could conceivably have changed also. If your purpose is narrow, any of the above three options can be made to work. If your purpose is more general, none of them are fully adequate.

RussTedrake commented 4 years ago

I think my purpose is quite general, but the rules/philosophy of Context as I interpret them already provide the rules that I need. GeometryState is the only(?) current implementation in our mainstream framework which we effectively force on people which violates my principles.

I know that where we disagree is around the notion of state. Context is time, state, and parameters. I know we agree that Context must contain everything one needs in order to start/continue/rewind a simulation. As you say, state is allowed to change with time, and parameters are not. Where we differ, though, is that I want to keep state extremely clean, as dealing with this state represents the core challenge to almost any design or analysis algorithm beyond forward simulation (even log replay!). So I have a very general philosophy, (albeit defined imprecisely): thou shall not pollute your state with things that complicate design/analysis. In contrast, I'm pretty sure that you are quite happy to use state as a scratchpad where one can jot down anything useful for a system during simulation. Abstract state, in particular, significantly complicates design/analysis, because methods written against the System interface rarely have the means to reasoning about them.

I've been working so far with the pattern that if I want something clean to design/analyze, then I maintain two Diagrams: one that has my e.g. plant + controller, and another that has those + SceneGraph + visualization for when I want to visualize the output. I'm not completely against that workflow -- it's reasonable that adding a visualizer to the system adds complexity that defies analysis. But we've apart from the discussion in this issue, we've actually designed everything cleanly enough that this separation is not strictly necessary -- which is awesome. So a narrow solution here adds a LOT of value.

And as I'm trying to push further -- like having a simple gui which scrubs back and forth across time to replay the results of a simulation -- we can enable that feature for many systems now, without waiting for a feature to record the entire Context, pending only this issue. Again, a narrow solution here adds a LOT of value.

I agree that the general resolution is to record the Context.

sherm1 commented 4 years ago

Gotcha -- thanks for that clarification, Russ. Ideally you'd like to analyze systems that don't have abstract state, and GeometryState is the only thing gumming up the works at the moment. Moving that seems like a workable short term fix. I think we should also invent a rigorous way to ensure that a System that has abstract state has not changed any of those values over a trajectory of interest.

RussTedrake commented 4 years ago

Just N.B. — I’m seeing students run into this in their class projects. They call a method meant for systems with only continuous states and are surprised to find that SG adds an abstract state. I will start PRing some workarounds.

RussTedrake commented 3 years ago

Hoorah!!

SeanCurtis-TRI commented 3 years ago

Note follow upp PR #14245 -- it explicitly deprecated the old behavior and changed the default behavior to be Parameter.