gazebosim / gz-sim

Open source robotics simulator. The latest version of Gazebo.
https://gazebosim.org
Apache License 2.0

Async primary-secondary stepping #468

Open nkoenig opened 3 years ago

nkoenig commented 3 years ago

Support asynchronous stepping between primary and secondary simulation instances.

mjcarroll commented 3 years ago

Currently, there is a lot of back-and-forth traffic between the simulation primary and the simulation secondary instances in order to keep them in complete lockstep.

The current mechanism works as follows (assuming the network is configured and connected before the main simulation starts; a simplified sketch follows the list):

  1. On the primary, we set up all information necessary for the network step (distributing state to all the secondaries that require it, updating the clock, and setting affinities).
  2. The primary sends this information via ign-transport to each secondary using the simulation step message. https://github.com/ignitionrobotics/ign-gazebo/blob/ign-gazebo4/src/msgs/simulation_step.proto
  3. Each secondary performs its step based on the information received.
  4. Each secondary sends the results of the step back to the primary.
  5. Once the primary has received a response from every secondary, it continues to the next step.
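A rough, self-contained sketch of that cycle. All names here are illustrative stand-ins, not the actual NetworkManagerPrimary / ign-transport API; the real step message is the simulation_step.proto linked above:

```cpp
// Simplified, hypothetical sketch of the lockstep cycle above.
#include <chrono>
#include <future>
#include <map>
#include <string>
#include <vector>

using SimTime = std::chrono::steady_clock::duration;

struct StepMsg            // stands in for the simulation_step message
{
  SimTime simTime{};
  SimTime stepSize{};
  std::map<std::string, std::string> affinities;  // performer -> secondary
};

struct SecondaryAck       // stands in for the secondary's state reply
{
  std::string secondaryId;
  std::string serializedState;
};

// One secondary: performs its local step and reports back (steps 3-4).
SecondaryAck SecondaryStep(const std::string &_id, const StepMsg &_msg)
{
  // ... run physics etc. for the performers assigned to this secondary ...
  return {_id, "state@" + std::to_string(_msg.simTime.count())};
}

int main()
{
  const std::vector<std::string> secondaries{"secondary_1", "secondary_2"};
  SimTime simTime{};
  const SimTime stepSize = std::chrono::milliseconds(1);

  for (int iter = 0; iter < 3; ++iter)
  {
    // 1. The primary prepares the step info (clock, affinities).
    StepMsg msg{simTime, stepSize,
                {{"performer_a", "secondary_1"},
                 {"performer_b", "secondary_2"}}};

    // 2. "Send" it to every secondary (std::async here instead of
    //    ign-transport, to keep the sketch self-contained).
    std::vector<std::future<SecondaryAck>> pending;
    for (const auto &s : secondaries)
      pending.push_back(std::async(std::launch::async, SecondaryStep, s, msg));

    // 5. Block until every secondary has answered before advancing time.
    for (auto &f : pending)
      f.get();

    simTime += stepSize;
  }
}
```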

With some of the other work that we are doing with performers, we should be able to relax this constraint.

The idea would be that if secondaries are simulating performers that are far apart in the world (so that they can't see or physically interact with each other), the secondaries should be free to "run ahead" without having to worry about perfect lockstep.

chapulina commented 3 years ago

With some of the other work that we are doing with performers

Is that new work that's planned or things that are already in?

the secondaries should be free to "run ahead" without having to worry about perfect lockstep.

I think we may need a mechanism to rewind the simulation to a point where two performers meet. If one performer is running much faster than the other, it may pass through a level as if the other robot weren't there, but later we find out that they should have been there at the same time.

mjcarroll commented 3 years ago

Is that new work that's planned or things that are already in?

New work for me, while Ivan works on this.

I think we may need a mechanism to rewind the simulation to a point where two performers meet.

This is true. I'm thinking that we should also have an upper limit on how far it can "run ahead" (on the order of seconds).

ivanpauno commented 3 years ago

I've been reading the docs/code and running the distributed simulation examples, and I have a bunch of basic questions (they might be too basic, sorry, but I'm still trying to understand the issue):

distributing state to all the secondaries that require it, updating the clock, and setting affinities

Does all that happen in the NetworkManagerPrimary::Step method? What exactly is the "state" shared? Is it the UpdateInfo struct (i.e. the almost equivalent WorldStatistics message)?

I see there's also a SerializedStateMap, but that seems to be shared from secondaries to the primary (I didn't see that being shared between secondaries, or from the primary to secondaries).

I also didn't get what happens when a performer changes levels, is its affinity changed? If it changes levels (and secondaries), how does the other secondary know the performer's position, etc.? (I didn't get how that's shared.) What if the performer is in two levels? (Can it be?)

I think we may need a mechanism to rewind the simulation to a point where two performers meet. If one performer is running much faster than the other, it may pass through a level as if the other robot weren't there, but later we find out that they should have been there at the same time.

:+1: To be able to rewind, what would need to be stored? Would a map<SimulatedTimestampT, SerializedStateMap> work?

I'm thinking that we should also have an upper limit on how far it can "run ahead" (on the order of seconds)

:+1: to a limit. Maybe when a performer is going to cross to another level and that level is being executed by another secondary, that secondary should be stopped (?).

chapulina commented 3 years ago

I've been reading a docs/code and running the distributed simulation examples

Great! In case you missed it, this tutorial has a lot of info: https://ignitionrobotics.org/api/gazebo/4.0/distributedsimulation.html .

Does all that happen in the NetworkManagerPrimary::Step method?

Yeah that drives the whole step cycle.

What exactly is the "state" shared?

That includes time and affinities, see the SimulationStep message.

I see there's also a SerializedStateMap, but that seems to be shared from secondaries to the primary

Exactly, that goes from secondaries to the primary, and the primary uses that information to reassign affinities and display the simulation to the GUI.

I also didn't get what happens when a performer changes from level, is it affinity changed?

Not necessarily. The affinities only need to be updated when performers from different secondaries end up in the same level. In this case, one of them will need to be transferred to the other's secondary, so they can be simulated at the same time. The logic deciding the affinities is in NetworkManagerPrimary::PopulateAffinities, but it's incomplete, see https://github.com/ignitionrobotics/ign-gazebo/issues/93.
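To illustrate that rule, here's a deliberately simplified, hypothetical sketch (not the actual PopulateAffinities logic, and it ignores the question of which levels have to move along with a performer):

```cpp
// Hypothetical illustration: whenever performers assigned to different
// secondaries end up in the same level, move them all to a single secondary
// so they can be simulated together.
#include <map>
#include <set>
#include <string>
#include <vector>

using PerformerId = std::string;
using SecondaryId = std::string;
using LevelId = std::string;

void UpdateAffinities(
    const std::map<LevelId, std::vector<PerformerId>> &_performersPerLevel,
    std::map<PerformerId, SecondaryId> &_affinities)
{
  for (const auto &[level, performers] : _performersPerLevel)
  {
    // Which secondaries currently own performers in this level?
    std::set<SecondaryId> owners;
    for (const auto &p : performers)
      owners.insert(_affinities.at(p));

    // Conflict: performers from different secondaries share the level, so
    // transfer them all to one of the involved secondaries (picked
    // arbitrarily here; the real policy would be smarter).
    if (owners.size() > 1)
    {
      const SecondaryId target = *owners.begin();
      for (const auto &p : performers)
        _affinities[p] = target;
    }
  }
}
```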

If it changes from level (and secondary), how does the other secondary know the performer position, etc? (I didn't get how that's shared).

The primary is responsible for reassigning these affinities and communicating the performer's state to the new secondary. At least that was the plan, I don't remember how much of that is already working.

What if the performer is in two levels? (can it be?)

Yes, a performer may be, and often is, in several levels at once. I believe that if a performer's affinity needs to be changed, all levels that it is in will need to come with it.

To be able to rewind, what would need to be stored? Would a map<SimulatedTimestampT, SerializedStateMap> work?

Good question. If that's enough to transplant a performer across secondaries, it should be enough to rewind. We're still missing a generic rewind mechanism, see https://github.com/ignitionrobotics/ign-gazebo/issues/203. Once that's tackled it may help here.
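As a purely hypothetical illustration of the kind of bookkeeping that would be involved, here's a bounded history keyed by simulated time, along the lines of the map<SimulatedTimestampT, SerializedStateMap> idea (the snapshot type is just a placeholder, and none of this exists today):

```cpp
// Hypothetical sketch: a bounded map from simulated time to a serialized
// state snapshot, trimmed so an instance can only "run ahead" by a fixed
// window and can rewind to any time still inside that window.
#include <chrono>
#include <iterator>
#include <map>
#include <string>
#include <utility>

using SimTime = std::chrono::milliseconds;
using StateSnapshot = std::string;  // placeholder for a serialized ECM state

class RewindBuffer
{
public:
  explicit RewindBuffer(SimTime _window) : window(_window) {}

  // Record the state produced at _t and drop anything older than the window.
  void Record(SimTime _t, StateSnapshot _state)
  {
    this->history[_t] = std::move(_state);
    const SimTime oldest = _t - this->window;
    this->history.erase(this->history.begin(),
                        this->history.lower_bound(oldest));
  }

  // Rewind: return the newest snapshot at or before _t and discard
  // everything that came after it.
  const StateSnapshot *RewindTo(SimTime _t)
  {
    auto it = this->history.upper_bound(_t);
    if (it == this->history.begin())
      return nullptr;               // nothing old enough to rewind to
    --it;
    this->history.erase(std::next(it), this->history.end());
    return &it->second;
  }

private:
  SimTime window;
  std::map<SimTime, StateSnapshot> history;
};

int main()
{
  RewindBuffer buffer{SimTime{5000}};  // keep ~5 seconds of history
  buffer.Record(SimTime{1000}, "state@1s");
  buffer.Record(SimTime{2000}, "state@2s");
  const StateSnapshot *s = buffer.RewindTo(SimTime{1500});  // -> "state@1s"
  return s ? 0 : 1;
}
```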

Maybe when a performer is going to cross to another level and that level is being executed by another secondary, that secondary should be stopped (?).

Yeah, and maybe rewound if it's too far ahead in time.

mjcarroll commented 3 years ago

Not necessarily. The affinities only need to be updated when performers from different secondaries end up in the same level. In this case, one of them will need to be transferred to the other's secondary, so they can be simulated at the same time. The logic deciding the affinities is in NetworkManagerPrimary::PopulateAffinities, but it's incomplete, see #93.

One thing that Ian and I have been discussing is that (at least for SubT) robot performers don't need to get their affinity changed, and that we should just collocate physics simulation for those performers. The reason for this is that many of the systems have internal state that isn't currently stored in the ECM or serializable, so transplanting them may be troublesome.

ivanpauno commented 3 years ago

Thanks for the answer @chapulina !! I think I get the issue better now.

I have some extra questions/comments:

Currently, there is a lot of back-and-forth traffic between the simulation primary and the simulation secondary instances in order to keep them in complete lockstep.

What is the overall goal? Is it to reduce network traffic or to avoid blocking secondaries? (or both?)


Currently the primary seems to be reconstructing the world state based on the secondaries' SerializedStateMap messages. IIUC affinities are only being calculated based on the initial performer levels but aren't being updated (TODO here), though it seems that the plan was to update them in the primary.

Something I was thinking is that instead of the primary checking if affinities need to be updated, the secondaries could notify that (i.e. whether a performer is entering a new level or not). Then the primary only has to check those notifications: if no performer is changing levels, it can send a new step immediately (without any extra processing). If one is, the primary has to recalculate performer affinities, which implies processing the SerializedStateMap messages. In this way there's still perfect lockstep (no secondary running ahead), but the primary is only checking the notifications and not rebuilding the map based on the secondaries' map state (except when needed).

I guess it's desirable for the primary to have the complete map state step to step, so the SerializedStateMap messages are still needed, but they can be processed asynchronously.
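A rough sketch of the notification idea, with hypothetical names (none of this exists in ign-gazebo): each secondary's reply carries a cheap flag, and the primary only touches the full serialized states when some flag is set.

```cpp
// Hypothetical sketch: fast path when no performer is crossing levels.
#include <string>
#include <vector>

struct StepReply
{
  std::string secondaryId;
  bool performerChangingLevel{false};
  std::string serializedState;   // only inspected when strictly needed
};

// Stand-ins for the primary's existing operations.
void RequestNextStep() { /* publish the next step message */ }
void RecomputeAffinitiesFrom(const std::vector<StepReply> &) { /* ... */ }
void ConsolidateStateAsync(const std::vector<StepReply> &) { /* e.g. GUI */ }

void OnAllRepliesReceived(const std::vector<StepReply> &_replies)
{
  bool anyLevelChange = false;
  for (const auto &r : _replies)
    anyLevelChange = anyLevelChange || r.performerChangingLevel;

  // Slow path only when a performer is crossing levels: affinities may need
  // to be reassigned, which requires looking at the full state.
  if (anyLevelChange)
    RecomputeAffinitiesFrom(_replies);

  // Either way the next step can go out right away; consolidating the full
  // state (e.g. for the GUI) can happen off the critical path.
  RequestNextStep();
  ConsolidateStateAsync(_replies);
}

int main()
{
  OnAllRepliesReceived({{"secondary_1", false, "..."},
                        {"secondary_2", true, "..."}});
}
```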


Out of curiosity, how does distributed simulation work when you have plugins bridging with ROS, etc.? Is the primary stepping those plugins?

E.g., if a secondary is stepping those plugins, rewinding doesn't seem like an option (e.g. robot mapping ...). I imagine that wouldn't be the case, because running those plugins in more than one secondary (with a performer moving between them) also seems like a problem.


It might be possible to allow secondaries to "run ahead" without needing to rewind, if we make some assumptions about "maximum performer speed", "maximum performer acceleration", etc. (those could be global, or individual per performer). Those calculations would have to be extremely conservative if there's no rewind mechanism.

ivanpauno commented 3 years ago

One thing that Ian and I have been discussing is that (at least for SubT) robot performers don't need to get their affinity changed, and that we should just collocate physics simulation for those performers. The reason for this is that many of the systems have internal state that isn't currently stored in the ECM or serializable, so transplanting them may be troublesome.

I think this already answers one of my questions :smile:

mjcarroll commented 3 years ago

What is the overall goal? Is it to reduce network traffic or to avoid blocking secondaries? (or both?)

Both, but mostly the second. At least in my original testing, we weren't seeing much benefit to the distribution because we were spending so much time transacting state. Also, the blocking means that the whole simulation runs as slow as the slowest secondary. Some of this is unavoidable, due to lockstepping, but ideally we could "smooth out" some of the longer secondary steps?

Out of curiosity, how does distributed simulation work when you have plugins bridging with ROS, etc.?

Right, so it doesn't really work in that case. Additionally, we don't have "ros-bridge" plugins in Ignition. Everything comes out on an Ignition topic, and then that gets bridged over to ROS messages.

I suppose one alternative would be to keep the simulated state a few seconds ahead of the state that actually comes out on ignition transport? It would require more bookkeeping, but would allow for rewinding without any discontinuity in the output state.
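For what it's worth, a hypothetical sketch of that bookkeeping (nothing like this exists today): buffer the per-step output and only release entries older than a fixed delay, so a rewind only ever drops data that was never published.

```cpp
// Hypothetical sketch: publish a few seconds behind the simulation so
// rewinds never cause a discontinuity in the published state.
#include <chrono>
#include <deque>
#include <functional>
#include <string>
#include <utility>

using SimTime = std::chrono::milliseconds;

class DelayedPublisher
{
public:
  DelayedPublisher(SimTime _delay,
                   std::function<void(const std::string &)> _pub)
    : delay(_delay), publish(std::move(_pub)) {}

  // Called every simulation step with the freshly computed state.
  void Push(SimTime _t, std::string _state)
  {
    this->pending.push_back({_t, std::move(_state)});

    // Release everything that is now older than the delay window.
    while (!this->pending.empty() &&
           this->pending.front().time + this->delay <= _t)
    {
      this->publish(this->pending.front().state);
      this->pending.pop_front();
    }
  }

  // Called on a rewind to _t: anything newer was never published, so it can
  // simply be dropped and re-simulated.
  void DropAfter(SimTime _t)
  {
    while (!this->pending.empty() && this->pending.back().time > _t)
      this->pending.pop_back();
  }

private:
  struct Entry { SimTime time; std::string state; };
  SimTime delay;
  std::function<void(const std::string &)> publish;
  std::deque<Entry> pending;
};

int main()
{
  // Publish ~2 simulated seconds behind; the callback would forward to
  // ign-transport in a real setup.
  DelayedPublisher pub(SimTime{2000},
      [](const std::string &_s) { (void)_s; });
  for (int i = 0; i <= 5; ++i)
    pub.Push(SimTime{i * 1000}, "state@" + std::to_string(i) + "s");
}
```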

ivanpauno commented 3 years ago

Thanks @mjcarroll !

I suppose one alternative would be to keep the simulated state a few seconds ahead of the state that actually comes out on ignition transport? It would require more bookkeeping, but would allow for rewinding without any discontinuity in the output state.

That sounds like a good idea.


I did some basic profiling of the network manager primary step function; it seems that stepping all the primary systems is what blocks the longest (it took two orders of magnitude longer than updating the map state).

Is it possible to check whether a performer is going to interact with another one without stepping all the primary systems, i.e. directly after updating the primary map state with the secondaries' states? If that's possible, maybe the primary can already send a new step message to the secondaries while stepping its own systems in parallel. It wouldn't be true "asynchronous" stepping, but maybe it avoids most of the blocking.
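A rough sketch of what that pipelining could look like; the three helpers are hypothetical stand-ins for existing primary operations:

```cpp
// Hypothetical sketch: overlap the primary's own system updates with the
// secondaries' next step whenever a cheap interaction check passes.
#include <future>

bool PerformersMightInteract() { return false; }  // cheap check on merged state
void StepPrimarySystems() { /* the expensive part of the profile */ }
void SendStepToSecondaries() { /* publish the next step message */ }

void PipelinedStep()
{
  // Start stepping the primary's own systems in the background.
  std::future<void> primaryStep =
      std::async(std::launch::async, StepPrimarySystems);

  if (!PerformersMightInteract())
  {
    // Fast path: secondaries can start the next step while the primary's
    // systems are still running.
    SendStepToSecondaries();
    primaryStep.wait();
  }
  else
  {
    // Slow path: behave like the current lockstep and wait first.
    primaryStep.wait();
    SendStepToSecondaries();
  }
}

int main() { PipelinedStep(); }
```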

chapulina commented 3 years ago

that robot performers don't need to get their affinity changed, and that we should just collocate physics simulation for those performers.

Yeah I think we may need to do some tricks to collocate parts of the simulation, but not all of it. Rendering would also need to be collocated for sensors, right? And while part of the simulation, say physics and rendering, is being simulated on one secondary, I think we can't really escape from lock-stepping with other secondaries that are running controllers that rely on that data, right?

Then the primary only has to check those notifications: if no performer is changing levels, it can send a new step immediately

Yeah, that sounds like it could really speed things up. The primary is currently responsible for consolidating all state and reporting it to the GUI; we would need an alternative mechanism for that. I think the GUI could handle combining state received directly from all secondaries, for example.

ivanpauno commented 3 years ago

I suppose one alternative would be to keep the simulated state a few seconds ahead of the state that actually comes out on ignition transport? It would require more bookkeeping, but would allow for rewinding without any discontinuity in the output state.

@mjcarroll some extra questions about this:

Who is currently publishing the state through ignition transport, both the primary and the secondaries, or only the primary? What would be needed to let the simulation "run ahead" of the state coming out on ignition transport?

mjcarroll commented 3 years ago

Who is currently publishing the state through ignition transport, both the primary and the secondaries, or only the primary?

State sync is going both directions in this case. The primary aggregates the state from all of the secondaries, and then redistributes it. This is so that each secondary ultimately knows the location of all entities.

What would be needed to let the simulation "run ahead" of the state coming out on ignition transport?

I'm starting with the assumption that each secondary is responsible for one performer (or potentially more performers, but stick to one for this). That secondary is responsible for all of the systems attached to that performer. In steady state, this would include running the three update states (PreUpdate for control inputs/constraints, Update for physics, and PostUpdate for sensor updates).

I think we could let each secondary "run ahead" (or maybe "pre-simulate") for a certain amount of time, by running all iterations that would only have an Update phase. In most cases, the PreUpdate and PostUpdate shouldn't be running every iteration, because they depend on things like sensor update rates and control update rates.

We could let it go further by extrapolating into the future, but I think this would end up with more "rewinds" than is desirable? If we choose to run further, I believe we would also have to disable any PostUpdate phase, because we wouldn't want to send sensor data to subscribers until we have decided that the evaluated physics is actually the final correct version.
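A back-of-the-envelope sketch of that idea (hypothetical and standalone, not ign-gazebo code): given the physics step size and the update periods of the systems that actually need PreUpdate/PostUpdate, how many Update-only iterations fit before the next one is due.

```cpp
// Hypothetical sketch: how far could a secondary pre-simulate with only the
// Update (physics) phase before a PreUpdate/PostUpdate is due again?
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <vector>

using Duration = std::chrono::microseconds;

std::int64_t UpdateOnlyIterations(Duration _now, Duration _stepSize,
                                  const std::vector<Duration> &_periods)
{
  // Find the earliest upcoming time at which any system needs to run its
  // PreUpdate/PostUpdate phase again.
  Duration nextDue = Duration::max();
  for (const auto &period : _periods)
  {
    // Next multiple of `period` strictly after `_now`.
    const auto k = _now.count() / period.count() + 1;
    nextDue = std::min(nextDue, Duration{k * period.count()});
  }

  // Iterations that fit entirely before that deadline can run with only the
  // Update (physics) phase.
  return (nextDue - _now).count() / _stepSize.count();
}

int main()
{
  using namespace std::chrono_literals;
  // E.g. a 1 ms physics step, a 50 Hz controller (20 ms) and a ~30 Hz camera
  // (33 ms): right after t = 40 ms the next PreUpdate/PostUpdate is due at
  // 60 ms (the controller), so ~20 physics-only iterations could be
  // pre-simulated.
  const auto n = UpdateOnlyIterations(40ms, 1ms, {20ms, 33ms});
  return n == 20 ? 0 : 1;
}
```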

ivanpauno commented 3 years ago

I have done two experiments; here are some notes about them:

  1. Sending a new step request to the secondaries while asynchronously doing the primary step, i.e. doing this asynchronously and waiting until its completion after sending a new step message to the secondaries (something like: request a secondaries step and wait for responses, consolidate state, asynchronously start the primary step, request a new secondaries step, wait for the primary step to complete, wait for the new step acks from the secondaries, ...). Results: minimal speedup in RTF, nothing fancy (from 30%/40% to 40%/50%; standalone runs at ~900%).

  2. Let the secondaries go ahead (a fixed number of iterations, say 10000) and cache the results, but still only send the results to the primary when a step is requested. If affinities changed, clean up the cached results and rewind (I haven't tested this, I've only used a trivial example where performers never interact). Results: no speedup. The secondaries of the distributed_steps example can actually complete a step really fast, so having results cached but still waiting for a "step" message from the primary doesn't seem to be a win.

(I didn't push the code of the experiments because they are quite a hack :smiley:, but I can clean them up and push if needed)

My next experiment is going to be: let the secondaries "go ahead" and send the results to the primary immediately. The primary aggregates results and, after a number of aggregated iterations, sends an "ack" to the secondaries, so the secondaries can continue moving forward and can also clean the history of cached iterations that the primary already acknowledged. If the primary detects that some of the performers changed affinity, or the simulation was paused, etc., it sends a "step" message to the secondaries informing them that they have to rewind/pause/etc. (the "ack" and "step" messages the primary sends will actually be the same message type, just with different content).

@mjcarroll does that idea sound reasonable? I still have to figure out some details of this last approach, but it sounds like a reasonable next experiment to me. If not, we probably need a meeting to discuss the details of how we want to move forward.
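A rough sketch of the secondary side of that scheme, with hypothetical names (none of this is the actual implementation):

```cpp
// Hypothetical sketch: the secondary keeps stepping and streaming results,
// but never runs more than `maxAhead` iterations past the last iteration
// the primary acknowledged; a "step" command can force a rewind.
#include <cstdint>
#include <map>
#include <string>

struct SecondaryRunner
{
  std::uint64_t iteration{0};    // next iteration to simulate
  std::uint64_t lastAcked{0};    // newest iteration acked by the primary
  std::uint64_t maxAhead{1000};  // run-ahead budget
  std::map<std::uint64_t, std::string> cached;  // iteration -> state snapshot

  // Main loop body: step and publish as long as we're within budget.
  void SpinOnce()
  {
    if (this->iteration >= this->lastAcked + this->maxAhead)
      return;  // budget exhausted, wait for the next ack

    // ... step physics, serialize the resulting state ...
    this->cached[this->iteration] = "state@" + std::to_string(this->iteration);
    // ... send that state to the primary immediately ...
    ++this->iteration;
  }

  // The primary acknowledged everything up to _iter: drop older snapshots,
  // keeping _iter itself as the oldest possible rewind point.
  void OnAck(std::uint64_t _iter)
  {
    this->lastAcked = _iter;
    this->cached.erase(this->cached.begin(), this->cached.lower_bound(_iter));
  }

  // The primary detected an affinity change / pause / etc.: rewind to _iter.
  void OnStepCommand(std::uint64_t _iter)
  {
    this->cached.erase(this->cached.upper_bound(_iter), this->cached.end());
    // ... restore the local world state from cached[_iter] ...
    this->iteration = _iter + 1;
  }
};

int main()
{
  SecondaryRunner runner;
  for (int i = 0; i < 5; ++i)
    runner.SpinOnce();
  runner.OnAck(3);          // primary caught up to iteration 3
  runner.OnStepCommand(3);  // e.g. an affinity change right after that
  return runner.iteration == 4 ? 0 : 1;
}
```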


That secondary is responsible for all of the systems attached to that performer. In steady state, this would include running the three update states (PreUpdate for control inputs/constraints, Update for physics, and PostUpdate for sensor updates).

I will assume for the moment that the secondaries are running systems whose state is completely serializable (IIUC those should only have an Update phase, with PreUpdate and PostUpdate doing nothing).

I'm not sure how systems that have non-serializable state can be distributed; I see two options:

I imagine that the second approach will slow down the simulation a lot, because it's basically using the current lockstep approach.

ivanpauno commented 3 years ago

To summarize the progress made:

There was a big performance improvement compared with the previous distributed simulation implementation when running on a single machine, but it's still slower than the standalone simulation (which will probably always be the case when using a single machine), and I didn't test on multiple machines.

The next step would be to test the scripts running secondaries on multiple machines, to see whether, for a reasonable number of robots, there's an advantage to distributing the simulation.