SpiNNakerManchester / SpiNNFrontEndCommon

Common support code for user-facing front end systems.
Apache License 2.0

state machine and tie in to executable types for simulation based models. #358

Open alan-stokes opened 5 years ago

alan-stokes commented 5 years ago

This needs to be cleaned up, as master currently ignores it for quite a few simulation-based models.

rowleya commented 5 years ago

To give more detail - all executable types support:

  1. The state they should be in after loading their variables before the simulation starts
  2. The state they should be in after the simulation has finished and data (recorded and provenance) is ready to be read

In addition to this, Simulation executable types support additional phases:

  1. Run time has been updated and the binary is ready for the next run. This appears to currently be supported through a response to an SDP packet and therefore might not need a new state.
  2. The binary has received the command to write provenance and exit. This supports the "run forever" mode of operation, followed by a "stop" message to stop the simulation and extract data. In theory this could also be supported through a response to an SDP packet, since it is triggered by SDP anyway. This is not done at present because the core actually exits at this point; this would have to be changed.
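
The states and phases listed above could be written down as a small explicit state machine. The following is only a sketch: `CoreState`, `TRANSITIONS` and `advance` are hypothetical names chosen for illustration, not part of SpiNNFrontEndCommon, and the exact set of transitions is an assumption.

```python
from enum import Enum, auto


class CoreState(Enum):
    """Hypothetical states of a simulation executable (illustrative only)."""
    LOADED = auto()              # variables loaded, simulation not yet started
    RUNNING = auto()             # simulation in progress
    READY_FOR_NEXT_RUN = auto()  # run time updated, ready for the next run
    FINISHED = auto()            # recorded data and provenance ready to read


# Assumed transition table; a real implementation may differ.
TRANSITIONS = {
    CoreState.LOADED: {CoreState.RUNNING},
    # RUNNING -> FINISHED covers "run forever" followed by a "stop" message.
    CoreState.RUNNING: {CoreState.READY_FOR_NEXT_RUN, CoreState.FINISHED},
    CoreState.READY_FOR_NEXT_RUN: {CoreState.RUNNING, CoreState.FINISHED},
    CoreState.FINISHED: set(),
}


def advance(current, target):
    """Return target if the transition is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the table separate from the code that acts on it makes it easy to see which transitions a given executable type would have to support.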

alan-stokes commented 5 years ago

Also worth noting, from a Nengo GUI tie-in: a pause command, akin to the provenance-and-exit one but which puts the core into a paused state instead, would be appreciated. This would allow run forever to stop and restart without reloading from scratch as well.

alan-stokes commented 5 years ago

Also, Arren and the iCub people could use a pause, so that the simulation is prepared for a run and they only need to hit resume when demoing.

rowleya commented 5 years ago

Sounds like a nice addition - currently this is supported through run(0), manual pause (e.g. wait for keyboard input) then run_forever(), but I admit that this will then not allow repeated phases, and would be better supported with a pause() command at the lower levels. Presumably this would need the binary not to actually exit during the ApplicationFinisher.

alan-stokes commented 5 years ago

Exactly. In the Nengo GUI, you can pause at any point. Currently the implementation just ignores the pause and runs for the time frame allocated to it anyhow. If you then resume in the GUI, the simulation jumps to wherever the model has reached in the SpiNNaker simulation, so it is not exactly doing what's expected. If we support said pause, it will allow better integration with the GUI.

rowleya commented 5 years ago

Yes, pause does have the issue of knowing where to pause. This might mean an additional command to send the position you think you have paused at, then the cores can catch up to this point before pausing. Cores that have run on beyond this are interesting though!
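
As a rough illustration of the "send the pause position" idea, the per-core decision could look like the sketch below. `steps_until_pause` is a hypothetical helper, not an existing function, and it assumes each core tracks an integer timestep counter.

```python
def steps_until_pause(current_step, pause_step):
    """Decide what a core does when told to pause at pause_step.

    Returns (extra_steps, overrun):
      extra_steps - timesteps the core still has to run to catch up
      overrun     - timesteps the core has already run beyond pause_step
    At most one of the two values is non-zero.
    """
    if current_step <= pause_step:
        return pause_step - current_step, 0
    return 0, current_step - pause_step


# Three cores at different points when "pause at step 100" arrives.
for core_step in (98, 100, 103):
    extra, overrun = steps_until_pause(core_step, 100)
    print(core_step, extra, overrun)
```

The core at step 103 is the "interesting" case: it cannot catch up, so all it can do is report how far beyond the requested point it is.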

alan-stokes commented 5 years ago

I know it'll be a good bit of work, and will affect the speed-up performance, as we'll need to reset the entries. But could we have an FR (fixed-route) command sender / injector? At runtime, the FRs would be set to a tree covering all cores in the board/machine, and when the FR command/packet is received by a core, it goes into pause.

When it finishes (i.e. when, say, the extra_monitor receives a given FR packet with a given format), it updates the FR route back to the data-out format?
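
To make the "tree covering all cores" part concrete, here is a minimal sketch that builds a breadth-first broadcast tree over a set of chip coordinates. It assumes the usual six SpiNNaker link directions (E, NE, N, W, SW, S), ignores wrap-around and dead links, and `broadcast_tree` is a hypothetical name. Turning the tree into actual fixed-route router entries, and delivering to the cores on each chip, is not shown.

```python
from collections import deque

# (dx, dy) offsets for links E, NE, N, W, SW, S.
LINK_OFFSETS = [(1, 0), (1, 1), (0, 1), (-1, 0), (-1, -1), (0, -1)]


def broadcast_tree(chips, root):
    """Return {chip: [child chips]} for a breadth-first tree from root.

    chips is a set of (x, y) tuples and must contain root. Chips that
    cannot be reached from root simply end up with no parent.
    """
    children = {chip: [] for chip in chips}
    visited = {root}
    queue = deque([root])
    while queue:
        x, y = queue.popleft()
        for dx, dy in LINK_OFFSETS:
            neighbour = (x + dx, y + dy)
            if neighbour in chips and neighbour not in visited:
                visited.add(neighbour)
                children[(x, y)].append(neighbour)
                queue.append(neighbour)
    return children


tree = broadcast_tree({(0, 0), (1, 0), (0, 1), (1, 1)}, root=(0, 0))
```

A breadth-first tree keeps the hop count from the injecting chip to every other chip as small as possible, which matters if the packet has to arrive within one timestep.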

The reason I'm thinking this is that we can give it a good shot at guaranteeing a packet gets there within a given timestep; especially with the repeat mode of the command sender, we should be able to tolerate a few FR packet losses. It also gets around the issue of a multicast SDP message and machine-level scope.
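
How many repeats are "enough" depends on the loss rate. As a back-of-the-envelope sketch, assuming each copy of the packet is lost independently with the same probability (an assumption, not something measured here), the chance that every one of n copies is lost is p^n. `repeats_needed` is a hypothetical helper.

```python
def repeats_needed(loss_probability, target_failure):
    """Smallest n such that loss_probability ** n <= target_failure.

    Assumes independent losses; both arguments must lie strictly
    between 0 and 1.
    """
    if not 0.0 < loss_probability < 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    if not 0.0 < target_failure < 1.0:
        raise ValueError("target_failure must be between 0 and 1")
    n = 1
    while loss_probability ** n > target_failure:
        n += 1
    return n
```

For example, with a (deliberately pessimistic) 50% loss rate, `repeats_needed(0.5, 0.01)` returns 7, since 0.5^6 is still above 1% and 0.5^7 is below it.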

Because they all paused within a timestep, we shouldn't need to worry about cores going over, as it shouldn't happen; and if it does, well, we were screwed already, just as the command sender is if it loses all its command packets.

I guess we could have different states for paused within time and paused outside time, and have the host decide if they're happy to continue. I mean, the core could just reset its time slot if it's over, as backtracking is going to be impossible without especially slowing everything down.

rowleya commented 5 years ago

I think pause now is the best we will get and fits the use-case you state. Using FR / Multicast is a good idea and should be enough (NN could also work though it will be slower). If cores are out-of-sync at this point, then so be it - the only way to avoid this is as is done currently, and this can be detected at the pause state. If the user decides to continue from there, it would do so from the same desynchronized start point, which is probably good enough.
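
Detecting the out-of-sync case on the host could be as simple as comparing the timestep each core reports once paused. This is a sketch only: `check_pause_sync` is a hypothetical name, and how the timesteps would actually be read back from the cores is left open.

```python
def check_pause_sync(paused_steps):
    """Compare the timesteps at which cores paused.

    paused_steps maps a core identifier, e.g. (x, y, p), to the timestep
    that core reports in the paused state. Returns (in_sync, behind),
    where behind maps each lagging core to the number of timesteps it is
    behind the furthest-ahead core.
    """
    if not paused_steps:
        raise ValueError("no cores reported")
    furthest = max(paused_steps.values())
    behind = {
        core: furthest - step
        for core, step in paused_steps.items()
        if step != furthest
    }
    return not behind, behind


in_sync, behind = check_pause_sync({(0, 0, 1): 100, (0, 0, 2): 98})
if not in_sync:
    print("cores out of sync at pause:", behind)
```

The host can then leave the decision to the user: continue from the desynchronized point, or stop.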