We need to be able to restart brooklyn such that it re-binds to all the same entities that it had before brooklyn was shut down.
Requirements in more detail:
Want to be able to checkpoint the state of brooklyn (e.g. on a clean shutdown), or for entities to push important changes to their state (e.g. so that we are resilient to an ungraceful failure).
When re-starting:
Don't let policies make decisions until we are sure we have an up-to-date view of the world; wait until all entities have been recreated and re-bound.
e.g. an AutoScalerPolicy for a dynamic cluster should not attempt to resize until it is sure all entities have been recreated (and their state is up-to-date for making decisions).
Restore configuration of entities and locations before things start calling effectors (e.g. a FixedListMachineProvisioningLocation needs to know which machines were previously allocated before anyone calls obtain())
Don't restart or reinstall components that are down. Instead, let failure-recovery policies handle that.
We don't know why the component is down (e.g. security might have stopped it while brooklyn was offline because the machine was compromised).
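The restart ordering above can be sketched roughly as follows. This is a minimal illustration, not brooklyn code: RebindGate, SketchScalingPolicy and SketchMachinePool are hypothetical names invented for the example, and the 80% load threshold is an arbitrary assumption. It shows location state being restored before obtain() can be called, and a policy ignoring input until rebind is marked complete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Demo: restore location state first, and only let policies act once rebind is complete. */
class RebindOrderingDemo {
    public static void main(String[] args) {
        RebindGate gate = new RebindGate();
        SketchMachinePool pool = new SketchMachinePool(Arrays.asList("m1", "m2", "m3"));
        SketchScalingPolicy policy = new SketchScalingPolicy(gate);

        // Phase 1: restore location state before anything can call obtain().
        pool.restoreAllocated(Arrays.asList("m1", "m2"));

        // A reading that arrives mid-rebind is ignored because the gate is still closed.
        policy.onLoadReading(95);

        // Phase 2 (recreating and re-binding entities) would happen here; then open the gate.
        gate.markRebindComplete();
        policy.onLoadReading(95);

        System.out.println("next machine: " + pool.obtain());
        System.out.println("resize decisions: " + policy.resizeDecisions());
    }
}

/** Hypothetical flag that stays closed until all entities have been re-bound. */
class RebindGate {
    private volatile boolean complete;
    void markRebindComplete() { complete = true; }
    boolean isOpen() { return complete; }
}

/** Hypothetical stand-in for a policy such as an auto-scaler. */
class SketchScalingPolicy {
    private static final int HIGH_LOAD_PERCENT = 80; // assumed threshold, for illustration only
    private final RebindGate gate;
    private int resizeDecisions;

    SketchScalingPolicy(RebindGate gate) { this.gate = gate; }

    void onLoadReading(int loadPercent) {
        if (!gate.isOpen()) return; // no decisions until the view of the world is up to date
        if (loadPercent > HIGH_LOAD_PERCENT) resizeDecisions++;
    }

    int resizeDecisions() { return resizeDecisions; }
}

/** Hypothetical stand-in for a fixed-list machine provisioning location. */
class SketchMachinePool {
    private final List<String> machines;
    private final Set<String> inUse = new LinkedHashSet<>();

    SketchMachinePool(List<String> machines) { this.machines = new ArrayList<>(machines); }

    /** Re-marks machines that were allocated before the restart. */
    void restoreAllocated(List<String> previouslyAllocated) { inUse.addAll(previouslyAllocated); }

    /** Hands out the first machine that is not already in use. */
    String obtain() {
        for (String machine : machines) {
            if (inUse.add(machine)) return machine;
        }
        throw new IllegalStateException("no machines available");
    }
}
```

Note that the sketch never restarts anything that is down; it only restores knowledge of what existed, leaving recovery decisions to policies once the gate is open.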
We want to handle both graceful and ungraceful brooklyn shutdown (e.g. kill -9 or machine crashing).
It should be possible to avoid losing any data during such a failure.
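One way to picture the checkpoint and push-on-change requirements is the sketch below. It is an assumption-laden illustration rather than the brooklyn persistence mechanism: SketchStateStore, its tab-separated file format and the entity ids are all hypothetical. Each change rewrites the whole state to a temporary file and then moves it into place, so a crash part-way through a write should leave the previous file intact.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.TreeMap;

/** Demo: push two changes, then read them back as a restarted process would. */
class StateStoreDemo {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("rebind-sketch").resolve("state.txt");

        SketchStateStore store = new SketchStateStore(file);
        store.onChange("entity-1", "service.up=true");
        store.onChange("entity-2", "cluster.size=3");

        // Simulate a restart: a fresh store instance sees only what reached the file.
        SketchStateStore restarted = new SketchStateStore(file);
        System.out.println(restarted.load());
    }
}

/** Hypothetical store; the line format is a placeholder, not a proposed persistence format. */
class SketchStateStore {
    private final Path file;
    private final Map<String, String> state = new TreeMap<>();

    SketchStateStore(Path file) { this.file = file; }

    /** Called when an entity pushes an important change; persists immediately. */
    synchronized void onChange(String entityId, String serializedState) throws IOException {
        state.put(entityId, serializedState);
        checkpoint();
    }

    /** Writes the full state to a temp file, then moves it over the real file. */
    synchronized void checkpoint() throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : state.entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE support depends on the file system; an exception is thrown if unsupported.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Reads whatever was last persisted; returns an empty map if nothing was. */
    Map<String, String> load() throws IOException {
        Map<String, String> loaded = new TreeMap<>();
        if (!Files.exists(file)) return loaded;
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int tab = line.indexOf('\t');
            if (tab > 0) loaded.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return loaded;
    }
}
```

The same checkpoint() call could be made once on a clean shutdown. The sketch does not force data to disk, so it illustrates the write-then-rename idea only, not full durability against a machine crash.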
Additionally, it's worth thinking about how a general solution for "rebind" relates to connecting brooklyn to an environment where components are already running, so that brooklyn can start managing a running system.
persistence format is currently XML (using XStream); we want to change that to clean JSON (using Jackson)
locations do not have a change listener; currently they are persisted when an associated entity is persisted. A bespoke location-persistence mechanism is under development with a customer, which should hopefully make it into master soon.
we should version the persisted state somehow (or define a best practice for how an entity author would do this).
Handling/testing with in-progress actions (e.g. reporting gracefully on restart)
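As an illustration of the versioning point above, the sketch below shows one possible convention an entity author could follow: store a version number alongside the persisted fields and upgrade old state step by step on rebind. Everything here is hypothetical (SketchStateUpgrader, the "version" key, and the example change from a single "host" field to "hostname" and "port"); it is not an existing brooklyn mechanism.

```java
import java.util.Map;
import java.util.TreeMap;

/** Demo: state written by an older version is upgraded when it is read back. */
class VersionedStateDemo {
    public static void main(String[] args) {
        Map<String, String> persisted = new TreeMap<>();
        persisted.put("version", "1");
        persisted.put("host", "10.0.0.5:8080");
        System.out.println(SketchStateUpgrader.upgrade(persisted));
    }
}

/** Hypothetical upgrader; each block migrates state forward by exactly one version. */
class SketchStateUpgrader {
    static final int CURRENT_VERSION = 2;

    static Map<String, String> upgrade(Map<String, String> persisted) {
        Map<String, String> state = new TreeMap<>(persisted);
        // Assumption for this sketch: state with no version marker is treated as version 1.
        int version = Integer.parseInt(state.getOrDefault("version", "1"));
        if (version > CURRENT_VERSION) {
            throw new IllegalStateException("state was written by a newer version: " + version);
        }
        if (version == 1) {
            // v1 -> v2: the combined "host" field becomes separate "hostname" and "port" fields.
            String host = state.remove("host");
            if (host != null) {
                String[] parts = host.split(":", 2);
                state.put("hostname", parts[0]);
                state.put("port", parts.length > 1 ? parts[1] : "");
            }
            version = 2;
        }
        state.put("version", String.valueOf(version));
        return state;
    }
}
```

Refusing to load state from a newer version is a deliberate choice in the sketch; silently dropping unknown fields would risk losing data on the next checkpoint.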