brentjanderson commented 8 years ago

State needs to be managed effectively in order to achieve the following objectives:

[ ] Freeze and thaw flight state to save/reload flight configuration
[ ] Track mutation events triggered by simulator events and time
[ ] Track mutation events triggered by inbound input from clients
[ ] Broadcast the latest state to subscribed clients

Freeze/Thaw vs Cloning

There are two approaches to creating a flight. One is to have a fully serialized flight state that is used to build the runtime of the flight (OTP Supervisor tree). This flight's state can then be serialized and stored back in the database as a new snapshot of a flight. This approach is similar to how Flint tackled this problem - it would simply scan all documents for the right simulator_id and it would build everything it needed for runtime based on that factor.

An alternative approach would be to have a mechanism for keeping track of flight state, and another mechanism for generating a flight in the first place. I'll call this "cloning". Essentially, a mission template would exist that describes the initial state of the flight (e.g. the number of simulators, their configuration, baseline sensor contacts for the universe, etc.). This initial state would serve as the foundation for the actual runtime events of the flight itself.

If Freeze/Thaw is sufficient, that's great. The biggest issue I see there is in the case of running the same "simulator" in multiple instances (e.g. online games). Because of primary keys and uniqueness constraints, a flight could only be "thawed" if it does not already exist as an instantiated, running flight. Otherwise you'd see data overwritten and the flights would effectively merge in weird ways.

Perhaps, to synthesize all of this into a single workable solution, you need to have the following patterns established:

A flight is essentially a bucket for redux-style events that describe the global state of the flight. Each event consists of a numeric counter (or, if multiple event processors exist, a vector clock), the event payload, and optionally a delta of the state tree or the full state tree.
These redux-style events are generated server-side only, and are not directly shared with clients or generated by clients. This doesn't mean that clients can't change server state; it only means that clients cannot create their own events and drop them directly into the server's state management system. They must go through an endpoint of some kind that filters the request (authentication, authorization, etc.).
When a redux-style event is triggered, it generates a new state in the store and makes that state available.
It may make sense to have multiple stores (relay-style), where one store is responsible for the simulator, one is responsible for the global sensors state, etc. If this is the case, then the flight becomes a collection of stores, and each store has its own series of events. When serialized, each event stream needs to be stored in order relative to its own store.
Serialization of the flight comes through storing the event stream associated with each store. If we feel that we can effectively store the flight data with the event, then each event could also include the new state associated with the event's application (the root state would be an empty map).
Periodically, it may make sense to "compact" the event stream. This is easily achieved if we have a copy of the latest simulator state attached to each event after it has been processed - simply discard events older than a certain threshold. If we do not have the latest state, then we would need to capture the latest state (up to whatever threshold of latest events we want to keep), log that as a "special" event with the full state, and then discard anything before that state.
To publish to clients, there are a few things that I can see working out. One would be that clients would have some sort of path selector through a store's tree, e.g. "flight.simulators[parent].reactor.heatIndex" or "flight.sensorContacts.*", and the store would check each of these subscriptions against the latest data. Another approach would be similar to Meteor's pub/sub, where a client would subscribe to a channel, and that channel would have a set of designated selectors that it would watch for changes in the tree (this is why having the delta between events in the stream would be super helpful, since it simplifies identifying what's changed and which subscriptions need updating). A third approach would be to throw all of this out and stick to simply publishing on a given document's channel like you're already doing. However, you may lose out on a lot of power, flexibility, and decoupling if you don't explore this road first.
It might make sense to have events "roll up" into more localized stores, so that systems of sufficient complexity have their own stores, such as a store for the simulator, a store for sensors contacts, a store for the overall flight (if that's even needed?), and then when an entity from one store interacts with another (e.g. a torpedo is fired by the ship, creating an event in the sensors contacts store, or one ship shoots another ship causing damage), there is some mechanism for one entity to generate events in any given store, according to game logic.
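
To make the event stream idea above a bit more concrete, here is a minimal sketch of a single store. It is written in Python rather than Elixir purely for brevity, and every name in it (`Store`, `Event`, `reactor_reducer`, the `SET_HEAT` and `SET_POWER` event types) is made up for illustration. It assumes a single event processor, so a plain numeric counter is enough and vector clocks are not covered. Each event keeps the full state that resulted from applying it, which is what makes compaction a simple truncation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]
Reducer = Callable[[State, Dict[str, Any]], State]


@dataclass(frozen=True)
class Event:
    seq: int                          # numeric counter, ordered per store
    payload: Dict[str, Any]           # e.g. {"type": "SET_HEAT", "value": 0.4}
    snapshot: Optional[State] = None  # full state after this event was applied


@dataclass
class Store:
    reducer: Reducer
    state: State = field(default_factory=dict)  # the root state is an empty map
    events: List[Event] = field(default_factory=list)
    next_seq: int = 1

    def dispatch(self, payload: Dict[str, Any]) -> State:
        """Apply one server-side event and append it to the stream."""
        self.state = self.reducer(self.state, payload)
        self.events.append(Event(self.next_seq, payload, self.state))
        self.next_seq += 1
        return self.state

    def compact(self, keep_last: int) -> None:
        """Discard all but the newest `keep_last` events.

        The oldest surviving event still carries a snapshot, so replay can
        start from there instead of from the empty root state.
        """
        if keep_last < 1:
            raise ValueError("keep_last must be at least 1")
        self.events = self.events[-keep_last:]

    def replay(self) -> State:
        """Rebuild the state from the stored stream (the "thaw" step)."""
        if not self.events:
            return {}
        state = self.events[0].snapshot
        for event in self.events[1:]:
            state = self.reducer(state, event.payload)
        return state


def reactor_reducer(state: State, payload: Dict[str, Any]) -> State:
    """Hypothetical reducer: returns a new dict and never mutates the old one."""
    if payload["type"] == "SET_HEAT":
        return {**state, "heatIndex": payload["value"]}
    if payload["type"] == "SET_POWER":
        return {**state, "power": payload["value"]}
    return state
```

A flight would then be a collection of these, one per simulator, one for sensor contacts, and so on. Storing only a delta per event instead of the full snapshot would save space, but then compaction needs the "special" full-state event described above.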

This is a bit of a brain dump, and there's a lot going on here, so to simplify, the following may be a tenable way forward:

Server side state management

Flights are buckets of stores, and stores are the stream of events in a simulation. There should probably be a store for each simulator, a store for the flight itself (maybe?), a store for sensor contacts, and a store for anything else that would need to have its own internal namespace for events, either for performance or organizational reasons.
When a flight template is "cloned", it takes a set of template events and drops them into the stores to generate state. These template events would have "magic" keywords in them to generate new IDs, where needed, in order to avoid global state conflict. This may not be necessary if uniqueness is ensured at the flight level: if a flight is uniquely keyed, then everything else can rely on that uniqueness.
- Perhaps this means that there are events that mutate "state" and then events that mutate OTP structure - in other words, a flight template could say "there are 4 simulators in this simulation, and a sensors arena, each with their own store. Simulator A has this set of systems..." etc. As the events are processed, each one either mutates state or mutates the GenServer structure to build up/tear down whatever scaffolding is needed.
- This is making my Elixir metaprogramming spidey sense tingle a little bit. I don't know enough about it to say for certain, but it feels like metaprogramming on some level.
Events would be dispatched to the store, which consists of an Agent for storing the series of events, and a GenServer for processing the series of events. Figuring out how to capture all of the event reducers (a la Redux's combineReducers) would be an important part of this. Otherwise, it's actually conceptually quite simple, I think.
Serialization would consist of capturing the current state (for brevity's sake) and/or the stream of events. I'm of the opinion that the stream of events should be serialized to a database in the long-term, so that it can be used for advanced flight analytics.
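
As a rough illustration of the "magic" keyword idea for cloning, the sketch below copies a list of template events and swaps each placeholder for a freshly generated ID. The `$new_id:` prefix, the event shapes, and the simulator name are all assumptions invented for this example, and Python is used only to keep it short.

```python
import uuid
from typing import Any, Dict, List

NEW_ID_PREFIX = "$new_id:"  # hypothetical "magic" keyword marking an ID to generate


def clone_template(template_events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy template events, replacing each placeholder with a fresh ID.

    The same placeholder maps to the same generated ID within one clone,
    so references between events stay consistent.
    """
    generated: Dict[str, str] = {}

    def resolve(value: Any) -> Any:
        if isinstance(value, str) and value.startswith(NEW_ID_PREFIX):
            if value not in generated:
                generated[value] = uuid.uuid4().hex
            return generated[value]
        if isinstance(value, dict):
            return {key: resolve(item) for key, item in value.items()}
        if isinstance(value, list):
            return [resolve(item) for item in value]
        return value

    return [resolve(event) for event in template_events]


# Hypothetical template: one simulator and one system attached to it.
TEMPLATE = [
    {"type": "ADD_SIMULATOR", "id": "$new_id:sim_a", "name": "Voyager"},
    {"type": "ADD_SYSTEM", "simulator": "$new_id:sim_a", "system": "reactor"},
]
```

Cloning `TEMPLATE` twice yields two flights whose simulator IDs differ from each other, while the `ADD_SYSTEM` event in each clone still points at that clone's own simulator. The resulting events could then be dispatched into the stores like any other event.
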

Client-side state management

A naive implementation could be that the stream of events for a given store would stream through to the client and be played directly into Redux. Client-side mutations could be optimistic: When a client side event is dispatched, it is applied tentatively (assuming success), the event is sent to the server, the server approves or denies the event, and the true result based on server-side computation is then pushed back to the client. This is very similar to Meteor's approach. The downside is figuring out how to not push all the data to all the clients - there needs to be a subscription mechanism of some kind. Apollo should be trying to solve this problem in some way, but so far it's not fast enough based on your past experience.
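
The optimistic flow described above could look roughly like the following sketch. It keeps the last server-confirmed state separate from a list of pending local actions and derives what the UI sees by replaying the pending actions on top. `OptimisticClient` and `set_key` are hypothetical names, the network call is left out entirely, and this is not how Meteor or Redux actually implement it.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Action = Dict[str, Any]
Reducer = Callable[[State, Action], State]


class OptimisticClient:
    """Keeps server-confirmed state separate from unconfirmed local actions."""

    def __init__(self, reducer: Reducer, confirmed: State) -> None:
        self.reducer = reducer
        self.confirmed = confirmed                   # last state the server sent
        self.pending: List[Tuple[int, Action]] = []  # (local id, action)
        self._next_id = 1

    def dispatch(self, action: Action) -> int:
        """Record a local action; the caller would also send it to the server."""
        action_id = self._next_id
        self._next_id += 1
        self.pending.append((action_id, action))
        return action_id

    def resolve(self, action_id: int, server_state: State) -> None:
        """The server approved or denied the action and sent the true state.

        Either way the action leaves the pending list. A denial simply means
        `server_state` does not reflect it, which rolls the change back.
        """
        self.pending = [(i, a) for i, a in self.pending if i != action_id]
        self.confirmed = server_state

    @property
    def view(self) -> State:
        """What the UI renders: confirmed state plus pending actions."""
        state = self.confirmed
        for _, action in self.pending:
            state = self.reducer(state, action)
        return state


def set_key(state: State, action: Action) -> State:
    """Hypothetical reducer: sets one key and returns a new dict."""
    return {**state, action["key"]: action["value"]}
```

Because the view is always recomputed from confirmed state, a rejected action disappears without any explicit undo logic. This does nothing about the harder problem of limiting which data each client receives.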

Another approach would be to declare specific paths in the data structure of each store to watch for changes, and then react when those changes occur.
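
A minimal sketch of path watching, assuming state trees are nested dicts with string keys and selectors are dot-separated with `*` matching exactly one segment. The bracket syntax from the earlier example ("simulators[parent]") is not handled here, and `PathWatcher`, `changed_paths`, and `matches` are names invented for this illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Callback = Callable[[str, Any], None]


def changed_paths(old: Any, new: Any, prefix: str = "") -> List[Tuple[str, Any]]:
    """Return (path, new_value) for every leaf that differs between two trees.

    Dicts are walked recursively, so an added or removed subtree is reported
    leaf by leaf. A removed leaf is reported with a value of None.
    """
    if isinstance(old, dict) or isinstance(new, dict):
        old_map = old if isinstance(old, dict) else {}
        new_map = new if isinstance(new, dict) else {}
        changes: List[Tuple[str, Any]] = []
        for key in sorted(set(old_map) | set(new_map)):
            path = f"{prefix}.{key}" if prefix else key
            changes += changed_paths(old_map.get(key), new_map.get(key), path)
        return changes
    return [] if old == new else [(prefix, new)]


def matches(selector: str, path: str) -> bool:
    """True if `path` equals the selector or sits underneath it."""
    wanted, actual = selector.split("."), path.split(".")
    if len(wanted) > len(actual):
        return False
    return all(w in ("*", a) for w, a in zip(wanted, actual))


class PathWatcher:
    def __init__(self) -> None:
        self.subscriptions: List[Tuple[str, Callback]] = []

    def subscribe(self, selector: str, callback: Callback) -> None:
        self.subscriptions.append((selector, callback))

    def publish(self, old: Dict[str, Any], new: Dict[str, Any]) -> None:
        """Notify each subscriber whose selector covers a changed path."""
        for path, value in changed_paths(old, new):
            for selector, callback in self.subscriptions:
                if matches(selector, path):
                    callback(path, value)
```

Diffing two full trees on every event is the naive version. If each event already carried its delta, `publish` could skip the diff and match selectors against the delta directly.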

This is admittedly the part that is toughest for me to figure out at the moment, and will likely take some further thought.

Entity Component System

Something that would flip all of this around quite a bit is if we were to look at applying principles from https://en.wikipedia.org/wiki/Entity_component_system - Essentially everything is an "entity" in the simulation, and each entity can have "components" attached to it. A "system" is a loop that scoops up each relevant component during each tick of the game cycle, makes adjustments based on current input states, and then writes the resulting values back. The advantage of this approach is that it's already a successful strategy in other games. The disadvantage is that we've never done it before. It would probably throw all this other stuff out the window, and I'm not advocating for it, but it's worth looking at to see how others are solving this problem. It would simplify a lot of how the data is managed. Note that each entity would not be a process, but each "system" would likely be an Elixir process. I'm not sure how subscribing to a given entity would work, but it could work out alright.
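
For reference, a stripped-down sketch of the entity/component/system split, in Python for brevity. Entities are bare integer IDs, components are plain data kept in per-type tables, and a system is a function run once per tick. `World`, `Position`, `Velocity`, and `movement_system` are illustrative names only, not a proposal for the actual data model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Velocity:
    dx: float
    dy: float


class World:
    """Entities are plain integer IDs; components live in per-type tables."""

    def __init__(self) -> None:
        self.next_entity = 1
        self.positions: Dict[int, Position] = {}
        self.velocities: Dict[int, Velocity] = {}

    def create_entity(self) -> int:
        entity = self.next_entity
        self.next_entity += 1
        return entity


def movement_system(world: World, dt: float) -> None:
    """One system: advance every entity that has both Position and Velocity."""
    for entity, velocity in world.velocities.items():
        position = world.positions.get(entity)
        if position is not None:
            position.x += velocity.dx * dt
            position.y += velocity.dy * dt
```

An entity with a `Position` but no `Velocity` (a station, say) is simply skipped by the movement system, which is the main appeal: behavior comes from which components an entity has rather than from its type.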

Latency compensation

One thing that would make a lot of this better would be some kind of latency compensation. At its most basic form, I would expect two pieces - one would be "key frames" that would define the current state at a given moment in snapshots, to enforce client integrity, and another would be optimistically assuming certain conditions in the simulation. In other words, when a sensors contact is dragged to a new location at a given speed, only one event is emitted: "The contact is moving at this speed to this new place". The server and the client would each start animating the position of the contact based on this data, but the server would periodically send out "the contact is at x,y,z location" so that there is still an authoritative answer about positioning. This periodic update would not have to be at 60 fps, though, so you get efficient networking coupled with realtime simulation.
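
The two pieces above can be sketched as follows, assuming straight-line movement at constant speed in two dimensions and a clock shared closely enough between server and client. `MoveOrder`, `position_at`, and `apply_key_frame` are hypothetical names, and snapping straight to the key frame is the crudest possible correction; a real client would probably blend toward it.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class MoveOrder:
    """The single emitted event: move from `start` to `target` at `speed`."""
    start: Point
    target: Point
    speed: float  # distance units per second
    t0: float     # simulation time at which the move began


def position_at(order: MoveOrder, now: float) -> Point:
    """Extrapolate the contact's position; server and client run the same math."""
    dx = order.target[0] - order.start[0]
    dy = order.target[1] - order.start[1]
    distance = math.hypot(dx, dy)
    if distance == 0 or order.speed <= 0:
        return order.start
    # Fraction of the trip completed, clamped so the contact stops at the target.
    progress = min(1.0, max(0.0, (now - order.t0) * order.speed / distance))
    return (order.start[0] + dx * progress, order.start[1] + dy * progress)


def apply_key_frame(order: MoveOrder, authoritative: Point, now: float) -> MoveOrder:
    """Snap to the server's reported position and keep heading for the same target."""
    return MoveOrder(start=authoritative, target=order.target, speed=order.speed, t0=now)
```

Between key frames no position data crosses the network at all, so the key frame interval can be tuned purely on how much drift is tolerable.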

brentjanderson commented 8 years ago

As I have reflected on this, the following points have been made pretty clear:

Reactivity is hard, because capturing and filtering mutations is hard
Using server side redux-events is helpful for organizing state transitions, but not necessarily for organizing reactivity
Using Actions to mutate state on the client may have side effects pushed up to the server. This is where optimistic updates come in: unless we funnel all state changes to the server and then wait for the server to broadcast any updates, we need some way to apply state changes and then roll them back or commit them as they happen. At the same time, the client has to accept arbitrary state mutations that are not bound to a specific action, which makes time travel on the client irrelevant.
Actions triggered on the server should mutate the data store, which should be able to generate change set events for subscriptions. The easiest way to implement this that I'm aware of is by using RethinkDB changefeeds, unless we chuck Redux out the window and stick to mutations of state by themselves. I think that the changefeed approach isn't a bad one, though, and it's worth trying to optimize up front. As actions are committed in sequence, changes to the database are applied, and those changes then propagate down to clients that have subscribed for those change sets. Those changes are merged into the client's state tree, and an update is triggered.
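
Independent of which database produces them, the change set and merge steps in the last point could look something like this sketch. It works on plain nested dicts and does not use any database API; `diff`, `merge`, and the `REMOVED` marker are assumptions made up for the example, and it assumes the client's tree was in sync with the server's old state before the change set arrived.

```python
from typing import Any, Dict

REMOVED = "__removed__"  # hypothetical marker for a deleted key


def diff(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Return a nested change set holding only what differs in `new`."""
    changes: Dict[str, Any] = {}
    for key in old.keys() | new.keys():
        if key not in new:
            changes[key] = REMOVED
        elif key not in old:
            changes[key] = new[key]
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            nested = diff(old[key], new[key])
            if nested:
                changes[key] = nested
        elif old[key] != new[key]:
            changes[key] = new[key]
    return changes


def merge(state: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a change set to a client's state tree without mutating it."""
    result = dict(state)
    for key, value in changes.items():
        if value == REMOVED:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result
```

The server would compute `diff(previous_state, next_state)` after each committed action and send it only to clients whose subscriptions overlap the changed keys; each client then calls `merge` and triggers its update.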

Figuring out how to handle those updates is the tricky part. Frankly, revisiting state as a whole from the perspective of video games wouldn't be a bad thing to do. For the sake of moving forward, though, the above should be a good starting point.

brentjanderson commented 8 years ago

http://jaysoo.ca/2016/01/03/managing-processes-in-redux-using-sagas/

This link has some interesting comments about side effects on just a client-side basis. Each client, and each actor on the server, is an event generator, which is what gives us headaches about all of this in the first place.

Thorium-Sim / thorium

Effective state management #23
