Party snapshots

dmaretskyi commented 4 years ago

Why?

Model snapshots will allow us to store the model state in a serializable format and then restore the runtime model from that saved state. This will significantly improve startup performance for large databases.

Current plan

Snapshotting will be performed by serializing the party state as a whole and saving it into a persistent cache. Those caches can be shared with other peers. Upon app restart the stack will discover the snapshots and create models that already contain the deserialized state. All mutations that were included in the snapshot will be skipped during processing.
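A minimal sketch of the save/restore round trip described above. Everything here is an assumption made for illustration: `PartySnapshot`, `SnapshotStore`, and the in-memory `Map` standing in for persistent storage are hypothetical names, not part of the existing stack.

```typescript
// Hypothetical types and names for illustration; not the actual stack API.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface PartySnapshot {
  partyKey: string;
  // Serialized state of each item in the party, keyed by item id.
  items: { [itemId: string]: Json };
}

// Stand-in for persistent storage: maps a party key to serialized snapshot text.
class SnapshotStore {
  private readonly files = new Map<string, string>();

  // Serializes the whole party state and stores it under the party key.
  save(snapshot: PartySnapshot): void {
    this.files.set(snapshot.partyKey, JSON.stringify(snapshot));
  }

  // Returns the deserialized snapshot, or undefined if none was saved.
  load(partyKey: string): PartySnapshot | undefined {
    const text = this.files.get(partyKey);
    return text === undefined ? undefined : (JSON.parse(text) as PartySnapshot);
  }
}

// On shutdown (or periodically): save the party state.
const store = new SnapshotStore();
store.save({ partyKey: 'party-1', items: { 'item-1': { title: 'hello' } } });

// On restart: discover the snapshot and build models from it.
const restored = store.load('party-1');
```

If `load` returns `undefined`, the stack would process the feeds from the beginning, as it does today.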

Model API

Add two additional methods for models:

createSnapshot(): Json returns a JSON-serializable object with the current model state. This snapshot must include all of the necessary data to fully restore the model state in the future.
restoreFromSnapshot(snapshot: Json) replaces the current state with the one from the provided snapshot.

The methods should be optional, and the stack must only perform snapshots if they exist on the model class. When a model doesn't support snapshots, the stack will fall back to storing an array of mutations for that model and replaying them on load.

For the first iteration we serialize snapshots as JSON data. In the future we might consider using protobuf encoding.
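The optional methods and the mutation-array fallback could look roughly like this. Only `createSnapshot` and `restoreFromSnapshot` come from the proposal; the `Model` interface, `processMutation`, `SavedModelState`, `saveModel`, `loadModel`, and both example models are hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical model interface; the two snapshot methods are optional.
interface Model {
  processMutation(mutation: Json): void;
  createSnapshot?(): Json;
  restoreFromSnapshot?(snapshot: Json): void;
}

// A snapshot-capable model: sums numeric mutations.
class CounterModel implements Model {
  value = 0;

  processMutation(mutation: Json): void {
    if (typeof mutation === 'number') this.value += mutation;
  }

  createSnapshot(): Json {
    return { value: this.value };
  }

  restoreFromSnapshot(snapshot: Json): void {
    if (typeof snapshot !== 'object' || snapshot === null || Array.isArray(snapshot)) {
      throw new Error('Malformed counter snapshot');
    }
    const value = snapshot['value'];
    if (typeof value !== 'number') throw new Error('Malformed counter snapshot');
    this.value = value;
  }
}

// A model without snapshot support: the stack must replay its mutations.
class LogModel implements Model {
  readonly lines: string[] = [];

  processMutation(mutation: Json): void {
    this.lines.push(String(mutation));
  }
}

type SavedModelState =
  | { kind: 'snapshot'; snapshot: Json }
  | { kind: 'mutations'; mutations: Json[] };

// Uses the snapshot methods when both exist, otherwise stores the mutations.
function saveModel(model: Model, mutations: Json[]): SavedModelState {
  if (model.createSnapshot && model.restoreFromSnapshot) {
    return { kind: 'snapshot', snapshot: model.createSnapshot() };
  }
  return { kind: 'mutations', mutations: mutations.slice() };
}

// Restores a fresh model from either form of saved state.
function loadModel(model: Model, saved: SavedModelState): void {
  if (saved.kind === 'snapshot') {
    if (!model.restoreFromSnapshot) throw new Error('Model cannot restore snapshots');
    model.restoreFromSnapshot(saved.snapshot);
    return;
  }
  for (const mutation of saved.mutations) model.processMutation(mutation);
}
```

With this shape a party can mix both kinds of model, at the cost of keeping the full mutation array for every model that lacks the methods.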

Versioning

Model versioning might be required if the snapshot structure changes between versions. In that case the model either needs to be able to consume snapshots from all of its previous versions or have a fallback mechanism so the stack can fully recalculate the model state from feed messages.
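One way the two options could be combined, as a sketch only: the snapshot carries a version number, the model ships upgrade functions for older versions, and a missing upgrade path signals the stack to replay the feeds instead. `VersionedSnapshot`, `CURRENT_VERSION`, `upgraders`, and the example v1 to v2 migration are all invented for illustration.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical envelope: the model state plus the version that produced it.
interface VersionedSnapshot {
  version: number;
  state: Json;
}

const CURRENT_VERSION = 2;

// upgraders[n] converts version-n state into version n+1 state.
// Example migration (assumed): v1 stored a bare string, v2 wraps it in an object.
const upgraders: { [version: number]: (state: Json) => Json } = {
  1: (state) => ({ text: String(state) })
};

// Returns state in the current format, or undefined when no upgrade path
// exists, in which case the caller falls back to replaying feed messages.
function upgradeSnapshot(snapshot: VersionedSnapshot): Json | undefined {
  let { version, state } = snapshot;
  while (version < CURRENT_VERSION) {
    const upgrade = upgraders[version];
    if (!upgrade) return undefined;
    state = upgrade(state);
    version++;
  }
  // Snapshots written by a newer version are also rejected.
  return version === CURRENT_VERSION ? state : undefined;
}
```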

TODO

Snapshot timestamping

Each snapshot will be assigned a timeframe-timestamp (mapping of feed key => seq number). That timeframe will signify which feed messages are already included in the snapshot. Upon refresh those mutations will be skipped.

The stack will automatically determine the time intervals to perform snapshots (every 1000 feed messages maybe?).
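A sketch of how the timeframe could drive skipping and how the periodic trigger might work. It assumes the sequence number stored per feed is the last one included in the snapshot (inclusive); `FeedMessage`, `isIncluded`, `messagesToReplay`, and `SnapshotScheduler` are hypothetical names.

```typescript
// Timeframe: feed key -> highest sequence number included in the snapshot.
// Treating the stored number as inclusive is an assumption of this sketch.
type Timeframe = Map<string, number>;

interface FeedMessage {
  feedKey: string;
  seq: number;
}

// True when the snapshot already contains the effect of this message.
function isIncluded(timeframe: Timeframe, message: FeedMessage): boolean {
  const lastSeq = timeframe.get(message.feedKey);
  return lastSeq !== undefined && message.seq <= lastSeq;
}

// Messages that still have to be processed after restoring the snapshot.
function messagesToReplay(timeframe: Timeframe, messages: FeedMessage[]): FeedMessage[] {
  return messages.filter((message) => !isIncluded(timeframe, message));
}

// Signals that a snapshot is due after every `interval` processed messages.
class SnapshotScheduler {
  private sinceLast = 0;
  private readonly interval: number;

  constructor(interval: number) {
    this.interval = interval;
  }

  // Call once per processed message; returns true when a snapshot is due.
  onMessage(): boolean {
    this.sinceLast++;
    if (this.sinceLast >= this.interval) {
      this.sinceLast = 0;
      return true;
    }
    return false;
  }
}

// Usage: with interval 1000 the scheduler fires on the 1000th, 2000th, ... message.
const scheduler = new SnapshotScheduler(1000);
```

A feed that is absent from the timeframe is treated as entirely unprocessed, so feeds added after the snapshot are replayed from their first message.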

Performance considerations

The most notable performance improvement will be in allowing the stack to skip reading feed messages that are already included in the snapshot. This way only the most-recent mutations will need to be read and processed.

Also, when a new peer joins a party, it might receive a full snapshot of the party state from other peers.

We should also consider the time it takes to save the full party state, because we cannot process any messages while saving without risking a corrupted snapshot. With large databases such a pause in processing will be a UX issue.

For that reason we should consider doing incremental snapshots where only the changed objects will be serialized.
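Incremental snapshots could be approached with per-item dirty tracking, sketched below under assumptions: `IncrementalSnapshotter`, `takeDelta`, and `applyDelta` are hypothetical, and the delta holds references to item state rather than deep copies, so a real implementation would still need to serialize it before processing resumes.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

type ItemStates = { [itemId: string]: Json };

// Tracks which items changed since the last snapshot.
class IncrementalSnapshotter {
  private readonly states = new Map<string, Json>();
  private readonly dirty = new Set<string>();

  // Records the new state of an item and marks it as changed.
  update(itemId: string, state: Json): void {
    this.states.set(itemId, state);
    this.dirty.add(itemId);
  }

  // Collects only the items changed since the previous call, then resets tracking.
  takeDelta(): ItemStates {
    const delta: ItemStates = {};
    this.dirty.forEach((itemId) => {
      const state = this.states.get(itemId);
      if (state !== undefined) delta[itemId] = state;
    });
    this.dirty.clear();
    return delta;
  }
}

// Rebuilds the full state by layering a delta over an earlier snapshot.
function applyDelta(base: ItemStates, delta: ItemStates): ItemStates {
  return { ...base, ...delta };
}

// Usage: only the document changes between the first and second snapshot.
const snapshotter = new IncrementalSnapshotter();
snapshotter.update('doc', { chars: 1000 });
snapshotter.update('game-1', { complete: true });
const full = snapshotter.takeDelta();
snapshotter.update('doc', { chars: 2000 });
const delta = snapshotter.takeDelta();
const merged = applyDelta(full, delta);
```

The trade-off is that earlier snapshots must be kept around, since a delta alone cannot restore unchanged items.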

dboreham commented 4 years ago

I think this is a typo:

Those caches other peers.

dboreham commented 4 years ago

In such case model either needs to be able to consume snapshots of all of it's previous versions

I think this is a reasonable approach for now.

dboreham commented 4 years ago

Nit: "cache" is a term with a specific meaning implying locality of temporal reference. I don't think it really applies to what we're doing here.

persistent cache

dboreham commented 4 years ago

The methods should be optional and the stack must only perform snapshots if the exist on the model class.

I'm not sure this is going to be practical: if any model in the party is not snapshot-capable, then no snapshots can be done, which seems problematic.

dmaretskyi commented 4 years ago

I'm also thinking about enforcing snapshot methods for all models, but an alternative would be to store just the array of mutations for that specific model and replay them on load.

richburdon commented 4 years ago

API: load/saveSnapshot?
createSnapshot: pass in timeframe vector: the snapshot should
re timeframe: consider a protocol that would instruct other peers to create a snapshot at the same timeframe (I've called this an "epoch" before). Consider this to be part of the party "consensus" model. E.g., we all agree that as of timeframe T1 we are going to snapshot (all peers do the same thing). Perhaps one peer proposes the snapshot at a certain point. We might then extend the Dat protocol such that we replicate mutations RELATIVE TO a previous snapshot.
consider peers that join a party late -- they should be able to get the snapshots (perhaps via a "private" IPFS protocol?) and then catch up with the current mutation.
important to use protobufs from the outset (for versioning); this should not impose any technical overhead.
re versioning: complicated. it may be that a breaking model change requires that that item is "retired" and cloned into a new item. in which case these previous snapshots could be ignored.
related: the snapshot "file" should be an indexed collection of model states. (i.e., separable).
re incremental snapshots: suppose we have a party with one giant text doc and 10 chess games :). At s1, we have 1000 chars in the doc and 8 complete chess games, and 2 in progress. At s2, we have 2000 chars in the doc and 10 complete chess games. Should the s2 snapshot include the previous 8 chess games that have not changed? If not, then we have to keep snapshots around. If we've marked these items as "deleted" then that's fine. This points to the need for granularity of individual model snapshots.
related: consider a party that is a year old and has a chat -- we may just never want to see the history of the chat beyond the last 30 days.

TL;DR: good proposal: let's spec the minimal next step experiment...

BUT FIRST DESIGN THE TEST SO THAT WE KNOW WHAT THE IMPACT IS.

dmaretskyi commented 4 years ago

Stage 1: Snapshots improve application startup time

[x] Serialization API for party objects
- [x] PartyProcessor: save halo messages in an array & replay on snapshot load
- [x] Model snapshots API (with fallback to serializing mutation array)
- [x] Item & ItemManager
[x] Snapshots storage
[x] FeedStoreIterator should be able to start reading from a provided timestamp
[x] PartyManager.open should discover existing snapshots and use them before processing feed messages
[x] Trigger snapshot creation every 1000 messages

Stage 2: Snapshots are exchanged between peers on invitation

[ ] Selective replication: peers can request to replicate feeds starting from specific sequence number
[ ] Transfer snapshot as a part of the invite process

Stage 3: Snapshots help peers catch up to changes

[ ] Requesting snapshots arbitrarily from other peers
[ ] Consensus over when snapshots are created
[ ] Storing snapshots on IPFS

Other ideas

[ ] Backing up a party on IPFS

dxos-deprecated / echo

Party snapshots #245
