Efficiently persisting on every change

CMCDragonkai commented 3 years ago

I was looking at the persistence section of the README.md and it seems to mean that it would save the entire in-memory DB as a string in one go.

Is there a way to hook into the changes of automerge, so that I can persist the data structure on every change, or even bundle several changes together and persist them?

I'm looking to use leveldb from https://github.com/Level/level which provides a key-value store that is automatically persisted and atomic.

I think all that needs to happen is for automerge to translate all of its changes to just key-value changes, and this can be saved in level.

ept commented 3 years ago

Hi @CMCDragonkai, I recommend the following approach for persistence:

1. Maintain a log of changes as they happen: use Automerge.getChanges() to get new changes every time a document is updated, and append them to the log. You can implement this log in LevelDB by storing each change under a separate key, where the key is e.g. the actor and sequence number of the change. This way, a small change to the document results only in a small write to LevelDB.
2. From time to time, when the log of changes gets long, take all the changes and write them out using Automerge.save(). In the upcoming 1.x release series of Automerge (currently available on the performance branch), this encodes the data in a much more compact format, which saves a lot of space. Store the whole output of Automerge.save() under one key in LevelDB.
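
A minimal sketch of these two steps, assuming each change is a plain object with `actor` and `seq` fields (binary-encoded changes would need to be decoded first). `MemoryStore`, `changeKey`, `appendChanges` and `compact` are hypothetical names, the key layout is just one possible choice, and the in-memory store only stands in for a LevelDB handle:

```javascript
// In-memory stand-in for LevelDB so the sketch runs without dependencies.
// With the `level` package, put/del would map onto db.put / db.del.
class MemoryStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value);
  }
  del(key) {
    this.data.delete(key);
  }
  // All keys starting with `prefix`, in lexicographic order.
  keys(prefix) {
    return [...this.data.keys()].filter((k) => k.startsWith(prefix)).sort();
  }
}

const CHANGE_PREFIX = "change:";
const SNAPSHOT_KEY = "snapshot";

// Hypothetical key layout: actor ID plus zero-padded sequence number.
function changeKey(actor, seq) {
  return `${CHANGE_PREFIX}${actor}:${String(seq).padStart(10, "0")}`;
}

// Step 1: store every new change under its own key (one small write each).
// `changes` would be the result of Automerge.getChanges(oldDoc, newDoc).
function appendChanges(store, changes) {
  for (const change of changes) {
    store.put(changeKey(change.actor, change.seq), JSON.stringify(change));
  }
}

// Step 2: once the log holds at least `threshold` changes, store one snapshot
// and drop the logged changes. `snapshot` would be the output of
// Automerge.save(doc) for a document that already contains those changes.
function compact(store, snapshot, threshold = 100) {
  const logged = store.keys(CHANGE_PREFIX);
  if (logged.length < threshold) return false;
  store.put(SNAPSHOT_KEY, snapshot);
  for (const key of logged) store.del(key);
  return true;
}
```

In a real LevelDB setup, the snapshot write and the deletions in `compact` should probably go into a single atomic batch, so that a crash cannot leave the snapshot written with only part of the log removed.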

CMCDragonkai commented 3 years ago

Ok, that's interesting. What about the other side that is reading the changes from level db? If I have a log of changes in level db, do I need to iterate over level db and apply them into automerge?

Sounds like your 2nd item is an occasional "compacting" phase?

CMCDragonkai commented 3 years ago

I suppose this: https://github.com/Level/level#createReadStream is sufficient for reading in a certain order. You're talking about the sequence number of changes, so this would ensure we have an order to each change?

Also why do you suggest storing the actor number? In my case, every actor is on a different system, and each actor is an agent in our P2P system and they would be persisting their own state.

CMCDragonkai commented 3 years ago

Oh I should not forget to mention that our data has to be encrypted too on disk... so we would be encrypting the values before putting them into level db.
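
A rough sketch of what encrypting the values could look like, using AES-256-GCM from Node's built-in `crypto` module. The function names and the blob layout (IV, then auth tag, then ciphertext) are assumptions made for illustration, and key management is left out entirely:

```javascript
const nodeCrypto = require("node:crypto");

// Encrypts one value before it is written to the store.
// `key` must be a 32-byte Buffer. Assumed output layout:
// 12-byte IV | 16-byte auth tag | ciphertext.
function encryptValue(key, plaintext) {
  const iv = nodeCrypto.randomBytes(12); // fresh IV for every value
  const cipher = nodeCrypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

// Reverses encryptValue; throws if the blob was modified or the key is wrong.
function decryptValue(key, blob) {
  const iv = blob.subarray(0, 12);
  const tag = blob.subarray(12, 28);
  const ciphertext = blob.subarray(28);
  const decipher = nodeCrypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

Note that this only covers the values: whatever is used as the key in level db would still be stored in plaintext.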

ept commented 3 years ago

> If I have a log of changes in level db, do I need to iterate over level db and apply them into automerge?

Yes. I suggest loading them all into an array (each entry in the array is one change), and then calling Automerge.applyChanges() to apply them all at once.

> Sounds like your 2nd item is an occasional "compacting" phase?

Correct, the result of save() contains the same information as a log of changes, but represented more compactly. To load a compacted document, first use Automerge.load() to load the result of the most recent save(), and then use applyChanges() to apply all the individual changes that have happened since the most recent compaction.
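
The loading procedure could be sketched roughly as follows. This assumes the snapshot lives under a `"snapshot"` key, that individual changes were stored as JSON under keys starting with `"change:"`, and that `entries` is a `Map` holding everything read back from LevelDB; those names and `loadDocument` itself are hypothetical. The Automerge module is passed in as a parameter so that the sketch does not depend on a particular release:

```javascript
// Rebuilds a document from the most recent snapshot plus the changes
// logged since then. `automerge` is the Automerge module (or anything
// exposing init, load and applyChanges); `entries` is a Map of key -> value.
function loadDocument(automerge, entries) {
  const snapshot = entries.get("snapshot");
  // No snapshot yet means no compaction has happened: start from an empty doc.
  let doc = snapshot === undefined ? automerge.init() : automerge.load(snapshot);

  // Collect every logged change into one array so it can be applied in one call.
  const changes = [];
  for (const [key, value] of entries) {
    if (key.startsWith("change:")) changes.push(JSON.parse(value));
  }

  if (changes.length > 0) {
    const result = automerge.applyChanges(doc, changes);
    // As far as I understand, some releases return the new document directly
    // and others return a [doc, patch] pair, so both shapes are handled here.
    doc = Array.isArray(result) ? result[0] : result;
  }
  return doc;
}
```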

> You're talking about the sequence number of changes, so this would ensure we have an order to each change?

Automerge accepts the changes in any order, and ensures that the loaded document is always the same, so the order in which you load changes is not really important.

> Also why do you suggest storing the actor number? In my case, every actor is on a different system, and each actor is an agent in our P2P system and they would be persisting their own state.

The actorId of a change is the actor who generated that change (i.e. the one who called Automerge.change() in the first place). If you have several users editing a document, the document will contain a mixture of changes with different actorIds. The sequence number is only unique per actorId, so you need both the actorId and the sequence number to identify a change.
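
To illustrate why both parts are needed, here is a small sketch that deduplicates a list of changes by the pair of actor and sequence number. It assumes changes are plain objects with `actor` and `seq` fields; `changeId` and `dedupeChanges` are made-up helper names:

```javascript
// Identity of a change: the actor that created it plus that actor's
// sequence number. The sequence number alone is not unique.
function changeId(change) {
  return `${change.actor}:${change.seq}`;
}

// Keeps one copy of each change, e.g. when the same change arrives twice.
function dedupeChanges(changes) {
  const byId = new Map();
  for (const change of changes) {
    byId.set(changeId(change), change);
  }
  return [...byId.values()];
}

const received = [
  { actor: "alice", seq: 1 },
  { actor: "bob", seq: 1 }, // same seq as above, but a different change
  { actor: "alice", seq: 1 }, // duplicate of the first entry
];
const unique = dedupeChanges(received);
```

Here `unique` keeps the change from each of the two actors, whereas keying on the sequence number alone would have collapsed all three entries into one.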

ept commented 3 years ago

I don't think anything further needs to be done here, so I'm closing this issue. Please reopen it if I have missed something.

automerge / automerge-classic

Efficiently persisting on every change #331