Closed: ept closed this 3 years ago.
A thought on "unfillable holes": perhaps these could be garbage collected during a save() compaction? This would let them hang around for a little while in the hopes of being restored but at whatever interval a developer chooses to run a compaction and save() the file they'd be polished away.
@pvh With "garbage collecting holes", do you mean discarding all changes that depend transitively on the hole? That is possible, and in fact I believe the current save() already does this, since it only stores changes that are reachable from the heads, and changes with missing dependencies don't count towards the heads (they are only enqueued but not applied to the document). However, I'm still doubtful that it's a good idea to rely on this behaviour.
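The reachability rule described above can be illustrated with a small sketch (a hypothetical model, not Automerge's actual implementation): starting from the heads, walk the dependency graph; a change whose dependency hash is missing (a "hole") never becomes reachable, so a save() that stores only reachable changes drops it.

```javascript
// Hypothetical model of the reachability rule: changes is a map of
// {hash: {deps: [...]}}, heads is an array of head hashes. Changes that
// depend (transitively) on a missing hash are never reached and thus
// would not be stored by a save() that keeps only reachable changes.
function reachableChanges(changes, heads) {
  const keep = new Set()
  const stack = [...heads]
  while (stack.length > 0) {
    const hash = stack.pop()
    const change = changes[hash]  // undefined if this hash is a "hole"
    if (change && !keep.has(hash)) {
      keep.add(hash)
      stack.push(...change.deps)
    }
  }
  return keep
}
```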
What about the idea to resolve holes with `applyChanges` rejecting changes with missing deps (and specifying which holes)? Doing so means we can remove the `getMissingChanges` API call and push the responsibility of hole management to the developer.
> What about the idea to resolve holes with applyChanges rejecting changes with missing deps - (and specifying which holes).
I think that would be a good idea; we've previously had cases of people getting confused when they called applyChanges and the document did not change because a dependency was missing. Making this an explicit error would be a better developer experience than silently buffering the change.
I'm not sure what the type signature for applyChanges should look like, though. We want it to either return a success status with an updated document, or an error status with a list of missing hashes. JavaScript exceptions are pretty bad so I don't really want to use an exception for the error status. Maybe we could make it return a kind of pseudo-tagged union type like this (in TypeScript types):
```ts
function applyChanges<T>(doc: Doc<T>, changes: Uint8Array[]): Success<T> | MissingDeps

interface Success<T> {
  status: 'success'
  doc: Doc<T>
}

interface MissingDeps {
  status: 'missingDeps'
  missingDeps: Hash[]
}
```
What do you think?
@ept a couple of minor questions/ideas
Should we add a more generic `{encode/decode}Message` pair of functions? (It could use that first byte to decide what message type it has.)

My focus right now has been on taking the example code in `sync_test.js` and trying to implement it in my little testbed application, and I think I'm getting close. You can see that implementation here: https://github.com/pvh/automerge-demo. I warn you that it is very WIP: while I'm trying to only commit working versions, it sort of grows bloat and then reconciles it again, and I might have hand-edited my Automerge tree without pushing those changes publicly once or twice, so if it doesn't work that's absolutely my fault and I'll aim to fix that before calling it a day here.
> I think that would be a good idea; we've previously had cases of people getting confused when they called applyChanges and the document did not change because a dependency was missing. Making this an explicit error would be a better developer experience than silently buffering the change.
This may be a quirk of our application, but this would currently create race conditions for us. We rely on being able to immediately listen for changes on an incoming signal, and then we can rest assured that it's ok for our naive sync protocol to finish getting the entire doc whenever it finishes delivery.

Possibly that's not a legitimate problem once the sync protocol has landed, though?
> It feels weird that the messages don't have the type baked into the binary for the wire protocol. Should we include a leading byte / bit that determines the message type?
@pvh They do have a leading byte (`0x42`, chosen arbitrarily) to indicate the message type. Happy to pick a different number if you don't like that one.
> Should we add a more generic {encode/decode}Message pair of functions? (It could use that first byte to decide what message type it has.)
Could do, although since we only have one message type at the moment, that seems a bit redundant.
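To illustrate the idea, a generic decoder could dispatch on that first byte (a hypothetical sketch: `0x42` is the sync message type byte mentioned above, and all other names here are made up, not Automerge's API):

```javascript
const MESSAGE_TYPE_SYNC = 0x42  // leading byte of a sync message, per the discussion above

// Hypothetical generic decodeMessage: inspect the first byte to decide
// which message type we received, then hand the rest to a type-specific decoder.
function decodeMessage(bytes) {
  if (bytes.length === 0) throw new RangeError('Empty message')
  switch (bytes[0]) {
    case MESSAGE_TYPE_SYNC:
      return { type: 'sync', payload: bytes.slice(1) }
    default:
      throw new RangeError(`Unknown message type: 0x${bytes[0].toString(16)}`)
  }
}
```

This would let new message types be added later without breaking existing decoders, at the cost of being redundant while there is only one type.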
> We rely on being able to immediately listen to changes on an incoming signal and then we can rest assured it’s ok that our naive sync protocol finishes getting the entire doc whenever it finishes delivery.
@dan-weaver You could solve this by buffering changes until all the dependencies are satisfied. Something like this:
```js
let receivedChanges = [], doc = Automerge.init()

function onReceiveChanges(changes) {
  receivedChanges = receivedChanges.concat(changes)
  const result = Automerge.applyChanges(doc, receivedChanges)
  if (result.status === 'success') {
    doc = result.doc
    receivedChanges = []
  } else if (result.status === 'missingDeps') {
    // do nothing, try again on next call to onReceiveChanges()
  } else {
    throw new Error(`Unexpected result status: ${result.status}`)
  }
}
```
…which would give you the current behaviour.
- treatment of change messages vs. sync messages feels inconsistent
- asking developers to manually check `getMissingDeps` feels incorrect
- waiting for a sync message to arrive before applying changes feels wrong too (particularly for single change messages) -> why shouldn't we apply all applicable changes and only publish the patch at a "suitable time"?
- waiting to update sync state until full sync is complete feels incorrect -> if I am getting 10,000 changes from a peer, how do I know how many changes I'm waiting on for a progress bar?
naming & types:

- network protocol
- febe protocol
- sync state
- changes
There are probably a few other smaller pieces but this covers a lot of it.
This looks very interesting.
I was wondering if you have thought about extracting this functionality into a separate library so it can be reused in other projects? It seems that the API should be general enough.
> Local changes can immediately be sent to all connected nodes, while changes received from another node should be exchanged through this protocol to avoid having a node unnecessarily receive the same changes several times from different nodes.
I was wondering if you have considered using a spanning-tree-like protocol such as Plumtree, epidemic broadcast trees, or Brisa for replicating changes with multiple nodes. See this for more information.
Hi @arj03,
> I was wondering if you have thought about extracting this functionality into a separate library so it can be reused in other projects? It seems that the API should be general enough.
Potentially, although it does make some assumptions that may be specific to Automerge, such as the API provided for traversing the hash graph. My inclination would be to keep this functionality as part of the Automerge core for now, and we can consider splitting it out later if there is demand.
> I was wondering if you have considered using a spanning tree like protocol such as plumtree, epidemic broadcast trees or Brisa for replication changes with multiple nodes.
I've looked at a few gossip / p2p overlay networks; our intention is that the API provided by Automerge core should be general enough to be used for many different types of network topology. There can then be several different networking layers (some using client-server websocket, some using p2p protocols, etc) that build upon the same core API. Apps can then choose a networking layer that is appropriate for their use case.
Yeah, I guess it makes sense to keep it in Automerge, especially as you fine-tune the algorithm so that it fits the problem at hand as well as possible, instead of trying to solve many different problems all at once.

I mentioned the articles because they specifically deal with the timeout problem and with handling data from multiple peers. It is good to know you are aware of the existing literature in that space. What I'm trying to say is that it seems like a separate concern from the main algorithm; then again, in Automerge it might make sense to keep those two aspects very close, because an end user doesn't really care about those details.

Anyway, just wanted to provide some input. What you are doing here is really exciting!
Can we close this?
@lightsprint09 Yes. Closing this because it is superseded by #339.
Here is a second attempt at the sync protocol discussed in #290, superseding the first attempt in #324, after discussion with @pvh @orionz @alexjg. It addresses several issues with the first protocol, in particular, making the API more functional (avoiding side-effects) and giving the networking layer more control over encoding and scheduling of communication.
There is now only one type of message, a `SyncMessage`. When a node sends one of these messages, its fields have the following meaning:

- `heads`: The latest change hashes known to the sender (where "latest" means "no other change depends on them").
- `need`: The hashes of any changes that are being explicitly requested from the recipient.
- `have`: A summary of the changes that the sender of the message already has, or is expecting to receive from another node. This is implicitly a request to the recipient to send all changes that the sender does not already have.

When two nodes connect to each other, they initially both send each other a SyncMessage to inform the other about their current state. If the two nodes are already in sync, there are no further messages. If the nodes need to send each other changes, they do so in response to this initial message, and the changes are also accompanied by another SyncMessage. Most of the changes are sent in this first response, but in some edge cases there can be a bit more back and forth of changes and SyncMessages until we have silence.
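Putting these fields together, a SyncMessage might look roughly like this as a plain object (a sketch based on the description above: the hash value is made up, and the actual wire format in the PR is a binary encoding, not JSON):

```javascript
// Sketch of a SyncMessage as a plain object. Field meanings follow the
// description above; the hash string is fabricated for illustration.
const syncMessage = {
  // latest change hashes known to the sender
  heads: ['1a2b3c'],
  // hashes of changes explicitly requested from the recipient
  need: [],
  // summary of changes the sender already has or expects to receive
  have: [{
    lastSync: [],              // heads at the last completed sync (empty: first sync)
    bloom: new Uint8Array(0)   // Bloom filter of changes added since lastSync
  }]
}
```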
Whenever a node updates its document (because it updated the document itself, or because it received changes from another node), it informs all the nodes it is connected to by sending them each a SyncMessage, in exactly the same way as the initial message on connection. This triggers a fresh run of the protocol. Local changes can immediately be sent to all connected nodes, while changes received from another node should be exchanged through this protocol to avoid having a node unnecessarily receive the same changes several times from different nodes. If a document update happens in the middle of a sync protocol run, that's fine too.
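The flow above might look like this in a networking layer (a hypothetical sketch: `backend.syncStart`/`backend.syncResponse` follow this PR's description, but the exact return shape and the `peer` object are assumptions, not the real API):

```javascript
// Hypothetical networking-layer glue for the flow described above.
// `peer.send` and `peer.lastSync` are made-up transport-level names.
function onConnect(backend, peer) {
  // on connection (or after a local update), send an initial SyncMessage
  peer.send(backend.syncStart(peer.lastSync))
}

function onMessage(backend, peer, message) {
  // reply with another SyncMessage plus any changes the peer is missing
  const [reply, changes] = backend.syncResponse(message)  // assumed tuple shape
  if (reply) peer.send(reply)
  for (const change of changes) peer.send(change)
}
```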
Automerge implements the protocol with the following backend functions, four of which are new:

- `getHeads()` returns the current heads hashes of the document (as an array of hexadecimal strings).
- `encodeSyncMessage()` and `decodeSyncMessage()` are straightforward: they convert between the `SyncMessage` object representation and a byte array representation of a message. They use a simple binary encoding that helps keep the protocol compact and efficient.
- `syncStart()` is called when we first connect to another node, or when we want to inform another node about some new changes. The argument `lastSync` should be the heads hashes after the last completed sync with the other node (or left blank if this is the first sync with the other node). The function returns the initial SyncMessage to send to the other node.
- `syncResponse()` is called when we receive a SyncMessage from another node, and the argument `message` is this message we received. The function returns a SyncMessage to send in response, and an array of changes to send to the other node, in their binary encoding (#253).
- `getMissingDeps()` is an existing function that returns any change hashes that we don't have, but which are dependencies of another change. I have updated this function so that it can now also figure out missing dependencies in a set of changes that have not yet been applied to a document.

Note that none of these functions modify the document state: they are all side-effect-free. The only effect on a document happens when the recipient of some changes takes them and calls `Automerge.applyChanges()` to apply them to a document.

Using the protocol does require some additional state: in particular, it must accumulate any changes that are received (using `getMissingDeps()` to identify any missing changes), and update the document when the set of changes is complete. It also needs to remember the heads from the last completed sync, and it needs to take some steps to avoid duplicate messages (since the functions above are stateless, they don't know whether an identical change or SyncMessage has already been sent previously). The intention is that with the API provided, this becomes a fairly simple state machine.

A simple example of such a state machine appears in the class `SyncPeer` at the top of `test/sync_test.js`, and this state machine is used extensively in the tests. I have not made the state machine part of Automerge's public API, because I suspect that different networking layers will want to implement the protocol state machine slightly differently (e.g. a client-server WebSocket implementation may well have different needs from a peer-to-peer implementation). My hope is that the API in this PR is general enough to satisfy all those use cases, but also powerful enough that networking layers don't need to do too much extra work.

Some discussion points:
- These functions are part of the backend API (e.g. `Automerge.Backend.decodeSyncMessage()`). I think this is okay, especially since we will want networking layers to work both when frontend and backend are on the same thread, and also when they are on different threads, so networking layers will need to be aware of the frontend/backend distinction in any case. The exception is `getMissingDeps`, which is already available on the top-level `Automerge` object; maybe we should change it to only make it available on the backend?
- We can take the `have` field from a SyncMessage we received, and add its contents to the `have` field of a SyncMessage we send. This indicates to the recipient of the second SyncMessage that we are already expecting to receive a certain set of changes from the sender of the first SyncMessage. If the changes from the first node don't arrive as expected due to a network interruption, we can make a fresh request (without copying the `have` field) after a timeout.

Internally, each entry of the `have` field of a SyncMessage contains two things: `lastSync` is the heads at the time of the last successful sync with the recipient of the message, and `bloom` is a Bloom filter summarising all of the changes that the sender of the message has added since then. The details are explained in a paper. The API exposes Bloom filters only as byte arrays without any further structure, since the manipulation and interpretation of Bloom filters is completely encapsulated in the API.

`syncResponse()` returns all changes that it knows about and that satisfy at least one of the following conditions:

- the change's hash is listed in the `need` field of the received message; or
- the change is not among the `lastSync` hashes in the `have` field of the sync message or their transitive dependencies, and none of the Bloom filters in the `have` field of the sync message contain the change; or
- …

I believe that changes received from another node should not be applied to the document until all dependencies are satisfied. There are two reasons for this: … a change with a permanently missing dependency would keep being requested via the `need` field, but the request will never be resolved. Having such eternal holes also makes it difficult to determine when the sync with another node has completed. Perhaps we could give up on holes after a timeout, but that makes the protocol more complicated.

Nevertheless, when performing a large sync, it would be annoying if we couldn't resume a partially completed sync that got interrupted due to an unreliable network connection, or if we couldn't display any data in an app until it has fully synced. To mitigate this, I think we could break a large sync into several smaller syncs, performed sequentially; then not much is lost if we disconnect during one of these smaller syncs. For example, each document or each head could be a separate sync. If there is a really long change history for a document, we could even break it up into chunks (say, sync 1MB of changes at a time, rather than jumping all the way to the heads in one sync).