Versions are streamed, but should patches be streamed?

canadaduane commented 3 years ago

I'm implementing a client-side parser for subscriptions. The stream of versions requires that the implementation parse out version bodies as they come in, and in an event-like manner, emit results to the HTML renderer.

Versions, however, consist of multiple patches. Granted, a version does count its patches and provide a count of upcoming patches in the "Patches: N" header; however, does that necessarily imply that patches are (unlike versions) not event-emitted? Or are all patches required to be received before an event is emitted?

For example: Suppose you have a version on the server and you know it has 4 patches, but each of these patches is coming from different databases. There are two possible ways to send this version:

Wait until all databases have returned a result. Combine them into a version and send (Patches: 4; [P1,P2,P3,P4]).
Send a version header with "Patches: 4" (Patches: 4; [...). As each database returns its result, append a patch to the stream. (P1,... P2,...)

On the receiving end, this has implications of "liveness". If we want to be able to apply patches as soon as possible, the 2nd option would be preferable. However, if it's OK to wait until they're grouped together, then the 1st option would make it easier to implement the parser (and easier to act, since all information is included in an array of immediately available "patches").

In short, should we recommend one way or another in the spec, or is this an implementation-specific detail? Perhaps we should be explicit about that, if so.

josephg commented 3 years ago

Or are all patches required to be received before an event is emitted?

This is my preference. The reason to group patches into transactions is that it makes a transaction semantically atomic. There should be visible no point in time where some of the patches in the transaction have been applied and other patches have not been applied. So at some level of the stack the individual patches should be buffered before being applied all at once.

Suppose you have a version on the server and you know it has 4 patches, but each of these patches is coming from different databases.

The question I have is, are the 4 patches atomic? Like, are there allowed points in time where some of these patches have been applied, and not others? If they're all part of the same transaction I'd expect them to all be in the same database anyway. If there's in separate databases that usually implies they can be applied separately - in which case its probably 4 individual transactions / versions that are merged together.

In short, should we recommend one way or another in the spec

Yes I think this is a good idea. I might write up a PR for the spec talking about "patch philosophy". In short:

The version uniquely names the state of a document
Each transaction which modifies the document also modifies the version (unless the version is implicit and missing from the messages)
Each transaction is atomic - the patches a transaction contains should either all be applied, or none of them should be applied. (If there are valid states in between two transactions, then those intermediate states should also have versions!)

canadaduane commented 3 years ago

These are good points. Reflecting on my motivation for streaming patches in the original question, I realized this thread may have influenced me:

https://groups.google.com/g/braid-http/c/tduTxabr7Tk/m/0q_H3LbyAQAJ

You gave an example of automerge metadata for typing "h" followed by "i", and then Mike said, "this is super verbose" and went on to give an example of patch contents with "h" and "i".

I know it was just an example, but because collab document editing implies live results shared everywhere, I think I assumed from that conversation that patches could be a candidate for "live" updates, one patch at a time.

(Now that I think about it though--when would you know in advance that you have only 2 keypresses--or N keypresses, for that matter--coming?)

josephg commented 3 years ago

then Mike said, "this is super verbose" and went on to give an example of patch contents with "h" and "i".

Automerge doesn't allow consecutive inserts to be merged like Mike suggested. Consecutive inserts only get compacted together in the RLE wire format. Semantically these are two separate inserts for the algorithm. (And each insert has its own ID and other things).

I think I assumed from that conversation that patches could be a candidate for "live" updates, one patch at a time.

Yeah, as I see it each keystroke will usually generate a transaction with 1 patch.

toomim commented 3 years ago

If you have two events separated in time, then they'll have separate versions, because the semantics of a "version" is a "discrete point in time."

The only situation in which you'll have multiple patches per version is when the semantics is "these changes happen at the same time."

If your client wants to display the patches as they come in, then that means it is ok with showing the user a view of reality that doesn't match the semantics that the sender is expressing. I could imaging that being desired in a use-case analogous to a "progressively loading jpg", where the user knows that he's seeing partial results. But in the normal case, I imagine that clients will only want to display a new version once it can display the whole entire version.

If your server wants to send patches one-at-a-time, with big pauses of wall-clock in between each patch, that's fine. The client that receives it will know that they all are to be semantically considered to be created at the same time.

In fact, even if your server waits to send all patches at once, it's still possible (and perhaps even likely) for the packets going over the wire to be delayed, and be reconstituted in the client in progressive batches.

So it makes no difference whether a server waits to have all patches available before sending them, or just sends them as it gets them available. All that matters is that any patches that are semantically considered to have been made at the same point of time exist at a single version.

toomim commented 3 years ago

I also notice that Seph is using a term "transaction" here that isn't defined in the Braid spec. For anyone interested, we have been discussing it on a PR on his personal Braid protocol implementation here: https://github.com/josephg/braid-protocol/pull/7#issuecomment-786831185

josephg commented 3 years ago

Yeah I agree with all that - and at least for now, I don't see much benefit in providing support for streaming the patches within an update. People will probably want this at some point, but it'll complicate the protocol & API, and I'm in favour of waiting to support this when we have some actual use cases for it.

josephg commented 3 years ago

Unless there's further comments here, I suggest closing this issue for now.

canadaduane commented 3 years ago

Agreed. To summarize: patches should be bundled and considered all part of the same "version" in an update, so it wouldn't make much sense to accept streaming patches (although technically, at this point, one could do such a thing).

braid-org / braid-spec

Versions are streamed, but should patches be streamed? #98