Can we remove Patches: 1 when there's only one patch?

josephg commented 3 years ago

Currently when there's exactly one patch the protocol looks like this:

HTTP/1.1 209 Subscription
content-type: text/plain
transfer-encoding: chunked

content-type: application/json
version: "abc"
patches: 1

content-length: xxx
content-range: yyy

new-contents

Note there's 3 levels of HTTP headers here:

Real HTTP headers (eg transfer-encoding: chunked)
Transaction level headers (eg patches: 1)
Patch level headers (content-range: yyy)

Ideally I'd like to collapse levels 2 and 3, because that would make the protocol simpler in the 98% case where only a single patch is being sent in each transaction. Ie:

HTTP/1.1 209 Subscription
content-type: text/plain
transfer-encoding: chunked

content-type: application/json
version: "abc"
content-length: xxx
content-range: yyy

new-contents

This would simplify parsing - especially in systems where multiple braid level patches is invalid (eg sharedb, automerge)

Downside:

Currently the existence of the patches: X header differentiates between a snapshot update and a patch (delta) update. There are other ways we could do this - for example we could add a patch-type header, or use content-type to differentiate - maybe with content-type: patch or something.

Anyway, keen for thoughts?

canadaduane commented 3 years ago

Currently the existence of the patches: X header differentiates between a snapshot update and a patch (delta) update. There are other ways we could do this - for example we could add a patch-type header, or use content-type to differentiate - maybe with content-type: patch or something.

I think the presence or absence of the content-range header would be sufficient to differentiate between "snapshot update" and "patch (delta) update" wouldn't it? i.e. if not present, then it's a snapshot update.

josephg commented 3 years ago

If the patch is expressed using automerge's internal format, then the content-range header will be missing.

canadaduane commented 3 years ago

I wonder if the spec already allows for your proposal?

https://github.com/braid-org/braid-spec/blob/master/draft-toomim-httpbis-braid-http-03.txt#L424-L443

The presence of the Merge-Type header in the version is noteworthy. If a version has a specific Merge-Type, will the question of "is this a snapshot or a patch" be delegated to the underlying sync system? In other words, is the distinction even useful if you pass a response back like this:


         HTTP/1.1 209 Subscription
         Subscribe: keep-alive

         Version: "ej4lhb9z78"
         Parents: "oakwn5b8qh", "uc9zwhw7mf"
         Content-Type: application/octet-stream
         Merge-Type: yjs
         Content-Length: 27

         $����$ N .��'�$��$N.��p�$��

Is that binary replacing the current version of the document, or is it patching it? Does it matter to the Braid protocol?

toomim commented 3 years ago

Ah, yes we need to make this clearer in the spec. Indeed, the Patches: N header is already optional. In fact, we have no choice in the matter — it's already defined in the PATCH method (see rfc5789). In other words, you can already send a patch to a server without Patches: N by doing a PATCH request. (Example below.) The Braid-HTTP spec just adds an optional new way of specifying patches, if you want to use it.

Since we were focused on explaining the new functionality when writing the spec, I think we overlooked the value of articulating all the ways in which you can use only parts of the spec, without using other parts. And this is particularly apparent here— we didn't give any examples illustrating how you can issue a single patch without the Patches: N header, even though it's an entirely nice way to use the protocol.

Downside:

Currently the existence of the patches: X header differentiates between a snapshot update and a patch (delta) update. There are other ways we could do this - for example we could add a patch-type header, or use content-type to differentiate - maybe with content-type: patch or something.

When you use PATCH, you specify the type of patch via the content-type. For instance:

Request:

         PATCH /chat
         Content-Type: application/json-patch+json

         [
           { "op": "test", "path": "/a/b/c", "value": "foo" },
           { "op": "remove", "path": "/a/b/c" },
           { "op": "add", "path": "/a/b/c", "value": [] },
           { "op": "replace", "path": "/a/b/c", "value": 42 },
           { "op": "move", "from": "/a/b", "path": "/a/d },
           { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
         ]

This might seem odd, because the underlying document is application/json, not application/json-patch+json, but the semantics of patch is that the headers refer to the patch itself, not the underlying document. See rfc5789 section 2:

   Note that entity-headers contained in the request apply only to the
   contained patch document and MUST NOT be applied to the resource
   being modified.  Thus, a Content-Language header could be present on
   the request, but it would only mean (for whatever that's worth) that
   the patch document had a language.  Servers SHOULD NOT store such
   headers except as trace information, and SHOULD NOT use such header
   values the same way they might be used on PUT requests.  Therefore,
   this document does not specify a way to modify a document's Content-
   Type or Content-Language value through headers, though a mechanism
   could well be designed to achieve this goal through a patch document.

Thus, it's totally ok to leave out Patches: 1, and just issue a normal PATCH request. When you do so, you'll want to be cognizant about the distinction between headers about the resource and headers about the mutation. This distinction is made explicit when you use Patches: N by the blank line separating the resource from the version (example taken from Braid-HTTP section 2.3):

      Request:

         PUT /chat
         Version: "up12vyc5ib"
         Parents: "2bcbi84nsp"
         Content-Type: application/json              <-- The type of the resource
         Merge-Type: sync9
         Patches: 1
                                                     <-- Newline separates resource from patch
         Content-Length: 326
         Content-Type: application/json-patch+json   <-- The type of the patch

         [
           { "op": "test", "path": "/a/b/c", "value": "foo" },
           { "op": "remove", "path": "/a/b/c" },
           { "op": "add", "path": "/a/b/c", "value": [] },
           { "op": "replace", "path": "/a/b/c", "value": 42 },
           { "op": "move", "from": "/a/b", "path": "/a/d },
           { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
         ]

Now, I personally feel it's a little bit gross to use the term "Content-Type" to specify the type of a patch. "Patch-Type" feels like it might be nicer? We chose "Content-Type" because it's what HTTP already does in the PATCH method, but we don't need to use the same word, because this is a slightly different context. I think if we called it "Patch-Type" it might be clearer that this object is a patch, and that might make it easier for people to understand the spec.

toomim commented 3 years ago

It also occurs to me that PATCH might only allow a patch to be sent in a request, but not a response. If you want send a patch in a response without the patches: N header, you might not be able to use the existing HTTP semantics, and we might have to define something new and put this explicitly into the Braid-HTTP spec.

If we do that, we'll need to distinguish the type of the resource from the type of the patch, and a Patch-Type header would make sense to me as a way to do that. I could imagine this:

Request

   GET /foo
   Parents: "x"

Response

   HTTP/1.1 200 OK
   Parents: "x"
   Version: "y"
   Content-Type: application/json
   Patch-Type: application/json-patch+json
   Content-Length: 326

   [
     { "op": "test", "path": "/a/b/c", "value": "foo" },
     { "op": "remove", "path": "/a/b/c" },
     { "op": "add", "path": "/a/b/c", "value": [] },
     { "op": "replace", "path": "/a/b/c", "value": 42 },
     { "op": "move", "from": "/a/b", "path": "/a/d },
     { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
   ]

However, now that I see it written down, I'm not sure what advantage this has over the status quo:

Request

   GET /foo
   Parents: "x"

Response

   HTTP/1.1 200 OK
   Parents: "x"
   Version: "y"
   Content-Type: application/json
   Patches: 1

   Content-Type: application/json-patch+json
   Content-Length: 326

   [
     { "op": "test", "path": "/a/b/c", "value": "foo" },
     { "op": "remove", "path": "/a/b/c" },
     { "op": "add", "path": "/a/b/c", "value": [] },
     { "op": "replace", "path": "/a/b/c", "value": 42 },
     { "op": "move", "from": "/a/b", "path": "/a/d },
     { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
   ]

It saves us one line of text and a newline, but now our protocol parsers have to handle another way to express the same thing.

toomim commented 3 years ago

Nonetheless, I think I'd be in favor of renaming Content-Type to Patch-Type everywhere in the Braid-HTTP spec where it is used to express the type of a patch.

Then the example above would look like this:

Request

   GET /foo
   Parents: "x"

Response

   HTTP/1.1 200 OK
   Parents: "x"
   Version: "y"
   Content-Type: application/json
   Patches: 1

   Patch-Type: application/json-patch+json
   Content-Length: 326

   [
     { "op": "test", "path": "/a/b/c", "value": "foo" },
     { "op": "remove", "path": "/a/b/c" },
     { "op": "add", "path": "/a/b/c", "value": [] },
     { "op": "replace", "path": "/a/b/c", "value": 42 },
     { "op": "move", "from": "/a/b", "path": "/a/d },
     { "op": "copy", "from": "/a/d/d", "path": "/a/d/e" }
   ]

toomim commented 3 years ago

This discussion has led me to see a deeper issue— perhaps we should rename all these headers:

Content-Length
Content-Range
Content-Type

to:

Patch-Length
Patch-Range
Patch-Type

...when they are referring to patches. This might not only clear up the ambiguities we're having here between Content-Type and Patch-Type, but also make the rest of the spec clearer to boot. I filed issue https://github.com/braid-org/braid-spec/issues/97 to discuss this broader idea.

josephg commented 3 years ago

So to summarize a little:

In requests, Patches: N is already optional because of the existing semantics of HTTP's PATCH method
In responses, Patches: 1 is optional but this is ambiguous in the spec at the moment we're not sure if it should be required or not

The remainder of work to decide on in this issue is whether patches: 1 is mandatory in subscriptions when a transaction contains 1 patch.

For / against:

Make patches: 1 in subscription responses optional:
- In some systems sending multiple patches within a transaction is invalid so they wouldn't need to impleemtnat parsing for that header
- The network protocol is simpler and smaller (1 less header)
- This is symmetric with the logic for PATCH requests
Or make patches: X in subscription responses mandatory, even when only one patch is being sent
- As Mike said, the logic for this will generally be needed anyway
- And if that logic is implemented, requiring it will make the parser a bit simpler (since there aren't multiple modes or anything like that)

I was worried about the confusing semantics of some headers in different positions, but I'm pretty confident that #97 fixes that. Anyway, I could go either way on this. Happy to make a call and close this issue soon.

toomim commented 3 years ago

That's a wonderful summary, thank you!

toomim commented 1 year ago

Chair:

The renaming of headers has moved to https://github.com/braid-org/braid-spec/issues/97
The question of whether to require Patches: 1 in subscriptions has moved to https://github.com/braid-org/braid-spec/issues/118.

braid-org / braid-spec

Can we remove Patches: 1 when there's only one patch? #96