HydraCG / Specifications

Specifications created by the Hydra W3C Community Group

Question about resource versioning #190

Closed ghost closed 3 years ago

ghost commented 5 years ago

I already asked this question on the mailing list, probably years ago. I am curious whether there has been any progress or recommendation on this.

When multiple people work on the same resource, there can be conflicts, e.g. two authors want to edit an article at the same time. In these cases we can create two branches, or we can resolve the conflicts and merge the edits, just like we do with git. If we don't want to support all of these features in our REST API, then we need to refuse the second request somehow and let the client solve the problem for us. To do that, we need to detect whether the modifications are based on the most recent version of the resource. We don't even need to save the previous versions of the resource; we just generate a new version id on every change of the resource. That version id must be added to our update link as an input, sent back with the update operation, and compared to the current version id to decide whether the update is based on an outdated version and should therefore be refused. AFAIK this is currently the recommended practice with MongoDB for avoiding conflicts, and it is probably used with other NoSQL databases too, but it is not a database-dependent solution. I guess the best solution would be to add this resource version in a Hydra extension vocab, e.g. in a hext:resource-version property. I'd like to know whether any Hydra-related project supports this, or whether you have any input on it.
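
A minimal sketch of the version check described above, in Python. The `VersionedStore` class, its method names, and the use of a random UUID as the version id are all assumptions made for illustration; they are not part of Hydra or of any database API.

```python
import uuid


class StaleVersionError(Exception):
    """Raised when an update is based on an outdated version id."""


class VersionedStore:
    """Hypothetical in-memory store that keeps only the latest version of each resource."""

    def __init__(self):
        self._items = {}  # resource id -> (version id, data)

    def create(self, resource_id, data):
        version = uuid.uuid4().hex
        self._items[resource_id] = (version, data)
        return version

    def read(self, resource_id):
        # Returns (version id, data); the client must remember the version id.
        return self._items[resource_id]

    def update(self, resource_id, based_on_version, data):
        current_version, _ = self._items[resource_id]
        if based_on_version != current_version:
            # The client edited an outdated representation: refuse the update.
            raise StaleVersionError(resource_id)
        new_version = uuid.uuid4().hex  # a fresh version id on every change
        self._items[resource_id] = (new_version, data)
        return new_version
```

No history is stored here: only the current version id is kept, which is enough to refuse the second of two concurrent edits and leave the conflict resolution to the client.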

alien-mcl commented 5 years ago

Hmm. I thought that HTTP 409 Conflict and ETag are enough, or am I wrong? Each new version of the resource should generate a new ETag value, so both clients and servers can detect a changed resource and decide what to do.

ghost commented 5 years ago

@alien-mcl What if you send back PUT links for multiple resources that are versioned independently? As far as I understand, you need a different ETag for each of those PUT links. Or can you deliver all of those ETags in a response header somehow?

angelo-v commented 5 years ago

You are talking about PUT requests against resources you got via a collection but have not dereferenced yourself yet, and therefore don't have an ETag for? That is indeed an interesting question. I see two approaches:

  1. Just PUT without ETag and accept to override
  2. Make a GET or OPTIONS request before PUTting, to get the ETag.

I would not include the version (metadata) in the actual resource data.

alien-mcl commented 5 years ago

I don't think it is possible to provide ETags for multiple resources in headers. An alternative would be to provide some metadata, but like @angelo-v I wouldn't do it; it feels somehow unhealthy (though it would still be a fully legal solution). I'd change @angelo-v's bullet no. 1:

  1. Just PUT without Etag and expect 409 Conflict which should be resolved.

ghost commented 5 years ago

@angelo-v Let's say I have a collection c=[i1,i2,i3,...] and the first item looks something like this: i1 = {someProperty: "x"}. Each item has a PUT link and an etag separate from that of the collection itself. The first item has the etag "a". While I figure out what text to choose to override the "x" value, somebody else does the exact same thing and sets "someProperty" to "y". Because of this, the etag of the first item changes to "b". I decide that instead of overriding "x" I would rather append "z" to it, so I send a PUT request with "xz" based on the "a" etag, when the resource already has the "b" etag. This is obviously a conflict, which we normally avoid by sending the "a" etag along with the PUT request. Now let's check the proposed solutions:

  1. "Just PUT without etag and accept to override" The API will override "someProperty" with "xz" and "y" is lost. Nobody will know that there was a conflict, so nobody will be able to resolve it and, for example, set our property to "yz".
  2. "Make a GET or OPTIONS request before PUTting, to get the etag."
    • If you meant that I should send a GET or OPTIONS request just to get the etag for my PUT request, then that amounts to almost the same as the first solution, because in that case I will still send "xz" but with the "b" etag, which is a lie, because my edit is based on the "a" etag.
    • With GET there is another option. I don't add the PUT link when I send back the representation of the collection, and the client sends an additional request for each item just to get that PUT link with the etag. That is suboptimal because I need 2 HTTP requests, and it makes the client code more complicated, because I might get back "x" in my first request and "y" with etag "b" in my second request. Now I have to compare the returned values, probably replace "x", etc.

What if I have a bulk update PUT link that I can use to update an arbitrary number of items in the collection? If I want to update a dozen items at once, then that is 12 additional HTTP requests. Another issue here is that I have to send 12 etags with the request somehow. So not just the response etag header but also the request etag header would have to carry multiple etags, which neither of them is capable of.
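
One way to picture the bulk case is a request body that carries one base version per item. The sketch below assumes a hypothetical payload shape (`id`, `baseVersion`, `value`) and integer version counters; none of these names come from Hydra or HTTP.

```python
def apply_bulk_update(current, changes):
    """Apply each change whose base version is current; report the rest as conflicts.

    current: dict mapping item id -> {"version": int, "value": ...}
    changes: list of {"id": ..., "baseVersion": int, "value": ...}
    Returns (applied ids, conflicting ids).
    """
    applied, conflicts = [], []
    for change in changes:
        item = current[change["id"]]
        if item["version"] != change["baseVersion"]:
            # The edit was based on an outdated version of this item.
            conflicts.append(change["id"])
            continue
        item["value"] = change["value"]
        item["version"] += 1  # every change produces a new version
        applied.append(change["id"])
    return applied, conflicts


# Someone already changed i1 from "x" to "y" (version 2); the client still
# bases its "xz" edit on version 1, so only the i2 change goes through.
current = {
    "i1": {"version": 2, "value": "y"},
    "i2": {"version": 1, "value": "p"},
}
changes = [
    {"id": "i1", "baseVersion": 1, "value": "xz"},
    {"id": "i2", "baseVersion": 1, "value": "q"},
]
applied, conflicts = apply_bulk_update(current, changes)
```

Applying the non-conflicting items and reporting the others is only one possible policy; a server could equally reject the whole request if any single item is stale.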

I think it is obvious that the design of the ETag headers is broken and we have to send that info in the message body rather than in these headers...

alien-mcl commented 3 years ago

I gave it some thought and I feel that we need to clarify a few things. ReST is about representations of resources, so each resource behind a URL should be treated as what it is: a resource, some kind of state.

When working with separate items of a given collection I think there are no issues here: each item will have its own URL and an ETag that should help you detect conflicts. Agent A and Agent B both start editing resource 1 with ETag X. Agent B updates resource 1, which receives ETag Y. Agent A then tries to update resource 1 but receives HTTP 409 and needs to react accordingly (override, merge, forfeit, whatever).

Working with the whole collection at once makes things complicated, because from the resource point of view it is a coincidence that it is a collection. It is just another resource with a separate URL and ETag attached to it. I think the server may choose to change that ETag when any of the items or the collection itself changes (e.g. the order of the items in the case of a list). It is also up to the server what to provide when the collection URL is called: a full representation, just a list of item URLs, or something in between.

There is no nice solution here. A pure ReST-like approach would be to receive the whole collection with its dedicated ETag, modify it and send it back; in case of a 409 the agent should do something to resolve the conflict, and that's all. But this approach may involve large payloads.

Another approach would be to use some kind of PATCH-like approach, but this goes somewhat outside of pure ReST and protocol knowledge, and the client/agent may need to understand more.

Yet another approach would be to use something like Hydra to inform the client how to proceed, via the available links and operations plus some metadata that raw Hydra does not provide (e.g. the resource version mentioned above). I believe there are vocabularies that provide this kind of metadata (e.g. DCTerms), but this requires the client to understand even more.

asbjornu commented 3 years ago

HTTP has rich support for conditional requests and optimistic concurrency control through If-* headers. For the use-case you describe, @inf3rno, I'd say If-Match: <ETag> combined with 412 Precondition Failed and 428 Precondition Required is a good fit.

  1. For requests missing the If-Match header, respond with 428 Precondition Required. This ensures that all requests must be conditional.
  2. For requests that have an If-Match header with an ETag value different from the current state on the server, respond with 412 Precondition Failed.
  3. For requests with an If-Match header with an ETag matching the current state on the server, respond with 200 OK (or whatever fits your use-case).
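
The three steps above can be sketched as a single decision function. This is a simplified illustration, not a complete implementation of HTTP conditional requests: the function name is hypothetical, the target resource is assumed to exist, and the header is split naively on commas.

```python
def check_precondition(if_match, current_etag):
    """Return the status code for a write request, following the three steps.

    if_match: raw If-Match header value, or None when the header is missing.
    current_etag: the resource's current ETag, including quotes, e.g. '"b"'.
    """
    if if_match is None:
        return 428  # Precondition Required: only conditional requests are accepted
    # Simplification: a naive comma split, which ignores commas inside quoted tags.
    candidates = [tag.strip() for tag in if_match.split(",")]
    if "*" in candidates:
        return 200  # "*" matches any current representation (resource assumed to exist)
    # If-Match uses strong comparison, so a weak ETag (W/"...") never matches.
    if not current_etag.startswith("W/") and current_etag in candidates:
        return 200
    return 412  # Precondition Failed: the client's ETag is stale
```

In the earlier scenario, a PUT carrying `If-Match: "a"` against an item whose ETag is already `"b"` yields 412, so the conflicting edit is surfaced instead of silently overwriting the other change.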

I also recommend that all non-successful responses are served with an RFC 7807 compliant response body so the reason the request failed can be detailed. Hydra will provide compatibility with RFC 7807 (see #218).
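
As a rough illustration, a 412 response body in the RFC 7807 format could be built as below. The `type` URI, the function name, and the `currentETag` extension member are made up for this sketch; `type`, `title`, `status` and `detail` are standard members of a problem details object, and `application/problem+json` is its media type.

```python
import json


def precondition_failed_problem(sent_etag, current_etag):
    """Build headers and a JSON body describing a failed If-Match precondition."""
    body = {
        "type": "https://example.com/problems/stale-etag",  # hypothetical problem type
        "title": "Precondition Failed",
        "status": 412,
        "detail": f"If-Match was {sent_etag}, but the current ETag is {current_etag}.",
        "currentETag": current_etag,  # hypothetical extension member
    }
    headers = {"Content-Type": "application/problem+json"}
    return headers, json.dumps(body)


headers, body = precondition_failed_problem('"a"', '"b"')
```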

With all of these tools available, I think it's safe to say that doing anything (more) in this area is out of scope for Hydra.

alien-mcl commented 3 years ago

While @asbjornu's hints may not fully answer @inf3rno's question, I agree with the conclusion that there are solutions available already and we shall not take any additional steps to address this question within the core vocabulary. Please feel free to reopen this issue if there is something more that we can do here.