RubenVerborgh / solid-server-architecture

Proposed architecture for a Solid server
https://rubenverborgh.github.io/solid-server-architecture/solid-architecture-v1-3-0.pdf
13 stars 2 forks source link

Support for conditional requests #21

Closed michielbdejong closed 5 years ago

michielbdejong commented 5 years ago

As discussed, we need a way to ensure that depending code has a way to do a sequence of ResourceStore#getRepresentation + ResourceStore#setRepresentation without the underlying resource having changed in the meantime. There are two ways to do this:

getResourceTransaction

The way this would work is:

interface ResourceStore {
  getResourceTransaction(ResourceIdentifier): Promise<ResourceTransaction>
}
interface ResourceTransaction {
  getRepresentation(RepresentationPreferences) : Promise<Representation>
  addResource(Representation) : Promise<ResourceIdentifier>
  setRepresentation(Representation) : Promise<void>
  deleteResource() : Promise<void>
  modifyResource(Patch) : Promise<void>
}

ResourceLocker

As described in https://rubenverborgh.github.io/solid-server-architecture/store-atomicity.pdf Questions about this (from Michiel to Ruben):

for each CRUD operation, only one dedicated method needs to be called. It is up to the implementer of the interface to (not) make an implementation atomic. For some implementations, such as triple stores or other database back-ends, atomicity is a given [...] a read+append sequence could unknowingly be interrupted by a write that thereby breaks atomicity.

It seems that the first paragraph talks about thread-safety of each of the 5 methods, and the second paragraph talks about locks for sequences of calls to these methods. Aren't those two different things?

We could explicitly indicate atomicity by having such implementations implement the (otherwise empty) AtomicResourceStore interface as a tag

An implementation where the methods are not thread-safe sounds useless to me. For instance, if you do getRepresentation while in another thread someone is changing the resource body from "Hola mundo" to "Hallo wereld", you could end up reading a "torn" body like "Hola mwereld". So that's different from the problem we're trying to solve with transactions for sequences, and I think that thread safety is such a basic feature that we don't need to include NonAtomicResourceStore in our model. We can just call that BuggyResourceStore. :)

Such non-atomic stores could be made atomic by decorating them with a LockingResourceStore

That is talking about making AtomicResourceStore instances (that are already thread-safe for individual CRUD operations) into transaction-aware resource stores by adding locks that can act across sequences of such CRUD operations.

RubenVerborgh commented 5 years ago

we need a way to ensure that depending code has a way to do a sequence of ResourceStore#getRepresentation + ResourceStore#setRepresentation

To be precise, we need a way to ensure that we can support conditional HTTP requests.

Doing such a sequence is one possible solution (but not necessarily the most efficient one, as discussed in the store atomicity document).

It seems that the first paragraph talks about thread-safety of each of the 5 methods, and the second paragraph talks about locks for sequences of calls to these methods. Aren't those two different things?

They are both about event-loop and thread safety actually. Even single-threaded Node.js apps do not guarantee order of execution in presence of asynchronous operations. Locks are one mechanism to provide event loop / thread safety.

The first paragraph is about the internal implementation of the methods: are they atomic/thread-safe themselves? The second paragraph talks about the atomicity of callers of these methods.

An implementation where the methods are not thread-safe sounds useless to me.

They are useful if you have an independent mechanism to take any non-atomic implementation and make it atomic, which is what LockingResourceStore does.

So if a class cannot easily provide atomicity itself, it shouldn't bother. Another component (that has already been tested for this) will do it for them. But if classes can provide atomicity themselves, they can indicate that they do not need another atomicity mechanism. It's a win-win.

For instance, if you do getRepresentation while in another thread someone is changing the resource body from "Hola mundo" to "Hallo wereld", you could end up reading a "torn" body like "Hola mwereld".

So that will never happen if Operation (which calls the store) requires AtomicResourceStore instead of just ResourceStore. Which is why it's a good idea to have this tagging interface.

That is talking about making AtomicResourceStore instances (that are already thread-safe for individual CRUD operations) into transaction-aware resource stores

No, it's taking a non-thread-safe store and making it thread-safe.

michielbdejong commented 5 years ago

Note that the difference between calling ResourceStore#getResourceTransaction and calling ResourceStore#locks.acquire is just syntactic sugar. The difference between the two proposals is in releasing the lock.

The downside of using locks is that the depending code will probably expect them to be exclusive, and to ensure that until it releases the lock, it has control over the resource, i.e. its operations will not fail due to conflict and other threads will be unable to write to the resource.

A lot has been written about locking and optimistic locking, and I think the best choice here would be an optimistic lock. Imagine thousands of operations per second doing conditional PUTs on the same resource. They would all queue up to briefly obtain the lock, check their condition, and release the lock again. Even if all conditions return false, the requests would queue up. Optimistic locking feels like an odd fit here - in fact I Googled it and found a blog post that says exactly that:

Typically distributed systems are stateless, thus the pessimistic solution does not really fit. One reason for this is that most distributed systems use REST or at least HTTP for communication – which itself is already stateless. Therefore the optimistic concurrency control is the preferred approach. (https://www.novatec-gmbh.de/en/blog/managing-concurrency-in-a-distributed-restful-environment-with-spring-boot-and-angular2/)

michielbdejong commented 5 years ago

Every ResourceStore implementation must be transaction-aware (or at least lock-aware), which is not the case for back-ends such as files.

On page 2, I think that's a nice example of where the 'everything is a ResourceStore' philosophy breaks down a little bit. If many of the components in our code get the label 'ResourceStore' on them, and some components have transaction awareness as an essential feature of their function within the code, while others do not, then that interface becomes a straitjacket. We could say that an Operation requires not just any ResourceStore, but specifically on a TransactionAwareResourceStore, to do its job.

RubenVerborgh commented 5 years ago

Note that the difference between calling ResourceStore#getResourceTransaction and calling ResourceStore#locks.acquire is just syntactic sugar.

It's not: they serve entirely different purposes.

getResourceTransaction is exposed externally, is designed to work across methods, and moves the locus of control for atomicity to the caller.

Note that there is strictly no ResourceStore#locks.acquire; ResourceStore does not expose a lock in that scenario. Rather, the exposed interface stays the same. A ResourceStore implementation might be wrapped in a LockingResourceStore, which is a generic component responsible for guaranteeing the atomicity within methods, for those stores that do not already have method atomicity themselves.

The downside of using locks is that the depending code will probably expect them to be exclusive

They need to be exclusive, yes.

I think the best choice here would be an optimistic lock

Agreed, but it also depends on how cheap or expensive locking is. (LockingResourceStore allows for different ResourceLocker implementations.)

Even if all conditions return false, the requests would queue up.

True; worst case, they can time out.

RubenVerborgh commented 5 years ago

If many of the components in our code get the label 'ResourceStore' on them, and some components have transaction awareness as an essential feature of their function within the code, while others do not, then that interface becomes a straitjacket.

Hence AtomicResourceStore.

We could say that an Operation requires not just any ResourceStore, but specifically on a TransactionAwareResourceStore, to do its job.

Indeed, as I wrote in https://github.com/RubenVerborgh/solid-server-architecture/issues/21#issuecomment-520730153.

michielbdejong commented 5 years ago

I would vote for "Approach 1 to conditional requests: transactions", but with optimistic instead of pessimistic locking (so no need for a release method). See also https://github.com/inrupt/wac-ldp/issues/46

michielbdejong commented 5 years ago

Note that with optimistic locking, there are two places where a conflict might be detected: at "3. check the conditions" and at "4. if the conditions are satisfied, call the modification method", if the resource was written to since the optimistic lock was acquired. See also the 'check and set' section in https://redis.io/topics/transactions for an example of how Redis supports this nicely.

michielbdejong commented 5 years ago

no need for a release method

I guess that even with optimistic watches instead of pessimistic locks, for for instance in-memory watches in a Data Access Object in front of a filesystem, it would be useful to have release so that the list of watches doesn't grow as the server keeps running.

michielbdejong commented 5 years ago

Having said that, if you want pessimistic locking, then we'll build it! :)

RubenVerborgh commented 5 years ago

it would be useful to have release

You need it. At some point, you have to indicate that you are done with a lock, optimistic or not.

I would vote for "Approach 1 to conditional requests: transactions", but with optimistic instead of pessimistic locking

Could you argue your choice, and specifically counter the severity of the requirements and caveats listed on https://rubenverborgh.github.io/solid-server-architecture/store-atomicity.pdf#page=2 ?

Some of those seem prohibitive.

if you want pessimistic locking

I don't. But unfortunately, Approach 1 is the only one that does not support optimistic locking, because it only lets you call createTransaction.

And sure, we could work around that. But that makes it a different (and likely more complex) approach. And the complexity stems from the fact that another component (namely the caller) takes control over the locking mechanism and atomicity.

Note that with optimistic locking, there are two places where a conflict might be detected: at "3. check the conditions" and at "4.

Exactly, so this means that, with Approach 1, getRepresentation would need to be called twice, so it becomes even less efficient.

michielbdejong commented 5 years ago

Sorry, hadn't read your Slack remark yet when I wrote the above.

Preliminary conclusion: there seem to be fewest caveats with the third approach.

That works for me. Especially since it allows us to keep the whole ETags thing close together and have one instead of two places where conflict can be detected.

michielbdejong commented 5 years ago

Doing it this way is also a nice fit with how updateResource(ResourceIdentifier, Patch) is already inside the ResourceStore.

RubenVerborgh commented 5 years ago

Thanks for all feedback @michielbdejong, updated main diagram following.

RubenVerborgh commented 5 years ago

Added in 0e1cc120c8ea0da754a7a649cef7aaa4f2f27273 to https://rubenverborgh.github.io/solid-server-architecture/solid-architecture-v1-2-0.pdf