beakerbrowser / beaker

An experimental peer-to-peer Web browser
https://beakerbrowser.com/

Transactional updates to DatArchive files #606

Closed pfrazee closed 4 years ago

pfrazee commented 7 years ago

We currently have no solution to the "interceding update" race condition. An example from Wikipedia:

  1. Process A reads a customer record from a file containing account information, including the customer's account balance and phone number.
  2. Process B now reads the same record from the same file so it has its own copy.
  3. Process A changes the account balance in its copy of the customer record and writes the record back to the file.
  4. Process B, which still has the original stale value for the account balance in its copy of the customer record, updates the account balance and writes the customer record back to the file.
  5. Process B has now written its stale account-balance value to the file, causing the changes made by process A to be lost.

The same scenario can occur to us, but for multiple tabs and for multiple writers, once multi-writer is implemented.
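
For concreteness, here is a minimal sketch of how that plays out with today's API, assuming both tabs do a naive read-modify-write on the same JSON file (the file name and fields are just for illustration):

// Both Tab A and Tab B run this against the same archive:
var archive = new DatArchive('dat://...') // placeholder URL
var record = JSON.parse(await archive.readFile('/customer.json', 'utf8'))
record.balance += 100 // Tab A changes the balance; Tab B might change the phone number instead
await archive.writeFile('/customer.json', JSON.stringify(record))
// Whichever tab writes last clobbers the other's change, because each
// wrote back its own stale copy of the whole record.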

Handling multiple tabs

Multiple tabs are relatively straightforward to coordinate, because Beaker's main process acts as a single-threaded manager. We can employ mutexes (locks) or compare-and-swap; in fact, CAS would use a mutex internally, so a mutex API may be the simplest solution. Something like this:

var archive = new DatArchive(...)
await archive.lock('/file.json') // block other tabs from writing until we unlock
var obj = JSON.parse(await archive.readFile('/file.json', 'utf8'))
obj.foo = 'bar'
await archive.writeFile('/file.json', JSON.stringify(obj))
archive.unlock('/file.json') // release the lock so other tabs can proceed
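
A compare-and-swap flavour of the same idea could avoid explicit locking; the readFileWithVersion() and ifVersion bits below are purely hypothetical, just to show the shape:

var archive = new DatArchive(...)
// hypothetical: read the file along with the archive version it came from
var {value, version} = await archive.readFileWithVersion('/file.json')
var obj = JSON.parse(value)
obj.foo = 'bar'
try {
  // hypothetical option: only write if the file hasn't changed since `version`
  await archive.writeFile('/file.json', JSON.stringify(obj), {ifVersion: version})
} catch (e) {
  // another tab wrote in between: re-read and retry, or surface the conflict
}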

We could also attach the locking conceptually to the commit() and revert() methods, to make it feel like a transaction:

var archive = new DatArchive(...)
await archive.startTransaction() // full archive lock
var obj = JSON.parse(await archive.readFile('/file.json', 'utf8'))
obj.foo = 'bar'
await archive.writeFile('/file.json', JSON.stringify(obj))
archive.commit() // release lock and publish changes
// or
archive.revert() // release lock and roll back any unpublished changes
// or
archive.endTransaction() // release lock

The downside of that approach is that commit() and revert() will affect more than the current transaction, which could really confuse people. For example:

var archive = new DatArchive(...)
await archive.writeFile('/file_two.json', JSON.stringify(obj2))
await archive.startTransaction() // full archive lock
await archive.writeFile('/file.json', JSON.stringify(obj))
archive.commit() // would publish file_two.json and file.json

So, combining the locks with commit and revert may not be a good idea, or may require some additional thought.

That said, none of these ideas may work at all, because...

Handling multiple writers

When multi-writer lands, this situation will get a little more complex.

There's no way to enforce a mutex or any kind of atomicity when multi-writer is involved. Multi-writer in Dat is an eventually-consistent protocol, designed for devices/processes that are not in close communication. That means it's designed to expect conflicts and, when they occur, to inform the app so that it can resolve them by some kind of merge.

We haven't yet decided what the API will be for detecting and resolving conflicts, but it might be very strange to have a locking system and a conflict system at the same time. Why even bother with locks and transactions when you can still have conflicts? So, instead of handling "interceding update races" between tabs with locks, we might consider tying into the conflict-resolution system that multi-writer provides.
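
To make that concrete: whatever the detection API ends up being, the resolution itself could just be app-level code that merges the two copies against their common ancestor. A rough sketch (none of this is a real DatArchive API):

// Three-way, field-by-field merge of two conflicting copies of a record,
// given the common ancestor `base`. Our change wins if both sides touched
// the same field; a real app might prompt the user instead.
function mergeRecords (base, ours, theirs) {
  var merged = Object.assign({}, base)
  for (var key of new Set([...Object.keys(ours), ...Object.keys(theirs)])) {
    if (theirs[key] !== base[key]) merged[key] = theirs[key]
    if (ours[key] !== base[key]) merged[key] = ours[key]
  }
  return merged
}

// mergeRecords({balance: 100, phone: '555'}, {balance: 200, phone: '555'}, {balance: 100, phone: '999'})
// => {balance: 200, phone: '999'} -- neither writer's change is lost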

I'll update this issue as this topic develops. Feel free to add your thoughts.

aschrijver commented 7 years ago

Hi @pfrazee

As we talked about on freenode, I would be very careful modeling ACID behaviour if you have even the slightest bit of eventual consistency in your system (PS: I don't know your use case, just reading this text).

In your first example that would be the case if, say, you have other logic that writes to your archive besides the tabs. If it's just the tabs, single-user and single-device, you're probably okay with locking for a while.

But certainly not in the multi-writer scenario; using locks there is asking for trouble. We talked about event sourcing for handling conflicts, and I intend to say something about that in relation to Dat in this discussion thread: https://github.com/datproject/dat/issues/824

However, I think what the folks at Replikativ are working on is closer to what you are looking for. You should probably talk to @whilo about CRDTs!

pmario commented 7 years ago

I think in "Handling multiple writers" you answered your question already.

I think it would be good if there were some notification events that tell an app that Dat has detected inconsistencies ... but the resolution has to be done at the app level if it can't be resolved automatically.

pfrazee commented 7 years ago

I think in "Handling multiple writers" you answered your question already.

It's a bit more complicated than that.

When there are multiple writers in the network, they coordinate using a CRDT to avoid losing data. When there are multiple tabs/processes, there is no CRDT. So, we need a local coordination system for the tabs. Either we add a CRDT, or we use transactions.

pmario commented 7 years ago

So, we need a local coordination system for the tabs. Either we add a CRDT, or we use transactions.

IMO it depends on what you want. For me it would be enough if the second tab or window just tells me that a writeable dat is already open somewhere else, so the "conflicting" window/tab can be closed.

Browsers have the BroadcastChannel API, which can be used to communicate between tabs and windows. So at the app level there are already possibilities ... so I don't see the need at the Beaker API level (except to improve the existing mechanisms).
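
A rough sketch of that, with the channel name and message shapes made up for illustration:

// Each tab announces itself on a shared channel; if a writer already
// exists, the later tab backs off (or warns the user).
var channel = new BroadcastChannel('dat-writer:' + archive.url)
var amWriter = true
channel.postMessage({type: 'claim-writer'})
channel.onmessage = (e) => {
  if (e.data.type === 'claim-writer' && amWriter) {
    channel.postMessage({type: 'writer-exists'}) // tell the newcomer we got here first
  }
  if (e.data.type === 'writer-exists') {
    amWriter = false // another tab already has the writable dat open
  }
}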

If the content is modified, we already have .createFileActivityStream() to notify an app about changes. As we know, this mechanism has issues ... so if the new functionality can improve the possibilities here, I'm in favor of changes.
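
For reference, the current usage is roughly the following (event name from memory, so treat it as approximate):

var evts = archive.createFileActivityStream('/file.json')
evts.addEventListener('changed', ({path}) => {
  // another writer or another tab changed the file; re-read it here
  console.log(path, 'changed, reloading')
})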

pmario commented 7 years ago

IMO the transactions you describe don't solve the "interceding update" race condition from the OP.

They assume that Beaker controls process A and process B, which isn't always the case. E.g. changes can be introduced at the OS file level ... those don't care about any locks.

hossameldeen commented 6 years ago

I've run into the problem of writing to one dat archive from multiple tabs, and the solution I've reached so far is to use a SharedWorker for dealing with the dat archive. I haven't implemented it yet, so I can't tell for sure how well it'll work out.

And I hadn't thought of the filesystem-level changes @pmario mentioned, so the SharedWorker solution probably won't work except under the assumption that changes only come from one of the tabs.
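
Under that assumption, the shape I had in mind is roughly this (worker file name and message format made up):

// In each tab: send writes to the shared worker instead of writing directly.
var worker = new SharedWorker('dat-writer.js')
worker.port.start()
worker.port.postMessage({op: 'write', path: '/file.json', data: JSON.stringify(obj)})

// dat-writer.js: a single instance shared by every tab, so all writes are serialized.
var archive = new DatArchive('dat://...') // assumes DatArchive is usable in a worker
onconnect = (e) => {
  var port = e.ports[0]
  port.onmessage = async (msg) => {
    if (msg.data.op === 'write') {
      await archive.writeFile(msg.data.path, msg.data.data)
    }
  }
}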

hossameldeen commented 6 years ago

@ myself: I found that the DatArchive API isn't available in Workers, so my solution is probably not valid for now. Of course, there's the workaround of having a tab that communicates with the SharedWorker act as a proxy for the DatArchive API, but I didn't try that.

Follow #932 for updates on using DatArchive inside a Worker.

pfrazee commented 6 years ago

I think there are some userland implementations of cross-tab locks worth looking into. I also saw that Chrome is shipping a locks API soon.
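
For reference, with the Web Locks API (navigator.locks) the read-modify-write from the top of this thread would look roughly like this, assuming it gets exposed to dat:// pages:

// The lock is held until the async callback settles, so the
// read-modify-write below can't interleave with another tab's.
await navigator.locks.request(archive.url + '/file.json', async (lock) => {
  var obj = JSON.parse(await archive.readFile('/file.json', 'utf8'))
  obj.foo = 'bar'
  await archive.writeFile('/file.json', JSON.stringify(obj))
})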

hossameldeen commented 6 years ago

Thanks, will look into them!