brimdata / zed

A novel data lake based on super-structured data
https://zed.brimdata.io/
BSD 3-Clause "New" or "Revised" License
1.35k stars 67 forks source link

add locking for lake branch merge operations #2971

Open mccanne opened 2 years ago

mccanne commented 2 years ago

Currently, the code to merge branches has races if multiple processes try to merge the same branch at the same time.

While the transaction journal provides opportunistic locking within a single journal, the mechanism does not generalize across journals (i.e., you can't opportunistically write to two journals in a fashion that provides consistency across the two writes).

This can be fixed by implementing explicit locking for the merge operation. The proposal here is to create a merge-lock journal in each branch that implements a simple lock/unlock protocol. You take a lock by reading HEAD, which must have an unlock, and write a lock at HEAD+1. Now you have the lock. You do the merge operation and you release the lock. Because locks are persistently stored in the merge-lock journal, crash recovery can easily look at this journal to recover and release locks that get stuck.

Note that the lock need only be held on the child branch. This is because the merge to the parent is atomic and doesn't affect the child as long as the child's TAIL/BASE pointer points to the old point in the parent. Since TAIL/BASE is moved atomically (they are kept in the same cloud object), this atomic update will simultaneously move to the correct parent BASE point and move the child's TAIL to the new point in the child over the merged transactions that are now in the parent. Thus, writes and queries can run on the child branch without any problem concurrent with the merge operation. This is an important attribute of this design enabling a use case of non-stop live ingest into a branch while merges to parent happen concurrently.

If a write conflict occurs while merging to parent, then the merge fails and the child updates can be easily aborted. Such conflicts would never occur in the live ingest use case as new data is only ever added. They will occur if the branch "deletes" data in the parent, then it is up to the application implementing such a workflow to ensure that conflicts do not occur and/or figure out how to recover when they do occur.

mccanne commented 2 years ago

When we add locks, we should create a meta-query source for the locks in branches so we can easily introspect the status of locks and find stuck locks. Obviously, we'll need a command to force-release a lock manually.