Fencing - Githubissues

fernomac commented 7 years ago

Cool to see pieces of this being open sourced!

You mention clock skew in the readme, but there is no discussion of "skew" in the sense of a client acquiring a lock, getting stuck doing garbage collection (or whatever), then proceeding to take an action believing that it holds a lock which has in fact expired and been acquired by someone else. Does this version of the library support something like fencing to allow downstream systems to detect and reject out-of-order operations?

amcp commented 7 years ago

@fernomac perhaps "clock synchronization" may be a better section heading. I'll cook up a PR to address this clarification. Regarding your question about fencing, downstream users of the client will reject out of order operations. The client first uses GetItem to read the lock, and if it exists as an item in the table, it waits for the lease duration in the lock and then does a conditional PutItem call. The condition on the PutItem makes sure that the owner and RVN didn't change between reading the lock and updating the lock. In terms of prior latent owners of the lock (eg, GC slowed the previous owner to a halt), RVN and owner not changing means that the prior latent owner did not heartbeat and update the RVN.

RVN is a UUID at level 4. If the prior owner heartbeats successfully, consecutive RVNs should not be the same. To sum up, we serialize access on owner+RVN, so out-of-order operations get rejected.

fernomac commented 7 years ago

Thanks for the response!

The case of a lock holder not heartbeating in time is the one I'm interested in. Say I'm using the locks to serialize access to some other DynamoDB table. Client A acquires the lock, reads the current data in the table, ensures it still has the lock, then goes to sleep for a long time. Client B comes along, thinks the lock is stale, forcibly acquires it, and makes a change. Client A wakes up, thinks it still has the lock, and clobbers B's change. This is a fundamental challenge of distributed locking systems.

You can work around this to a degree by setting your timeouts high enough that A probably won't ever sleep for that long; that has a history of biting folks (me included) in the rear though. :)

amcp commented 7 years ago

When client A is restarted and calls the AcquireLock API, the first thing that happens is that it calls GetItem to read the lock. It will realize B stole the lock, and will try to acquire it again by waiting lease duration milliseconds and doing a conditional put. So, client A will not clobber changes made by B. This is by design; allowing locks without heartbeats to be reclaimed by healthy hosts is what allows users of the client to make progress as a whole.

fernomac commented 7 years ago

Client A won't call the AcquireLock API when it wakes up -- it "just" finished doing so before it started GC. Totally agree about requiring a way to "steal" locks, but it opens you up to problems when the client you're stealing it from is not dead, just sleeping.

sasha-slutsker commented 7 years ago

Yes, what you mention is a fundamental challenge that makes this kind of thing tough :). First, the lock client provides two basic methods that help but do not fully solve the issue you mention: ensure and isExpired. Ensure will guarantee that you have the lock for at least X ms, at the expense of other clients taking longer to steal the lock. So for example, if you call "Ensure" and pass in 10 seconds, you can guarantee that no one will steal the lock for at least 10 seconds. You could call this in a loop before doing work; if the work is expected to take, say, 1 second every iteration. The method "IsExpired" is similar, except it will return true or false, never extending the lock's leaseDuration even if it's about to expire. (In general, we recommend you use Ensure if you turn heartbeats off and IsExpired if you have automated heartbeating on, but please consult the JavaDoc for full details on both methods.)

Finally, I want to mention one more feature we have that might be of interest to you: session monitoring! The lock client provides an optional ability to attach a SessionMonitor to your AcquireLock request. When using the SessionMonitor, the lock client will call a user-supplied "callback" method when the lock enters the "danger zone" of expiring soon. This could, for example, kill the JVM or do any number of other "emergency" actions. Depending on your usecase, this can be a valuable way of avoiding the scenario you describe.

With things like garbage collection, it can be tough to ensure that you never have an issue 100% of the time. But using these techniques can help to make your system more robust.

fernomac commented 7 years ago

Cool, thanks. And again, it's super cool to see you open sourcing this.

Would you be open to a PR adding support for and documentation about fencing, as described in the link I posted above or in the Chubby paper? That (plus support from the system being modified under the lock) would allow you to make hard guarantees even in the face of arbitrary pauses.

amcp commented 7 years ago

@fernomac Sequence of events:

Client A locks for a certain lease duration
Client A GC pauses
Lease duration passes
Client B steals lock for a certain lease duration with a different RVN and hostname.
Client A wakes up from GC pause
Client A works on the critical section guarded by this lock client's AcquireLock API.

You are talking about step 6, right?

amcp commented 7 years ago

After you acquire a lock you can get the latest RVN from the lock object, right? Suppose you store your locked entities in this lock table item (the lock table is your entity table and the lock keys are the entity keys). If your data is in the locks, you can do an UpdateItem conditioned on hostname and current RVN to update the data in the lock. If Client B steals the lock from sleeping Client A, the RVN will not match so the update will fail and your application will see an Exception being thrown.

Now imagine the entities you are serializing access to with the lock are not stored in the lock itself but in multiple other tables. If you store a xyz_transaction_sequence_number attribute with each entity and in the lock itself, you can condition on them not changing later. If you condition updates in the critical section to guarded entities on the xyz_transaction_sequence_number you can detect whether some other host effected a change meanwhile.

Client A reads all the entities being locked and infers the current xyz_transaction_sequence_number for the lock.
Client A acquires a lock for a certain lease duration, increasing the xyz_transaction_sequence_number by one in the lock
Client A GC pauses
Lease duration passes
Client B reads the entities, steals lock for a certain lease duration with a different xyz_transaction_sequence_number and completes the critical section. (A updated seq to 27 and paused, B updated to 28 and completed)
Client A wakes up from GC pause
Client A works on the critical section guarded by this lock client's AcquireLock API. Because the xyz_transaction_sequence_number for all the items will have changed by client B, the first write to a guarded entity will fail.

In case Client A GC pauses in the middle of the critical section and subsequently loses the lock to B, B can detect the incomplete critical section in step 1 and remedy as appropriate before embarking on the critical section anew.

fernomac commented 7 years ago

@amcp yep, exactly: the sequence number in the lock allows downstream systems to detect and reject out of order requests. If a request from lock-holder 27 arrives after a request from lock-holder 28, it because 27 no longer holds the lock.

Full-on transactions take more work, and folks looking for that should probably use the dynamodb-transactions library; this library is lighter-weight for when those guarantees aren't needed, and works even if the place you're making changes isn't DynamoDB.

Thanks for discussing this with me! If you're interested, the offer to send a PR adding (opt-in?) support for a sequence number on the lock that can be used for fencing and some documentation about when you might want to use it stands, or if you think this thread serves as good enough documentation of the issue feel free to close it.

fahedhijazi commented 7 years ago

@amcp What if rather than using a UUID for RVN, we used a monotonically increasing sequence number? I believe either approach achieves the same result, with the added benefit that the sequence number can be used for ensuring that the latest writer always wins as well as not having to have an additional column. Thoughts?

The second remark is that some data stores do not support conditional writes, so this should be used despairingly. If you're writing to S3 for example, I don't think that there is a way to ensure that you are the latest client modifying a resource. Though ensure and isExpired can be useful to mitigate such scenarios, it still leaves open the opportunity for race conditions.

amcp commented 7 years ago

@fahedhijazi Using a monotonically increasing RVN instead of a UUID makes the RVN predictable. Currently, all acquireLock calls attempt to read the lock and will wait one lease duration before trying to acquire with a conditional put. If the same host has multiple processes contending for the same lock, using an unpredictable UUID for RVN forces processes on the same host contending for the same lock to serialize on the lock (because the random number generator will produce different UUID on each process).

awslabs / amazon-dynamodb-lock-client

Fencing #1