errorception / redis-lock

Implements a locking primitive using redis. An implementation of the algorithm described at https://redis.io/commands/setnx
211 stars 48 forks source link

Concurrency Bug: Lost locks and timeouts #3

Closed nickminutello closed 11 years ago

nickminutello commented 11 years ago

I shall try to explain by annotating the code with some comments:

function acquireLock(client, lockString, lockTimeout, lockAcquired) {
    function retry() {
        acquireLock(client, lockString, lockTimeout, lockAcquired);
    }

    var lockTimeoutValue = (Date.now() + lockTimeout + 1);

    client.setnx(lockString, lockTimeoutValue, function(err, result) {
        if(err) return process.nextTick(retry);
        if(result === 0) {
            client.get(lockString, function(err, timeStamp) {
                // timeStamp is null - because between the client.setnx (that failed) and the client.get, some other client has released (client.del) the lock
                if(err) return process.nextTick(retry);

                // We end up with NaN here parsing a null
                timeStamp = parseFloat(timeStamp);

                // This comparison returns false - so we think the lock may have timed out
                if(timeStamp > Date.now()) {
                    setTimeout(retry, 50);
                } else {
                    lockTimeoutValue = (Date.now() + lockTimeout + 1)
                    // We try and take the lock - and succeed
                    client.getset(lockString, lockTimeoutValue, function(err, result) {
                        if(err) return process.nextTick(retry);

                        /// BUT THIS COMPARISON FAILS. We successfully obtained the lock - but we dont realise, 
                        // so we retry ... now the lock is left hanging until it times out. We should instead compare the string 
                        // values - rather than trying to compare the parsed values -OR - we should handle a null timestamp by immediately retrying.
                        if(parseFloat(result) === timeStamp) {
                            lockAcquired(lockTimeoutValue);
                        } else {
                            retry();
                        }
                    });
                }
            });
        } else {
            lockAcquired(lockTimeoutValue);
        }
    });
}
rakeshpai commented 11 years ago

For some reason, Github didn't give me notifications about your bug report, so I apologize for the delay in responding. Also, luckily for me, I also got a pull request to fix just this bug. I've merged it in now. The issue is #4.

nickminutello commented 11 years ago

I just noticed... we neednt retry if the lock has been deleted :

    if(!existingLockTimestamp) {
        // Wait, the lock doesn't exist!
        // Someone must have called .del after we called .setnx but before .get.
        // https://github.com/errorception/redis-lock/pull/4
        return setTimeout(retry, 50);
    }

We just need to move to the next step of trying to obtain the lock with a .getset (but make sure we have the right value to check if we got it)

rakeshpai commented 11 years ago

What would be the right value to check? If we acquire the lock with .getset, we won't have existingLockTimestamp set, so we won't know if we got the lock. .setnx would be the right command to use in such a scenario, hence the retry.

Or am I missing something?