baudehlo / node-fs-ext

Extras missing from node's fs module
MIT License
101 stars 45 forks source link

More than 4 exclusive flocks on the same file do not acquire more than the first lock #49

Closed pipobscure closed 9 years ago

pipobscure commented 9 years ago

When calling flock to lock a single file exclusively but having distinct fd's more than 4 time only the first lock succeeds.

Test Code:

var count = 4;
var fs = require('fs');
var fsext = require('fs-ext');
var all=[];
while(all.length < count) all.push(fs.openSync('test.lock', 'w'));
console.log('FDS: ', all);
all.forEach(function(fd, idx) {
  fsext.flock(fd, 'ex', function() {
    console.log('Locked('+fd+'): '+idx);
    setTimeout(function() {
      console.log('Try-Unlock('+fd+'): '+idx);
      fsext.flock(fd, 'un', function(err) {
        console.log(err && (err.message || err) || ('Unlocked('+fd+'): '+idx));
      });
    }, 1000);
  });
});

this works if count is 4 or less, but once count hits 5 it stops working:

Output for count=4:

FDS:  [ 10, 11, 12, 13 ]
Locked(10): 0
Try-Unlock(10): 0
Unlocked(10): 0
Locked(11): 1
Try-Unlock(11): 1
Unlocked(11): 1
Locked(12): 2
Try-Unlock(12): 2
Unlocked(12): 2
Locked(13): 3
Try-Unlock(13): 3
Unlocked(13): 3

Output for count=5:

FDS:  [ 10, 11, 12, 13, 14 ]
Locked(10): 0
Try-Unlock(10): 0

after which it hangs, because there are open files, but the second lock is never acquired.

Just wanted to note this bug for now, I'll try to look into what's happening on the C++ level when I have some more time.

pipobscure commented 9 years ago

Ok, I found it.

libuv has the number of worker_threads that a loop has set to 4. This leads to only 4 blocking operations being possible at the same time in a loop. When there are 5 calls to flock, then the first succeeds, and the next 4 fill up the worker threads and block them all. Even though flock('un') is then called, that never happens since all workers are blocked.

At that point there is no way to get out of that stale-mate. You have a dead-lock. Now this is usually not a problem, since most calls will eventually return. However since flock depends on a resource becoming available that cannot ever become available any more, flock is problematic in this context.

The solutions could be:

  1. Do not use a loop to do this, but use the threading features provided by libuv directly to create a new thread per flock request. Of course this has it's limits as well, since threads are a limited resource.
  2. Provide logic in fs-ext that ensures that there are never more than 3 locking executions in flight at a time. This would at least prevent dead-lock, and give the opportunity for an unlock to happen.
  3. Work up-stream to automatically add/remove worker threads as needed. This could keep the default at 4, but still allow for escalation if they are actually all blocked.

Thoughts?

rlidwka commented 9 years ago

The solution is to avoid blocking libuv loop. Which means use exnb flag instead of ex like I did here.

pipobscure commented 9 years ago

Once I found the root cause, I was aware of that solution. Thanks for noting it down for posterity. I think this issue can only be fixed by documentation. FS-Ext is giving you a mighty big gun to shoot yourself in the foot.

I can already envision an app using 3 different libraries that use blocking locks. And then sometimes, when the timing is just right, the app suddenly stops working.

I guess that just means a heightened need to code-audit any module that has fs-ext in its dependency tree. :)