Closed ppisar closed 3 years ago
Wasn't initially able to reproduce this, though I do agree there is a race here.
https://github.com/PerlFFI/FFI-Platypus/pull/348 implements a retry mode, which I think would be the most appropriate fix. Using proper file system locks would almost certainly be the most reliable option, but not all filesystems implement them properly or at all.
There is still a race, because this PR only retries a finite number of times, but even a few retries reduce the likelihood of a failure. I don't think there is a reliable way to know why File::Temp failed, otherwise I could key off that.
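For readers who want to see the shape of a bounded retry, here is a minimal sketch. It is not the code from the PR: the helper name `root_with_retry`, the retry count, and the `l$$` lock file name are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );

# Hypothetical helper (not the PR's code): make sure $root exists and put a
# per-process lock file into it, retrying a bounded number of times.
sub root_with_retry {
    my ($root, $max_tries) = @_;
    my $lock = File::Spec->catfile($root, "l$$");

    for my $try (1 .. $max_tries) {
        # Another process may remove $root between this check and open().
        unless (-d $root) {
            # Losing a mkdir race is fine as long as the directory exists now.
            mkdir($root) or -d $root or next;
        }
        open(my $fh, '>', $lock) or next;   # directory vanished: try again
        close($fh) or next;
        return $root;
    }
    die "unable to set up $root after $max_tries tries: $!";
}

# Usage: a throwaway base directory stands in for the working directory.
my $base = tempdir(CLEANUP => 1);
my $root = root_with_retry(File::Spec->catdir($base, 'tmp'), 5);
print "lock file present\n" if -e File::Spec->catfile($root, "l$$");
```

As noted above, this narrows the window but does not close it: if every attempt loses the race, the helper still gives up.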
I am releasing this as dev version FFI-Platypus-1.54_01.tar.gz. I'd appreciate your feedback on it, since I wasn't able to quickly reproduce this race myself.
Thanks for the quick fix. I'm also unable to reproduce it at will. I tested FFI-Platypus-1.54_01 and it looks good so far.
I'm going to close this for now, but we can re-open if it pops up again.
I observed a test failure like this:
I run the FFI-Platypus tests in parallel, which means that one test can call FFI::Temp->newdir() while another test is running the FFI::Temp END block. There seems to be a race between creating and deleting the ./tmp directory:
A process enters FFI::Temp::_root(), observes that the directory already exists, and is paused by the CPU scheduler right after the check for the directory but before creating a lock file:
Then another process enters its END block, deletes all lock files, attempts to delete the directory, which succeeds (because it is empty), and returns:
Finally, the first process continues. It attempts to create a lock file, which fails (because the directory no longer exists) but goes unnoticed, registers the lock file, and returns to FFI::Temp->newdir(), which then calls File::Temp->newdir(), which reports that the ./tmp directory passed as the DIR argument is missing:
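The interleaving above can be replayed deterministically in a single process. This is only an illustrative sketch: the second process is simulated by a plain rmdir(), and the paths and `l$$` lock file name are assumptions rather than the library's actual values.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );

my $base = tempdir(CLEANUP => 1);
my $root = File::Spec->catdir($base, 'tmp');
my $lock = File::Spec->catfile($root, "l$$");

mkdir($root) or die "mkdir failed: $!";

# Process A: sees that the directory exists, then gets descheduled.
my $seen = -d $root;

# Process B (simulated): its END block removes the now-empty directory.
rmdir($root) or die "rmdir failed: $!";

# Process A resumes: open() fails because the parent directory is gone.
# Without checking the return value, this failure passes unnoticed.
my $opened = open(my $fh, '>', $lock);
my $error  = $opened ? '' : "$!";

print "directory seen by A: ", ($seen   ? "yes" : "no"), "\n";
print "lock file created:   ", ($opened ? "yes" : "no"), "\n";
print "open() error:        $error\n" unless $opened;
```

Checking the return value of open() (or close()) is what turns the silent failure into something the caller can react to.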
I could patch the tests to run serially, but that would not fix the problem in the FFI::Temp library, which can affect any FFI::Platypus user.
An immediate fix is to raise an error on close() failure in _root(). That would stop the race from going unnoticed, but it would not be robust enough for a smooth user experience: applications would have to retry FFI::Platypus themselves. Modern operating systems provide a way of creating files relative to an open directory descriptor, but that would require File::Temp to use it. One could maybe use file locks as a synchronization primitive. Perhaps the easiest and most portable approach would be to retry creating the directory, in a loop inside _root(), if creating the lock file fails because of a missing directory.
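To make the file-lock idea concrete, here is a rough sketch in which both the setup and the cleanup run under an exclusive flock() on a guard file that sits next to the directory. All names here (`with_guard`, `acquire_root`, `release_root`, the `.guard` suffix) are hypothetical, and the sketch assumes a filesystem where flock() works, which is not guaranteed everywhere.

```perl
use strict;
use warnings;
use Fcntl qw( :flock );
use File::Spec;
use File::Temp qw( tempdir );

# Hypothetical helper: run $code while holding an exclusive lock on a guard
# file that lives next to (not inside) the temp root.
sub with_guard {
    my ($guard, $code) = @_;
    open(my $fh, '>>', $guard) or die "cannot open $guard: $!";
    flock($fh, LOCK_EX)        or die "cannot lock $guard: $!";
    my @result = $code->();
    close($fh);                # closing the handle releases the lock
    return @result;
}

# Setup: create the directory and this process's lock file atomically
# with respect to other processes that use the same guard.
sub acquire_root {
    my ($root) = @_;
    with_guard("$root.guard", sub {
        mkdir($root) or die "mkdir $root: $!" unless -d $root;
        my $lock = File::Spec->catfile($root, "l$$");
        open(my $fh, '>', $lock) or die "cannot create $lock: $!";
        close($fh)               or die "cannot close $lock: $!";
    });
    return $root;
}

# Cleanup: what an END block would do, under the same guard.
sub release_root {
    my ($root) = @_;
    with_guard("$root.guard", sub {
        unlink File::Spec->catfile($root, "l$$");
        rmdir $root;   # fails harmlessly while other lock files remain
    });
}

my $base = tempdir(CLEANUP => 1);
my $root = acquire_root(File::Spec->catdir($base, 'tmp'));
release_root($root);
```

The guard file is deliberately never deleted in this sketch, since removing it would reintroduce the same kind of race one level up.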