Open chrisd8088 opened 5 years ago
Another issue, which we have identified as likely stemming from the use of flock(2) for exclusive locking of inodes while checking their projection status, is a significant slowdown when executing on a multi-core system.
Because the FUSE getattr() file operation, in particular, is called repeatedly while the Linux VFS traverses each path in each syscall, and because libprojfs invokes flock(2) on each call to getattr() in order to check the projection status of the parent directory (in the call to project_dir()), this results, we believe, in significant contention and effective serialization of parallel file operations.
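As a rough illustration of the contention described above, the following sketch shows that an exclusive flock(2) lock held through one open file description blocks (or, with LOCK_NB, refuses) a second lock request made through another open file description on the same file, even within a single process. The function name `flock_conflict_errno` and the temporary file are hypothetical and are not part of libprojfs; the sketch also assumes that each caller obtains its own open file description, which may not match exactly how libprojfs opens directories.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical demo, not libprojfs code: take LOCK_EX through one open
 * file description, then try LOCK_EX | LOCK_NB through a second one.
 * Returns the errno of the second attempt, 0 if it unexpectedly
 * succeeded, or -1 if the temporary file could not be set up.
 */
int flock_conflict_errno(void)
{
    char path[] = "/tmp/flock-demo-XXXXXX";
    int result = -1;
    int fd1, fd2;

    /* On Linux both names share one value, so callers may report EAGAIN. */
    assert(EWOULDBLOCK == EAGAIN);

    fd1 = mkstemp(path);
    if (fd1 < 0)
        return -1;

    /* A separate open() yields an independent open file description. */
    fd2 = open(path, O_RDONLY);
    if (fd2 < 0)
        goto out_fd1;

    if (flock(fd1, LOCK_EX) == 0) {
        /* Without LOCK_NB this call would block until fd1 unlocks. */
        if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
            result = 0;
        else
            result = errno;
    }

    close(fd2);
out_fd1:
    close(fd1);      /* closing the descriptor also releases its lock */
    unlink(path);
    return result;
}
```

On Linux the second attempt is expected to fail with EWOULDBLOCK. In a blocking variant, every concurrent caller would instead wait its turn, which is the serialization effect suspected here.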
Unfortunately, while we could switch to using LOCK_SH instead of LOCK_EX for the common fast-path case where the directory has already been projected, we can't know that in advance, and there is no race-free way to upgrade to an exclusive lock in the case that we need to perform a projection. It might be acceptable to have a race here, though; if, when we do acquire the exclusive lock, the projection has already happened, we can simply unlock and return.
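The race-tolerant approach suggested above might look roughly like the following sketch: check under LOCK_SH first, and only if projection is still needed, drop the shared lock, take LOCK_EX, and re-check before projecting. All names here (`struct demo_state`, `demo_is_projected`, `demo_project`, `ensure_projected`, `demo_project_twice`) are hypothetical stand-ins, and the projection state is kept in a plain in-memory flag purely for illustration; this is not how libprojfs stores or tests projection status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical stand-in for a directory's projection state. */
struct demo_state {
    bool projected;
    int projections;   /* how many times projection actually ran */
};

static bool demo_is_projected(const struct demo_state *s)
{
    return s->projected;
}

static int demo_project(struct demo_state *s)
{
    s->projected = true;
    s->projections++;
    return 0;
}

/* Shared lock for the fast path; exclusive lock plus re-check otherwise. */
int ensure_projected(int fd, struct demo_state *s)
{
    int res = 0;
    bool done;

    assert(fd >= 0 && s != NULL);

    if (flock(fd, LOCK_SH) == -1)
        return -1;
    done = demo_is_projected(s);
    flock(fd, LOCK_UN);
    if (done)
        return 0;

    /* Another thread may project between the unlock above and this lock. */
    if (flock(fd, LOCK_EX) == -1)
        return -1;
    if (!demo_is_projected(s))      /* re-check after winning the lock */
        res = demo_project(s);
    flock(fd, LOCK_UN);
    return res;
}

/* Runs the check twice on a temporary file; returns the projection count. */
int demo_project_twice(void)
{
    char path[] = "/tmp/projfs-demo-XXXXXX";
    struct demo_state s = { false, 0 };
    int fd, rc1, rc2;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    rc1 = ensure_projected(fd, &s);
    rc2 = ensure_projected(fd, &s);
    close(fd);
    unlink(path);
    return (rc1 == 0 && rc2 == 0) ? s.projections : -1;
}
```

The explicit unlock before taking LOCK_EX loses nothing, since flock(2) documents that converting a lock from shared to exclusive is not guaranteed to be atomic. Note that this only reduces contention among libprojfs's own callers; it would still conflict with other clients that hold flock() locks on the same files.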
An internal locking system which could be combined with the FUSE low-level API, so that locks were held just in memory in custom per-inode structures, would probably yield the most benefit, however, as it would solve both types of contention issues: with clients such as .NET that also use flock(), and between threads when calling getattr() and other FUSE file operations.
Our current implementation uses flock(2) to attempt to acquire an (advisory) exclusive lock on an inode while determining its projection state; however, as noted, this may interfere with clients' independent use of flock(2).
The VFSForGit application and .NET Core are examples of such clients; in particular, the UpdatePlaceholderTests functional tests are currently failing because they acquire file handles using the FileShare.Delete mode, which uses a flock() mode of LOCK_SH. The GitIndexProjection.UpdateOrDeleteFilePlaceholder() method then invokes the DeleteFile() method of the LinuxVirtualizationInstance, which attempts a simple File.Delete() but receives EAGAIN from libprojfs's failed flock(2) call using LOCK_EX | LOCK_NB.
Replacing the use of flock(2) with an internal inode-to-lock mapping in libprojfs would alleviate this type of contention for locks with all clients, including VFSForGit.
Note that lock contention of this type may also be the reason we have to handle EAGAIN return codes from attempts to write to files in other places in the VFSForGit functional test suite; see, for example, https://github.com/github/VFSForGit/commit/88cc76a56d6fd13bfdbffaf651289308dbb98c57.
It may also be the cause of the apparently transient failures sometimes encountered with the GitCommands.StatusTests.CreateFileWithoutClose() and GitCommands.StatusTests.WriteWithoutClose() tests, which have been seen to report Resource temporarily unavailable (i.e., an EAGAIN error code) while attempting to clean up their test files. For example: