brunoczim / fslock

File locking for Rust.
MIT License
40 stars 11 forks source link

Lock likely being released on close of other FDs (Unix) #6

Open brunoczim opened 2 years ago

brunoczim commented 2 years ago

Test other_process_but_curr_reads fails.

It attempts to read the locked file into a string (in the same process, with std::fs::read_to_string) and then spawns another process to make it try to lock the file. It would be expected that the process would fail to lock, but it actually succeeds.

I've made other tests, and without opening the file again and reading it, the lock works. I believe the bug is not in reading nor opening, but closing the file.

Manual page for lock seems to specify that it is implemented on top of fnctl on Linux. Also, manual page for close seems to specify that close will revoke any locks via fnctl on the file, no matter if a different FD to that file was closed.

brunoczim commented 2 years ago

It seems like we could have some control over other LockFiles, but not on other APIs, such as std::fs.

brunoczim commented 2 years ago

Perhaps we could use flock?

The problem is that is seems to be non-POSIX. However, libc exports it unconditionally.

brunoczim commented 2 years ago

I'm likely publishing a new version of this library with flock instead of lockf. However, I'd like to mature this idea before.

Is 'flock` not-by-process in every platform? Or is it just Linux and MacOS?

Does libc supported platforms really support flock?

soulmachine commented 2 years ago

lockf is POSIX compliant and works on NFS, see https://gavv.github.io/articles/file-locks/#lockf-function

brunoczim commented 2 years ago

Yeah, but lockf has this "per-process issue", where I lock the file "foo" with one FD and loses it because someone else in the same process closed the same file but with another (unrelated) FD.

I intend to replace lockf by flock because flock does not seem to have such problem.

What makes me think twice about flock is that it's not POSIX, but rather implemented by quite a few Unix-like systems (like Linux, BSDs, MacOS, etc).

But yeah as I said earlier, libc crate exposes flock unconditionally for Unix targets. So, I believe flock is supported by all Unix targets supported by Rust. I am likely using it.

polarathene commented 2 years ago

I don't know this stuff too well to contribute much to the discussion, but while searching for Rust crates I did come across a resource on the topic that might be helpful for this issue/decision.

Maybe you've already seen it? :sweatsmile: (EDIT: I'm only now looking over the earlier link (https://gavv.github.io/articles/file-locks/#lockf-function) and noticing it covers similar details pretty well, so the information below probably doesn't contribute much, apologies!_)


Expand for previous content [This Dec 2010 article](https://apenwarr.ca/log/20101213) (_with 2014 and 2015/2019 updates_) details 3 types of locking and their pro/cons as well as cross-platform issues (_some since resolved_): > Other than simple _lockfiles_, which I won't go into (_but which you might just want to use from now on, after reading this article :)_), there are three Unix file locking APIs: `flock()`, `fcntl()`, and `lockf()`. > POSIX is also, apparently, unclear on whether `lockf()` locks are the same thing as `fcntl()` locks or not. > On Linux and MacOS, they are documented to be the same thing. (_In fact, `lockf()` is a libc call on Linux, not a system call, so we can assume it makes the same system calls as `fcntl()`._) > **I recommend you avoid `lockf()`.** > `fcntl()` locks are much more powerful than `flock()` locks. > > `fcntl()` locks have two very strange behaviours. The first is merely an inconvenience: > `fcntl()` locks are not shared across `fork()`. > > The second strange behaviour of `fcntl()` locks is this: > The lock doesn't belong to a file descriptor, it belongs to a _(pid,inode) pair_. > Worse, if you _close any fd referring to that inode_, it **releases all your locks on that inode.** > Also **beware of `flock()`**, which on some systems is implemented as a call to `fcntl()`. > Note that since `flock()` doesn't work on NFS, and `fcntl()` doesn't work on SMB fs, that there is no locking method that works reliably on all remote filesystems. > Using the logic above, **I settled on `fcntl()` locks.** > ... > **Super Short Summary:** > - **Don't use `flock()`** (_python: `fcntl.flock()`_) because it's not in POSIX and it doesn't work over NFS. > - **Don't use `lockf()`** (_python: does not exist_) because it's not in BSD, and probably just wraps `fcntl()`. > - **Don't use `fcntl()`** (_python: `fcntl.lockf()`_) because it doesn't work over SMB on MacOS, ~~and actually, on MacOS it doesn't work right at all.~~ (_see 2019 update below_) > ... > Are we having fun yet? **I guess _lockfiles_ are the answer after all.** > **Update 2019/01/02:** > - Progress! In 2015, Linux gained a new kind of `fcntl()` lock, `F_OFD_SETLK` ("Open File Description" locks), which work the way you might have expected `fcntl()` locks to work in the first place. > Hopefully other OSes will copy this feature eventually. You can read about it here: https://lwn.net/Articles/640404/. > - Meanwhile, Windows 10 WSL (_Windows Subsystem for Linux_) has come out, and it "implements" `fcntl()` locks by just always returning success, **leading to file corruption everywhere.** > - And, some time after this article was written, MacOS was fixed, so **now `fcntl()` locks seem to work as expected.** Those are the highlights that stood out to me, but there's plenty of info on gotchas the author experienced between C and Python versions of their code across platforms. Things may have improved since then however (_eg: Python 3 and Windows 10 WSL_). The SMB gotcha cited was specific to macOS IIRC. --- So while switching from `fcntl` to `flock` may resolve some issues, you might be trading them for others. It seems you might want to instead stick with `fcntl()` and use `F_OFD_SETLK` like the article update at the end mentions. Where the caveat about compatibility with older systems is probably less of a concern these days and can be clearly stated upfront in docs/README? I don't know how well that works with macOS and Windows though. The [referenced LWN article](https://lwn.net/Articles/586904/) also provided some good insights, and describes the OFD addition as blending the benefits of `fcntl()`(_byte range locks, not whole file locks_) with `flock()` (_lock released when all open references / FDs are closed, not by any single reference alone_). When that's not an option, I guess `flock()` is fine, especially if you only need/expect to perform a whole file lock anyway? --- **EDIT:** The earlier comment [providing a different detailed resource](https://gavv.github.io/articles/file-locks/) on the topic is also quite nice! - [`lockf()` on Linux is as the article mentioned, a wrapper for `fcntl()`](https://gavv.github.io/articles/file-locks/#lockf-function). - [`flock()` works with NFS since kernel 2.6.12 (mid 2005)](https://gavv.github.io/articles/file-locks/#comment-3742149602), where it's emulated as byte ranged with `fcntl(2)`. - [_Open File Descripton_ (fcntl(2)) locks are Linux only](https://gavv.github.io/articles/file-locks/#open-file-description-locks-fcntl), while not currently in the POSIX specs, it is intended to be eventually. Available since kernel 3.15 (mid 2014).