Open robryk opened 4 years ago
Bind mounting over symlinks attempts to create their targets and bind mount over them
Sorry, this is how mount(2)
works. We cannot mount anything on a symlink: mount(2)
follows symlinks, whether we want it to or not.
It's still confusing to actually do that then, instead of failing. I would wager that this is ~never the desired behaviour. I would prefer if (see the last paragraphs of the original report):
Bind mounting over symlinks attempts to create their targets and bind mount over them
Sorry, this is how
mount(2)
works. We cannot mount anything on a symlink:mount(2)
follows symlinks, whether we want it to or not.
My mount (2)
manpage says:
MS_NOSYMFOLLOW (since Linux 5.10)
Do not follow symbolic links when resolving paths. Symbolic links can still be created, and readlink(1), readlink(2), realpath(1), and realpath(3) all still work properly.
Is this the feature that could make it work?
Is this the feature that could make it work?
It depends exactly how it works. Perhaps you could try it and see what happens?
Hm. I can't see how the behavior changes, adding this option or leaving it out. I guess it won't help here.
Note that MS_NOSYMFOLLOW
seems to be an option that affects the created mount (i.e. symlinks that are in that filesystem won't be followed) and not how mount()
resolves its target path.
Can the new mounting API be leveraged for this?
https://brauner.io/2023/02/28/mounting-into-mount-namespaces.html
It looks like the new syscall open_tree
accepts the AT_SYMLINK_NOFOLLOW
:
The move_mount
also has the symlink following flag:
Can the new mounting API be leveraged for this?
I don't know, can it? Or would the syscall just fail with ELOOP
if the target is a symlink? (Those are the two reasonable interpretations of AT_SYMLINK_NOFOLLOW
.)
If the answer is that a new mounting API can mount over the top of an existing symlink without dereferencing the symlink, I'd be happy to review a pull request. It would still need to be able to fall back to the old mounting API to support older kernels.
I'd be happy to review a pull request. It would still need to be able to fall back to the old mounting API to support older kernels.
I will try to experiment with the new API. One issue is that it is barely documented. I guess I will read to the kernel source code to figure it out.
Actually the mount
from util-linux can already use new mount API and has an option to switch between them.
https://github.com/util-linux/util-linux/commit/fd6b4d94ff013dc7ed680d4e864610da5b9751f1
So lets put it to test:
1). Create a non-existent symlink:
>>> ln -s /nonexistent ./test
2). Unshare the mount namespace:
>>> unshare --user --map-root-user --mount
3). Check that it is still symlink.
>>> /usr/bin/ls -l ./
total 0
lrwxrwxrwx 1 igo95862 igo95862 12 Dec 5 23:10 test -> /nonexistent
4). Do a mount over a symlink forcing new api. Mount some existing file.
env LIBMOUNT_FORCE_MOUNT2=never mount --bind /etc/resolv.conf ./test
5). Check what happens.
>>> /usr/bin/ls -l ./
total 4
-rw-r--r-- 1 nobody nobody 73 Dec 3 17:31 test
>>> cat ./test
nameserver 127.0.0.1
nameserver ::1
So it does become a regular file that we mounted on top.
Lets try to use the old mout API:
>>> env LIBMOUNT_FORCE_MOUNT2=always mount --bind /etc/resolv.conf ./test
mount: ./test: mount point is a symbolic link to nowhere.
dmesg(1) may have more information after failed mount system call.
So the new API definitely fixes this issue.
Some strace logs.
Using new API:
open_tree(AT_FDCWD, "/etc/resolv.conf", OPEN_TREE_CLONE|OPEN_TREE_CLOEXEC) = 3
move_mount(3, "", AT_FDCWD, "./test", MOVE_MOUNT_F_EMPTY_PATH) = 0
Using old API:
mount("/etc/resolv.conf", "./test", 0x55e0e3605670, MS_BIND, NULL) = -1 ENOENT (No such file or directory)
One issue is that it is barely documented.
You can grab the manpage drafts from the kernel mailing list to get a better starting point.
You can grab the manpage drafts from the kernel mailing list to get a better starting point.
Could you please give a link. I am not good with mailing lists.
https://lore.kernel.org/all/15449.1531263162@warthog.procyon.org.uk/
https://duckduckgo.com/?q=site%3Akernel.org+%22Add+manpage+for%22+move_mount&ia=web
Thank you.
Weird that those man pages patches were around since 2018 but never merged.
I've rendered the man pages to human readable format:
Looking at man pages it seems the new API has a few more advantages. For example, it has option of not triggering auto-mount.
If you look at this part of man page looks like new mount API can mount over mount point that does not exist.
FWIW: https://github.com/sunfishcode/linux-mount-api-documentation (@igo95862)
Actually you can do this with the old mount api as well using /proc/self/fd
.
diff --git a/bind-mount.c b/bind-mount.c
index 57b4236..d034c51 100644
--- a/bind-mount.c
+++ b/bind-mount.c
@@ -395,7 +395,9 @@ bind_mount (int proc_fd,
if (src)
{
- if (mount (src, dest, NULL, MS_SILENT | MS_BIND | (recursive ? MS_REC : 0), NULL) != 0)
+ dest_fd = openat(AT_FDCWD, dest, O_CLOEXEC | O_PATH | O_NOFOLLOW);
+ dest_proc = get_oldroot_path (xasprintf ("/proc/self/fd/%d", dest_fd));
+ if (mount (src, dest_proc, NULL, MS_SILENT | MS_BIND | (recursive ? MS_REC : 0), NULL) != 0)
return BIND_MOUNT_ERROR_MOUNT;
}
$ ln -s ENOENT hello1.txt
$ echo hello > hello2.txt
$ stat -c '%F' hello1.txt
symbolic link
$ cat hello1.txt
cat: hello1.txt: No such file or directory
$ ./_builddir/bwrap --dev-bind / / --bind $PWD/hello2.txt $PWD/hello1.txt stat -c '%F' hello1.txt
regular file
$ ./_builddir/bwrap --dev-bind / / --bind $PWD/hello2.txt $PWD/hello1.txt cat hello1.txt
hello
[!CAUTION] The PoC diff from above does not
- handle errors (return code of openat, ...).
- cares about cleanup (free, close).
- cares about preserving other flags done below via mountinfo.
- stop bwrap from creating the target file.
What I observe:
It seems that:
bwrap
to mount over the symlink's target.bwrap
to fail.bwrap
to create the symlink target and mount over it.What I expect:
Mounting over any kind of symlink to cause the resulting mount namespace to have the symlink replaced with the source of the mount. This did not happen in any of the four cases.
If doing that is impossible (quick skim of
man mount(2)
didn't yield an equivalent ofO_NOFOLLOW
, even thoughumount2
does have theUMOUNT_NOFOLLOW
flag), I expect bwrap to exit with an error. I also expect it not to mount anything over the symlink's target. This failed to happen in cases 1 and 3.It would be nice if the "File not found" error was more descriptive. It's not obvious which file is "missing" (my first intuition was that the problem was with one of the directories in the target path).