docker / for-mac

Bug reports for Docker Desktop for Mac
https://www.docker.com/products/docker#/mac

gRPC-FUSE reports inaccurate executable permissions #5509

Open myw opened 3 years ago

myw commented 3 years ago

Summary

gRPC-FUSE volumes seem to be incorrectly reporting some permissions. Namely, python2.7 seems to think non-executable files are executable, when mounted via gRPC-FUSE volumes. I present a minimal test case below.

Expected behavior

A host directory is mounted inside my container with a bind mount. When I test the access of a non-executable file in that directory with Python, I expect it to tell me that the file is non-executable, i.e. if stat tells me file foo has mode 0644, os.access('foo', os.X_OK) should return False.

When I try this with gRPC-FUSE turned off, this is what happens.

Actual behavior

When gRPC-FUSE is enabled, os.access('foo', os.X_OK) returns True, even though the file has mode 0644.

Information

This is quite reproducible.

A minimal test case to highlight the issue is described below. For the sake of brevity, I am not posting further detailed examples, but I have also verified that the erroneous behavior happens with a non-root user in the container. I have not tested with python3 or with other python2 base images.

Steps to reproduce the behavior

0. Control: Show that in a container without a bind mount, python correctly identifies a file with mode 0644 as non-executable.

bash-3.2$ docker run --rm python:2.7-slim-buster bash -c "
cd /tmp
rm -f testfile.tmp
touch testfile.tmp
stat --format='%a' testfile.tmp
python <<EOF
import os
print oct(os.stat('testfile.tmp').st_mode)
print os.access('testfile.tmp', os.X_OK)
EOF
"
644
0100644
False

Because the result of the Python expression is False, python correctly identifies that it does not have execute permissions on the file. This is the expected behavior and is true regardless of whether or not Use gRPC FUSE for file sharing is enabled, because the file is not on a bind mount.

1. Expected Behavior: With "Use gRPC FUSE for file sharing" DISABLED, run the same code as above, but have the file be on a bind mount. Note that the addition of --volume="$(pwd):/tmp" is the only change to the command.

bash-3.2$ docker run --rm --volume="$(pwd):/tmp" python:2.7-slim-buster bash -c "
cd /tmp
rm -f testfile.tmp
touch testfile.tmp
stat --format='%a' testfile.tmp
python <<EOF
import os
print oct(os.stat('testfile.tmp').st_mode)
print os.access('testfile.tmp', os.X_OK)
EOF
"
644
0100644
False

The result is the same as the control: the expected behavior.

2. Actual Behavior: Now, ENABLE "Use gRPC FUSE for file sharing", and run the exact same code as in step 1 above:

bash-3.2$ docker run --rm --volume="$(pwd):/tmp" python:2.7-slim-buster bash -c "
cd /tmp
rm -f testfile.tmp
touch testfile.tmp
stat --format='%a' testfile.tmp
python <<EOF
import os
print oct(os.stat('testfile.tmp').st_mode)
print os.access('testfile.tmp', os.X_OK)
EOF
"
644
0100644
True

Now, even though Python correctly sees the mode of the file, os.access incorrectly returns True. One consequence of this behavior is that nosetests ignores all files by default because it thinks they are executable.
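A minimal sketch of a standalone probe for the mismatch shown above, assuming Python 3 is available in the container. The helper names `has_exec_bit` and `probe` are hypothetical and not part of any library. It compares the answer from os.access with the execute bits in st_mode, which the runs above show are still reported correctly.

```python
import os
import stat
import tempfile


def has_exec_bit(path):
    """Return True if any execute bit (user, group, other) is set in st_mode."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def probe(directory, mode=0o644):
    """Create a scratch file in `directory` and compare the two answers.

    Returns (access_result, mode_bit_result). On a filesystem that answers
    access() accurately, both are False for a 0644 file.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.close(fd)
        os.chmod(path, mode)
        return os.access(path, os.X_OK), has_exec_bit(path)
    finally:
        # Clean up the scratch file whether or not the checks succeeded.
        os.remove(path)
```

Going by the Python 2 output above, calling probe('/tmp') on the gRPC-FUSE bind mount would presumably return a True access result next to a False mode-bit result, while the control and the non-gRPC-FUSE mount should give False for both.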

Happy to provide additional information to help debug.

Thanks!

myw commented 3 years ago

Just made an even simpler reproducible test case entirely in bash:

bash-3.2$ docker run --rm --volume="$(pwd):/tmp" python:3 bash -c "
cd /tmp
rm -f testfile.tmp
touch testfile.tmp
stat --format='%a' testfile.tmp
[ -x testfile.tmp ] && echo 'access'"
644
access

myw commented 3 years ago

Further testing: Some base images do not exhibit this behavior: alpine, busybox, and cirros, when running the equivalent test in sh, exhibit correct behavior:

docker run --rm --volume="$(pwd):/tmp" cirros sh -c "
cd /tmp
rm -f testfile.tmp
touch testfile.tmp
stat -c '%a' testfile.tmp
[ -x testfile.tmp ] && echo 'access'"
644

Interestingly, python:alpine allows us to test both python and sh on the same OS. When doing that, the [ -x ] test in sh works correctly, but the os.access test in python fails.

Likely, the version of sh on the alpine, cirros, and busybox distros works as expected with gRPC-FUSE, but Python and/or bash/sh on other systems do not.

myw commented 3 years ago

More testing: opensuse/leap with sh: fails. opensuse/tumbleweed with sh: passes.

normanmaurer commented 3 years ago

I see exactly the same problem when I try to compile netty :/

myw commented 3 years ago

Seems that the distributions that do not fail this test mostly use busybox, whose shell's access test function specifically mentions not "mak[ing] the mistake of telling root that any file is executable."

This makes me think that gRPC-FUSE is doing something where access to the file is being tested as root, which triggers a common edge-case behavior in the standard POSIX access system call. This logic does seem to have been resolved in osxfs, so there's probably a workable fix.
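To illustrate the kind of check that avoids relying on the access system call, here is a rough sketch of an execute test derived only from stat results. It is loosely modeled on the idea quoted above and is not busybox's actual code; the function name `stat_based_can_execute` is hypothetical, and the sketch ignores ACLs, capabilities, and noexec mounts.

```python
import os
import stat


def stat_based_can_execute(path):
    """Decide executability from the file mode alone, without calling access()."""
    st = os.stat(path)
    euid = os.geteuid()
    any_exec = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH

    if euid == 0:
        # Root may execute a file only if at least one execute bit is set.
        return bool(st.st_mode & any_exec)
    if st.st_uid == euid:
        # The owner is judged by the owner bits only.
        return bool(st.st_mode & stat.S_IXUSR)
    if st.st_gid == os.getegid() or st.st_gid in os.getgroups():
        # Group members are judged by the group bits only.
        return bool(st.st_mode & stat.S_IXGRP)
    return bool(st.st_mode & stat.S_IXOTH)
```

Because this only reads the mode, uid, and gid from stat, it should not be affected by a filesystem that answers every access call with success, provided stat itself is accurate, as it appears to be in the tests above.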

normanmaurer commented 3 years ago

I saw this failing on CentOS...

myw commented 3 years ago

Still exists as of 3.3.1. Note that this does not depend on the user inside the docker container being root.

docker run -u nobody --rm --volume="$(pwd):/tmp" debian sh -c "
cd /tmp
rm -f testfile.tmp
touch testfile.tmp
stat -c '%a' testfile.tmp
[ -x testfile.tmp ] && echo 'access'"
644
access

thaJeztah commented 3 years ago

/cc @djs55

docker-robott commented 3 years ago

Issues go stale after 90 days of inactivity. Mark the issue as fresh with /remove-lifecycle stale comment. Stale issues will be closed after an additional 30 days of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle stale

myw commented 3 years ago

/remove-lifecycle stale


chrisvest commented 3 years ago

Noticed this as well on macOS Big Sur (not using the new virt. framework), Docker Desktop Version 4.0.0 (4.0.0.12) with CentOS 6.10 image.

djs55 commented 3 years ago

I think this is the same as #5029 and has the same root cause as https://github.com/docker/for-mac/issues/5944#issuecomment-912450810 . Quoting from there:

We use Linux FUSE to mount the host filesystem. There are 2 permission models: https://elixir.bootlin.com/linux/latest/source/fs/fuse/dir.c#L1197 . We delegate the permission checks to the server (by not setting default_permissions) because we want to avoid the situation where a user has a writable file on the host but can't get Linux to write to it because Linux believes the file owner/group is different. In this mode access(path, W_OK) will invoke the fuse_access API https://elixir.bootlin.com/linux/latest/source/fs/fuse/dir.c#L1160 .

However, we also care about performance. Many filesystem operations performed by Linux will be prefixed by a fuse_access call, which doubles the number of RPCs to the host in these workloads. These access results are really just hints, as the real access check has to be done within the write / open / rm call, since the permissions can change between the calls. Therefore we also disable FUSE_ACCESS on the server side by returning ENOSYS so no_access is set here https://elixir.bootlin.com/linux/latest/source/fs/fuse/dir.c#L1168 . The result is that Linux assumes access returns success, then (hopefully) tries the real operation to see whether it succeeds or not. The host does the access control against the real file permissions.

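So it's the combination of delegating permission checks to the server (by not setting default_permissions) and disabling FUSE_ACCESS on the server side (by returning ENOSYS) that results in the inaccuracy of the access call. Fixing this is possible, but it would reduce performance.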
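The mechanism described in this comment can be pictured with a toy model. This is a simplified Python simulation written for illustration, not real kernel or Docker Desktop code; the class names `ToyFuseServer` and `ToyFuseClient` are made up, and only the ENOSYS and no_access behavior from the explanation above is modeled.

```python
import errno


class ToyFuseServer:
    """Stands in for the host-side file server, which does not implement access."""

    def __init__(self):
        self.access_requests = 0

    def access(self, path, mask):
        # Count the round trip, then report "not implemented" as described above.
        self.access_requests += 1
        return -errno.ENOSYS


class ToyFuseClient:
    """Stands in for the Linux side of the FUSE mount (illustrative only)."""

    def __init__(self, server):
        self.server = server
        self.no_access = False

    def access(self, path, mask):
        # Once the server has said ENOSYS, skip the RPC and assume success.
        if self.no_access:
            return 0
        err = self.server.access(path, mask)
        if err == -errno.ENOSYS:
            self.no_access = True
            err = 0
        return err
```

In this model every access call returns 0 (success) regardless of the file's real mode, and only the first call costs a round trip to the server, which matches both the performance motivation and the inaccurate results reported in this issue.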

myw commented 3 years ago

@djs55 Fascinating and incredibly helpful context. Thank you!

I am not familiar with the filesystem management on that level, but your proposal for the root cause makes sense to me—I haven't observed any behavior that would contradict it.

I do think that this behavior is far enough outside the bounds of expectation that it should be possible to disable it without disabling all of gRPC-FUSE (maybe with a config-file-only setting?), which would provide most of the existing benefits and presumably still offer some performance benefit, even with the extra access calls. But whether or not it's worth it to work on a fix like that would depend on the performance impact tradeoff.

Conversely, would it somehow be possible to disable the fuse_access call only when we know it's coming as an access hint from the Linux filesystem that's about to be immediately followed by a write/open/rm call? That is, if we know it's something like python code making the call explicitly from userspace, rather than the filesystem itself checking, could we let the call go through? I doubt the answer is yes, but I think it would effectively resolve the issue with less performance impact.

Finally, for the sake of any others following this post, I do also want to share the two workarounds you mentioned in the rest of that comment that do not involve turning off gRPC-FUSE, which might be helpful in some use-cases:

… if you would like 100% native Linux access control checks, you can store your data in a "named volume" which resides inside the Linux filesystem. For example:

docker volume create my-code
docker run -v my-code:/mnt alpine ls /mnt

Another possibility is to use "dev environments" https://docs.docker.com/desktop/dev-environments/ which store the code in Linux (so 100% native filesystem semantics) while also allowing you to seamlessly access everything from your IDE (as well as push/pull the environment to share it with colleagues etc)

In addition to these workarounds, I'm wondering if there's any chance that using the new Big Sur virtualization framework could either resolve the issue, or otherwise improve performance to mitigate the impact of fixing it?

Thanks again for looking into this.

djs55 commented 3 years ago

@myw thanks for the quick reply! For what it's worth, I'm not satisfied with the current state either. There are some improvements coming in macOS Monterey in the virtualization.framework which may help speed things up and improve the semantics of access: we're investigating those. We'll let you know if/when we have something interesting to try.

GKTheOne commented 3 years ago

I came here after finding #5007.

This impacts the database initialisation scripts I want to use with postgres (& mysql) images. I have some scripts that I want sourced by the initialisation instead of executed. However, because of this issue, the entrypoint script tries to execute the scripts (because -x file test succeeds) which fails (with permission denied) because the execute bit is not actually set.
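As a rough illustration of the decision described here, the sketch below models an init step that either executes or sources a script depending on an executability test. The real entrypoints are shell scripts; the names `mode_says_executable` and `plan_init_script` are hypothetical and exist only for this example.

```python
import os
import stat


def mode_says_executable(path):
    """Executability judged from the mode bits reported by stat."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def plan_init_script(path, is_executable=mode_says_executable):
    """Return 'execute' or 'source' for an init script, given a test function."""
    return "execute" if is_executable(path) else "source"
```

With the default mode-based test, a 0644 script is planned as 'source'. Passing a test that always answers True, which is how the access check appears to behave on a gRPC-FUSE mount, yields 'execute' and leads to the permission-denied failure described above.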

docker-robott commented 2 years ago

Issues go stale after 90 days of inactivity. Mark the issue as fresh with /remove-lifecycle stale comment. Stale issues will be closed after an additional 30 days of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle stale

myw commented 2 years ago

/remove-lifecycle stale

thaJeztah commented 2 years ago

/lifecycle frozen

razzed commented 3 weeks ago

This still exists in 4.34.3 (170107) Engine 27.2.0 on Mac OS X 15.0.1 (24A348)

Surprised this bug has been in existence for so long - [ -x plainFile ] on a plain, non-executable file on a volume mounted inside a container returns true (exit 0).

The difference between the alpine and other containers is that alpine containers actually copy the volume so it is not shared and therefore behaves correctly. You can detect this by simply adding a file inside the container and seeing if the local copy changes. However, an ubuntu container reproduces this bug easily. Simple case, October 2024:

mkdir test
touch test/notx.md
docker run -v "$(pwd)/test:/root/test" -it ubuntu:latest
root@5948a905ab89:/# cd /root/test
root@5948a905ab89:~/test# ls -la
total 4
drwxr-xr-x 3 root root   96 Oct 23 14:10 .
drwx------ 1 root root 4096 Oct 23 14:11 ..
-rw-r--r-- 1 root root    0 Oct 23 14:10 notx.md
root@5948a905ab89:~/test# if [ -x notx.md ]; then echo "is executable"; else echo "works correctly"; fi
is executable
root@5948a905ab89:~/test#

Is there any reason this can't be fixed? Seems like a pretty major issue.

You can work around this by simply not using mounted volumes and copying your files into the target container instead, but this means you lose development speed.