Closed bcat closed 11 months ago
Also, in case the magical chowning is just for Docker compatibility, I verified that Docker doesn't have the same issue. The NFS volume works the first (and every other) time:
$ sudo docker volume create -o type=nfs -o o=addr=example.com -o device=:/path share
share
$ sudo docker run --rm -v share:/mnt/share docker.io/library/alpine ls -al /mnt/share
total 5
drwxr-xr-x 2 3000 3000 2 Nov 27 20:18 .
drwxr-xr-x 1 root root 4096 Nov 28 00:38 ..
It occurs to me that since --volume already accepts option :U to recursively chown the volume to the container's user, maybe there could be an option :u to never chown the volume. Then there would be three modes to consider and document:
:u: Leave volume ownership alone.
:U: Recursively chown files in volume.
Unspecified: chown volume root, but only if it's the first mount.
The "unspecified" behavior could be made Docker compatible (e.g., fixing this issue, and #19652 as well), but folks fully integrated into the Podman ecosystem could use :u and :U to get explicit (and arguably more useful) ownership handling. WDYT?
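To make the three proposed modes concrete, here is a minimal sketch that classifies a -v spec by its options. The function name ownership_mode and the mode labels are hypothetical, and :u is only the option proposed in this comment, not something Podman currently accepts.

```sh
# Hypothetical helper: classify the ownership handling requested by a
# SOURCE:DEST[:OPTIONS] volume spec. Only :U exists today; :u is the proposal.
ownership_mode() {
    spec=$1
    # Options are the third colon-separated field, if there is one.
    case $spec in
        *:*:*) opts=${spec##*:} ;;
        *)     opts= ;;
    esac
    mode=default                      # "unspecified": chown volume root on first mount
    old_ifs=$IFS
    IFS=,
    for opt in $opts; do
        case $opt in
            U) mode=recursive-chown ;; # existing :U option
            u) mode=leave-alone ;;     # proposed :u option (assumption)
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$mode"
}

ownership_mode share:/mnt/share:U
ownership_mode share:/mnt/share:ro,u
ownership_mode share:/mnt/share
```

Keeping "default" as a separate mode leaves room for the unspecified behavior to follow Docker while :u and :U stay explicit.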
Also, as an aside: fixing #19652 without also fixing the network volume issue (or adding some option like :u to completely disable the chown behavior) would make this issue much more severe, as every attempt to start a container with an NFS volume would fail, not just the first one. :)
The first time we use a volume, we attempt to chown the underlying directory to match the destination. In this case this seems like a bug. We must not be checking if the volume is already set correctly. I.e., if it is already root, then we should not care that the chown failed.
We must not be checking if the volume is already set correctly. I.e., if it is already root, then we should not care that the chown failed.
In my example:
/mnt/share, which is not a directory in the container image.)
So I think just skipping (or making optional) the chown if container_owner == volume_owner wouldn't help in this case.
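For reference, the "skip the chown when ownership already matches" check being discussed could look roughly like the sketch below. This is an illustration, not Podman's code: chown_if_needed is a hypothetical name, and it assumes GNU stat (for -c '%u:%g'). When the owners differ on purpose, it still falls through to chown, which is why the check alone does not help here.

```sh
# Sketch only: skip chown when ownership already matches (assumes GNU stat).
chown_if_needed() {
    dir=$1
    want=$2                                  # desired owner as UID:GID
    have=$(stat -c '%u:%g' "$dir") || return 1
    if [ "$have" = "$want" ]; then
        echo "skip"                          # already correct, nothing to do
        return 0
    fi
    # Owners differ: chown is still attempted, and on an NFS export that
    # rejects chown this is the call that fails.
    chown "$want" "$dir" && echo "chowned"
}

d=$(mktemp -d)
chown_if_needed "$d" "$(stat -c '%u:%g' "$d")"
rmdir "$d"
```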
For a more realistic use case, consider the Syncthing container. Its entrypoint starts as user 0, then drops privileges to run the Syncthing binary as an unprivileged user, say, 2998.
On the remote host, the exported NFS share intentionally has owner 2998, not 0. The idea is that the unprivileged user in the container (2998) should be able to write to the NFS share. So it's intended that the network volume owner (2998) and the container's initial user (0) differ. No chown should be attempted even though the two differ.
On Docker, this exact workflow works correctly (Compose file). I am not positive why... maybe Docker doesn't try to chown network volumes at all?
@rhatdan I dug a bit more into the behavioral differences that seem to cause my test case to work in Docker but fail in Podman.
When mounting into a target directory that already exists in the container image, Docker will by default (unless overridden by the nocopy mount option) copy contents of the image's target directory into the mount source directory on the host. This operation includes chmod and chown.
So in Docker, when you mount an NFS share into a directory that already exists in the image (e.g., /mnt in the alpine image), it fails for a similar reason as in Podman:
$ sudo docker volume create -o type=nfs -o o=addr=example.com -o device=:/path share
share
$ sudo docker run --rm -v share:/mnt docker.io/library/alpine ls -al /mnt
docker: Error response from daemon: failed to chmod on /var/lib/docker/volumes/share/_data: chmod /var/lib/docker/volumes/share/_data: operation not permitted.
See 'docker run --help'.
But in Docker, when the mount target path does not exist in the container (e.g., /mnt/share in the alpine image), no chmod or chown on the source path happens, and the operation succeeds:
$ sudo docker volume create -o type=nfs -o o=addr=example.com -o device=:/path share
share
$ sudo docker run --rm -v share:/mnt/share docker.io/library/alpine ls -al /mnt/share
total 5
drwxr-xr-x 2 3000 3000 2 Nov 27 20:18 .
drwxr-xr-x 1 root root 4096 Nov 28 05:18 ..
Importantly, in Docker, mounting a source directory (including an NFS mount) into a target path that doesn't exist in the container image succeeds, even if the source directory isn't owned by root. If I understand correctly, SafeLchown would still fail in that case.
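The Docker behavior observed in the two experiments above can be summarized as a small decision function. This only models what this comment reports; it is not taken from Docker's source, and docker_touches_source is a hypothetical name.

```sh
# Model of the Docker behavior reported in this comment (not Docker source code).
# $1: "yes" if the mount target already exists in the image, else "no"
# $2: "yes" if the nocopy mount option is set, else "no"
docker_touches_source() {
    if [ "$1" = yes ] && [ "$2" = no ]; then
        echo "copy+chmod+chown"   # fails on an NFS export that rejects chmod/chown
    else
        echo "untouched"          # source directory is mounted as-is
    fi
}

docker_touches_source yes no   # e.g. -v share:/mnt in alpine
docker_touches_source no no    # e.g. -v share:/mnt/share in alpine
```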
Issue Description
I'm trying to convert some self-hosted Docker apps (nothing fancy, just a few services in a single homelab VM) to Podman, as I like the greater flexibility it provides around user namespaces. In the process, I noticed what seems to be a regression of #14766. Since that bug is locked, I figured I'd file a new one.
Steps to reproduce the issue
On example.com, export an NFS share at /path that's owned by a non-root user (e.g., 3000 in the example output below).
$ sudo podman volume create -o type=nfs -o device=example.com:/path share
$ sudo podman run --rm -v share:/mnt/share docker.io/library/alpine ls -al /mnt/share
Describe the results you received
The first time I run a container mounting the NFS volume, I receive the following error and the container fails to start:
Subsequent podman run commands using the same volume run successfully and yield the expected output (e.g., listing files in the NFS share in the example above).
Describe the results you expected
I expect the container to run and list files in the mounted NFS volume. For the example above, this should look something like the following:
podman info output
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
No
Additional environment details
I'm running Debian stable (12, "bookworm") with Podman packages from testing (13, "trixie"). This gets me Podman 4.7.2. Version 4.8.0 was just released today and isn't packaged for Debian testing yet, but I don't see anything in the changelog to indicate this behavior has changed.
Side note: Are there plans for an official Podman apt repo like Docker offers? That would be quite handy since Debian releases infrequently, and while it's possible to get newer binaries from testing, it seems like it'd be cleaner to have a dedicated repo.
Additional information
This bug isn't a showstopper since the NeedsChown flag on the volume is still cleared after the first failed mount attempt, but I feel like Podman shouldn't be trying to chown network volumes in the first place. Maybe NeedsChown should always be false if a mount type is specified at volume creation? (When a volume creates a new directory in the host's filesystem, the initial chown makes sense, but when the volume just mounts an existing device, it seems unexpected.)
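As a rough illustration of that suggestion, the initial NeedsChown value could be derived from the -o options given at volume creation. This is a sketch of the proposal only, not Podman's current behavior; initial_needs_chown is a hypothetical name, and each argument stands for one -o value.

```sh
# Sketch of the suggestion: NeedsChown starts out false whenever a mount type
# is specified at volume creation. Each argument is one `-o` value.
initial_needs_chown() {
    for opt in "$@"; do
        case $opt in
            type=*) echo false; return 0 ;;  # volume mounts an existing device
        esac
    done
    echo true                                # plain volume: new host directory
}

initial_needs_chown type=nfs device=example.com:/path
initial_needs_chown
```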