matejvasek opened 1 year ago
I see the ownership is kept if you delete the volume before running the second command.
Does docker automatically delete the volume when the first container exits?
@giuseppe no the volume persists.
Another way to reproduce: try building an app using the pack CLI with podman and an untrusted builder.
@giuseppe but you might be onto something: the ownership behaves differently the moment I try to write something into the volume.
Maybe I isolated the bug in the wrong way, but there are definitely some issues with volume mounting. The pack CLI builds an application in multiple containers that share data via volumes. With Docker it works; with podman it fails because of ownership issues.
pack build my-go-app -Bghcr.io/knative/builder-jammy-full:latest --docker-host=inherit --trust-builder=0
@giuseppe try running:
#!/bin/sh
set -e
cat <<EOF > Dockerfile.usera
FROM alpine
USER root
RUN mkdir -p /workspace
RUN chown 1001:1002 /workspace
USER 1001:1002
EOF
cat <<EOF > Dockerfile.userb
FROM alpine
USER root
RUN mkdir -p /workspace
RUN chown 1003:1004 /workspace
USER 1003:1004
EOF
docker build -q . -f Dockerfile.usera -t alpine-usera
docker build -q . -f Dockerfile.userb -t alpine-userb
docker volume rm test-volume || true
docker run --rm -v test-volume:/workspace alpine-usera sh -c 'echo done'
docker run --rm -v test-volume:/workspace alpine-userb sh -c 'touch /workspace/b'
docker volume rm test-volume || true
With docker it works but on podman it fails.
@giuseppe note that if the first container actually tried to write to /workspace/ it would fail with Moby too. But in our use case the first container uses the volume as read-only, although it may not declare it via :ro.
@mheon do we need to change ownership every time we use the volume in a container?
I have to assume we added that code for a reason, but I can't recall exactly why. Almost certainly a bugfix, but exactly what was being fixed is unclear. Matching Docker's exact on-mount behavior for volumes has been a persistent problem.
fyi in the past even the very first mounting container had bad ownership, see https://github.com/containers/podman/pull/10905
I actually cannot find an explicit chown of the volume mountpoint anywhere in the mount code. So I'm actually not 100% on where this is being done; it may be an unintentional side-effect of another chown doing something else?
Looks like the chown is called only when the volume is brand new -- created together with a new container.
wrt:
docker run --rm -v test-volume:/workspace alpine-usera sh -c 'echo done'
docker run --rm -v test-volume:/workspace alpine-userb sh -c 'touch /workspace/b'
It appears that chown is called only for the first container.
there is some state vol.state.NeedsChown
I assume this ensures that chown is done only once?
The vol.state.NeedsChown seems to be set on the first chown done by the first container, so subsequent containers won't chown it.
@giuseppe how important is vol.state.NeedsChown?
Ah, ok, per @matejvasek it's fixVolumePermissions()
Looking further, it's tied to a bool, NeedsChown, in the volume config. Set to true at volume create, false once permissions have been fixed during first mount into a container. Dropping the bool entirely and making the chown unconditional ought to fix this?
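The flag-gated behavior described above can be sketched as follows. This is a toy model for illustration only, not podman's actual types or code; the `Volume` struct and `mountInto` helper are hypothetical simplifications:

```go
package main

import "fmt"

// Volume models the relevant bit of state: a one-shot flag that is set
// when the volume is created and cleared after the first mount fixes
// permissions. (Hypothetical simplification, not podman's real struct.)
type Volume struct {
	Name       string
	NeedsChown bool
	UID, GID   int
}

// mountInto simulates mounting the volume into a container whose
// mountpoint directory is owned by uid:gid.
func (v *Volume) mountInto(uid, gid int) {
	if v.NeedsChown { // current podman: chown only on the first mount
		v.UID, v.GID = uid, gid
		v.NeedsChown = false
	}
	// Docker-compatible behavior would drop the flag check and
	// chown unconditionally on every mount.
}

func main() {
	v := &Volume{Name: "test-volume", NeedsChown: true}
	v.mountInto(1001, 1002) // first container (alpine-usera)
	v.mountInto(1003, 1004) // second container (alpine-userb)
	fmt.Printf("%d:%d\n", v.UID, v.GID) // stays 1001:1002 with the flag
}
```

With the flag, the second mount is a no-op, which is exactly why `alpine-userb` cannot write to the volume in the reproducer.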
@mheon I believe it will fix the issue, but I don't know if it could have any adverse effects.
It is not doing a recursive chown, correct? I think the goal there was to make sure the volume is owned by the primary user of the container. I think I had a PR on this code at one point to attempt to change it, but I gave up. https://github.com/containers/podman/pull/16782
make sure the volume is owned by the primary user of the container.
Small correction: the primary user's uid/gid is used only if the mount point does not already exist in the container. If the mount point exists (as a directory) then the uid/gid of the directory is used.
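The ownership-selection rule just described can be written down as a tiny decision function. The helper name `volumeOwner` is hypothetical, purely to make the rule concrete:

```go
package main

import "fmt"

// volumeOwner sketches the rule for a brand-new volume: if the
// mountpoint directory already exists in the image, its uid/gid wins;
// otherwise the container's primary user is used. (Hypothetical helper
// for illustration, not podman's actual API.)
func volumeOwner(mountpointExists bool, dirUID, dirGID, userUID, userGID int) (uid, gid int) {
	if mountpointExists {
		return dirUID, dirGID
	}
	return userUID, userGID
}

func main() {
	// alpine-usera from the reproducer: /workspace exists in the image
	// and was chowned to 1001:1002, so the directory's owner wins.
	uid, gid := volumeOwner(true, 1001, 1002, 0, 0)
	fmt.Printf("%d:%d\n", uid, gid) // 1001:1002
}
```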
Setting ownership just once makes sense if you assume that the volume will always be used by just a single container.
However, that's not my case: the pack CLI runs multiple containers in sequence on one volume.
If there's a reason we added this originally, it's Docker compat. If Docker doesn't do the same thing, then that reason is not valid.
If there's a reason we added this originally, it's Docker compat. If Docker doesn't do the same thing, then that reason is not valid.
What do you mean by this here? The fact that we do chown at all, or the fact that we do it only once?
Only once. There's no reason we'd add such complexity other than to match Docker.
Looks like NeedsChown was introduced in https://github.com/containers/podman/pull/6747 which was fixing https://github.com/containers/podman/issues/5698.
The issue does not directly mention Docker.
Also, --userns, which that fix was addressing, does not even exist on Docker. So I believe that NeedsChown has nothing to do with Docker compatibility.
@giuseppe do you recall why NeedsChown was needed?
I don't remember more than what is in the original issue.
Also, how does it work on Docker if multiple containers use the same volume? It is chown'ed to the last user?
Also, how does it work on Docker if multiple containers use the same volume? It is chown'ed to the last user?
Yes, it appears so. It is chowned to the uid/gid of the mountpoint for each container.
See the test in bug description. The second container can create a file in mountpoint directory. It works on Docker, but not on podman.
but what happens if you keep the first container running?
Does the second container override the first chown?
They run in series, not in parallel.
My use case is sequential.
Does the second container override the first chown?
But yes, that's what appears to happen.
sure but if we are going to change the current behavior to match docker compatibility we need to address all the cases, not just one.
What I am arguing is that the Docker mechanism of always chowning is subject to race conditions:
# (sleep 0.5; docker run --rm -v test-volume:/workspace alpine-userb true) & docker run --rm -v test-volume:/workspace alpine-usera sh -c 'stat -c %u:%g /workspace ; sleep 1; stat -c %u:%g /workspace'
1001:1002
1003:1004
The ownership of a volume can change while a container is using it.
The Podman behavior makes more sense; we can probably just document it and not aim for compatibility.
is subject to race conditions:
Well, if somebody runs two containers at the same time on the same volume then they had it coming. Also, even in the current form this is a problem: only one of the containers "wins" and forces its ownership.
@giuseppe I'll try to modify the pack CLI to work even with podman's behaviour. Will see if it's possible.
The thing is that the pack CLI first runs a root-owned container on the volume. All subsequent containers then cannot create files in the volume as a consequence.
I think that we should probably change to match Docker here, i.e., unconditional chown on mount into a container, even if the volume is in use by another container. It's racy, but that's sort of expected: using different UID/GID containers with the same volume is always going to be questionable.
If we do it then we should add some option to preserve the current behaviour. I can easily see this being a big performance hit if we start to chown on each container start; consider volumes with many files.
If we do it then we should add some option to preserve the current behaviour. I can easily see this being a big performance hit if we start to chown on each container start; consider volumes with many files.
Isn't the chown non-recursive?
I believe in the past it was recursive but now it is not, right?
Ah, you are right, the normal volume chown is not recursive. Then this should not be a problem, although it seems very weird not to chown recursively. If the first container created files in the volume and we just chown the parent directory, how is the second container supposed to read the files?
I don't think that's expected to work. We want the new container to at least be able to write to the mounted volume. If the goal is multiple containers which use different UIDs and GIDs to be able to read from the same volume, I don't know if that's possible right now, nor do I think it's necessarily expected.
It should not chown recursively, although we do have :U, which I think does chown recursively.
Issue Description
When mounting a volume into a container, the mountpoint directory should preserve its ownership. This seems to work only for the very first run/mount. Subsequent mounts have the ownership of the mountpoint directory altered (to the ownership set by the first mounter).
This happens at least on podman v4.3--v4.6.
Steps to reproduce the issue
Run the following script against the podman docker-compat socket:
Describe the results you received
The script exits with non-zero exit code and error message.
Describe the results you expected
The script exits with 0 exit code.
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
I tested this on rootless but I believe the same thing happens for privileged too.
Additional information
Happens always.