containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

volume `NeedsCopyUp` breaking nfs volumes #14722

Closed Ramblurr closed 2 years ago

Ramblurr commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

Using nfs named volumes has broken recently in some cases. I think this is related to

The error is:

error listing contents of volume my-nfs-vol mountpoint when copying up from container ce4c558481a632701dbb70bafafa279ebb6cbc3bdfe7b94f2ebe3b1e7c98503c: open /var/lib/containers/storage/volumes/my-nfs-vol/_data: permission denied

My current theory is that the copy up routine is being performed as a user that doesn't have permission to write to the nfs share. In my case the only allowed user is 2000 so even uid 0 on the podman host won't have permission.

The funny thing is that I'm not even convinced the pre-condition for NeedsCopyUp is met, because the /var/www/html/data directory in the container image is empty to start with.

I have manually mounted the share using the same mount options, then used su to switch to the user with uid 2000 and verified that the mount works fine.

I tested the same setup with docker-ce (on another fresh host), and it works as expected.

Steps to reproduce the issue:

  1. Create a named podman volume using nfs. The exposed share should be owned by a specific uid; in my case it is 2000.

    podman volume create --driver local --opt type=nfs --opt device=nas.data.mydomain.com:/mnt/tank2/services/drive.mydomain.com --opt o=addr=nas.data.mydomain.com,vers=3,rw,proto=tcp  my-nfs-vol
  2. Create a container using the mount:

    # as root run:
    podman --log-level=debug run --volume my-nfs-vol:/var/www/html/data  -d --rm --name nc-test --user 2000:2000 --sysctl net.ipv4.ip_unprivileged_port_start=0 docker.io/library/nextcloud:23

Describe the results you received:

..snip...
DEBU[0005] Mounted container "ce4c558481a632701dbb70bafafa279ebb6cbc3bdfe7b94f2ebe3b1e7c98503c" at "/var/lib/containers/storage/overlay/a92da0031413ce3b7c06105a3ba413727692b644f86e2620dc66d148830cc901/merged" 
DEBU[0005] Going to mount named volume 96070d6349a4e557eb0235a2af44c319dcc488936da9aa7972c0747cee65227a 
DEBU[0005] Copying up contents from container ce4c558481a632701dbb70bafafa279ebb6cbc3bdfe7b94f2ebe3b1e7c98503c to volume 96070d6349a4e557eb0235a2af44c319dcc488936da9aa7972c0747cee65227a 
DEBU[0005] Going to mount named volume my-nfs-vol 
DEBU[0005] Volume my-nfs-vol mount count now at 18 
DEBU[0005] Copying up contents from container ce4c558481a632701dbb70bafafa279ebb6cbc3bdfe7b94f2ebe3b1e7c98503c to volume my-nfs-vol 
DEBU[0005] Unmounted cont
...snip...
DEBU[0005] ExitCode msg: "error listing contents of volume my-nfs-vol mountpoint when copying up from container ce4c558481a632701dbb70bafafa279ebb6cbc3bdfe7b94f2ebe3b1e7c98503c: open /var/lib/containers/storage/volumes/my-nfs-vol/_data: permission denied"

Describe the results you expected:

I expect the container to start and work with the mount.

This works fine with docker-ce (on centos 9 stream)

docker volume create --driver local --opt type=nfs --opt device=nas.data.mydomain.com:/mnt/tank2/services/drive.mydomain.com --opt o=addr=nas.data.mydomain.com,vers=3,rw,proto=tcp --name my-nfs-vol

docker run --volume my-nfs-vol:/var/www/html/data  --rm --name nc-test --user 2000:2000 --sysctl net.ipv4.ip_unprivileged_port_start=0 docker.io/library/nextcloud:23

Additional information you deem important (e.g. issue happens only occasionally):

podman volume inspect my-nfs-vol

[
     {
          "Name": "my-nfs-vol",
          "Driver": "local",
          "Mountpoint": "/var/lib/containers/storage/volumes/my-nfs-vol/_data",
          "CreatedAt": "2022-06-24T09:38:27.261661556Z",
          "Labels": {},
          "Scope": "local",
          "Options": {
               "device": "nas.data.mydomain.com:/mnt/tank2/services/drive.mydomain.com",
               "o": "addr=nas.data.mydomain.com,vers=3,rw,proto=tcp",
               "type": "nfs"
          },
          "MountCount": 18,
          "NeedsCopyUp": true,
          "NeedsChown": true
     }
]

Output of podman version:

Client:       Podman Engine
Version:      4.1.1
API Version:  4.1.1
Go Version:   go1.17.5
Built:        Wed Jun 15 16:59:06 2022
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.26.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.2-2.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.2, commit: 8b8ad6d5fea210d1d098d27339324d33c7a43179'
  cpuUtilization:
    idlePercent: 97
    systemPercent: 1.15
    userPercent: 1.85
  cpus: 11
  distribution:
    distribution: '"centos"'
    version: "9"
  eventLogger: journald
  hostname: container0.mgmt.mydomain.com
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-78.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 29379575808
  memTotal: 32593039360
  networkBackend: cni
  ociRuntime:
    name: crun
    package: crun-1.4.5-2.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.4.5
      commit: c381048530aa750495cf502ddb7181f2ded5b400
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /bin/slirp4netns
    package: slirp4netns-1.2.0-2.el9.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 2h 28m 30.91s (Approximately 0.08 days)
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 7
    paused: 0
    running: 6
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 53675536384
  graphRootUsed: 8166318080
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 10
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.1.1
  Built: 1655312346
  BuiltTime: Wed Jun 15 16:59:06 2022
  GitCommit: ""
  GoVersion: go1.17.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.1.1

Package info (e.g. output of rpm -q podman or apt list podman):

podman-4.1.1-1.el9.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.): kvm/qemu virtual machine running centos 9 stream

giuseppe commented 2 years ago

does it work for you if we exchange the order of the checks?

diff --git a/libpod/container_internal.go b/libpod/container_internal.go
index ce48987f6..c11164ce3 100644
--- a/libpod/container_internal.go
+++ b/libpod/container_internal.go
@@ -1666,19 +1666,6 @@ func (c *Container) mountNamedVolume(v *ContainerNamedVolume, mountpoint string)
        if vol.state.NeedsCopyUp {
                logrus.Debugf("Copying up contents from container %s to volume %s", c.ID(), vol.Name())

-               // If the volume is not empty, we should not copy up.
-               volMount := vol.mountPoint()
-               contents, err := ioutil.ReadDir(volMount)
-               if err != nil {
-                       return nil, errors.Wrapf(err, "error listing contents of volume %s mountpoint when copying up from container %s", vol.Name(), c.ID())
-               }
-               if len(contents) > 0 {
-                       // The volume is not empty. It was likely modified
-                       // outside of Podman. For safety, let's not copy up into
-                       // it. Fixes CVE-2020-1726.
-                       return vol, nil
-               }
-
                srcDir, err := securejoin.SecureJoin(mountpoint, v.Dest)
                if err != nil {
                        return nil, errors.Wrapf(err, "error calculating destination path to copy up container %s volume %s", c.ID(), vol.Name())
@@ -1712,6 +1699,19 @@ func (c *Container) mountNamedVolume(v *ContainerNamedVolume, mountpoint string)
                        return vol, nil
                }

+               // If the volume is not empty, we should not copy up.
+               volMount := vol.mountPoint()
+               contents, err := ioutil.ReadDir(volMount)
+               if err != nil {
+                       return nil, errors.Wrapf(err, "error listing contents of volume %s mountpoint when copying up from container %s", vol.Name(), c.ID())
+               }
+               if len(contents) > 0 {
+                       // The volume is not empty. It was likely modified
+                       // outside of Podman. For safety, let's not copy up into
+                       // it. Fixes CVE-2020-1726.
+                       return vol, nil
+               }
+
                // Set NeedsCopyUp to false since we are about to do first copy
                // Do not copy second time.
                vol.state.NeedsCopyUp = false

Ramblurr commented 2 years ago

thanks for the quick reply.

I'm not sure exactly, as I'm not familiar with the code. But looking at your snippet (and the original file), it seems the error would still occur.

Looking at the code path, either securejoin.SecureJoin(mountpoint, v.Dest) (I'm not sure what that is) would fail first and return an error, or the same ioutil.ReadDir(volMount) error would occur afterwards.

I think it's a matter of permissions. Whatever uid this code runs as doesn't have permission to call ioutil.ReadDir(volMount) on the nfs mount location.

giuseppe commented 2 years ago

but srcDir should not be on NFS right?

I thought we would first hit the if len(srcContents) == 0 {return vol, nil} code block so we won't have to deal with the NFS path at all

Ramblurr commented 2 years ago

Oh yes. Sorry, I misread the patch.

Assuming srcDir is checked to be empty and we return before attempting to list volMount, then yes, it should fix this issue.

That said, I don't know what the behavior should be when srcDir is not empty yet volMount is an unreadable path. I find the copy up behavior weird. If I mount something at a path, I wouldn't expect existing files there to get copied into the mounted path, just to be shadowed or something.

giuseppe commented 2 years ago

I also find this behavior strange but I think that is done for compatibility with Docker.

Maybe we could add a new option for the volume that prevents the copy.

mheon commented 2 years ago

Adding an option to prevent copy-up SGTM. This seems like an uncommon case (a volume explicitly not writable by the user running Podman, only the user in the container).

giuseppe commented 2 years ago

@Ramblurr would you like to open a PR to add a new option to skip the copyup?

Ramblurr commented 2 years ago

I think the idea to switch the order of the checks is the best one.

Afaik docker doesn't have a switch to disable this behavior. Later, I can test to see how docker behaves in the case that the vol mount isn't writable but src dir is non empty.

I'd be more inclined to automatically skip the copy up when it's not possible rather than introduce yet another flag.

giuseppe commented 2 years ago

> I'd be more inclined to automatically skip the copy up when it's not possible rather than introduce yet another flag.

how would we distinguish a legit failure from something that should be ignored? I think it must be an explicit request from the user through a new flag.

Ramblurr commented 2 years ago

Ok :) Did some testing with docker.

I followed a methodology similar to the one in the original issue that introduced this behavior to podman: https://github.com/containers/podman/issues/12714

  1. create an nfs volume

    docker volume create --driver local --opt type=nfs --opt device=mynfsserver:/mnt/tank/services/test-podman --opt o=vers=3,rw,proto=tcp,addr=nasip --name test-podman
  2. Mount volume into a container. Volume is mounted into an empty path in the container image. The output of ls shows the single file I created in the nfs share to ensure nfs is working

docker run --volume test-podman:/opt/test  --rm --name nfs-test --user 2000:2000 quay.io/centos/centos:stream9 ls /opt/test
hello-from-nas

This output is expected.

  3. Now, we mount the volume into another container. This time the mount point in the container is non-empty.
    
    docker run --volume test-podman:/var  --rm --name nfs-test --user 2000:2000 quay.io/centos/centos:stream9 ls /var

docker: Error response from daemon: failed to chmod on /var/lib/docker/volumes/test-podman/_data: chmod /var/lib/docker/volumes/test-podman/_data: operation not permitted. See 'docker run --help'.



---

Based on these results

1. I'd say changing the order of the checks as we discussed earlier is definitely warranted, so that at least nfs mounts work when the `srcDir` is empty.

2. I withdraw my suggestion that we should skip the copy up if there are permissions issues. As you rightly pointed out, we can't distinguish a legit error from one that should be ignored. 

3. I'm ambivalent towards a new flag. I wouldn't use it myself. In all my years of using nfs and docker (and more recently podman), I've never come across a situation where I was relying on the copy up behavior.

That said, I did some more research, and apparently docker does support a `nocopy` volume mount option: https://docs.docker.com/engine/reference/run/#volume-shared-filesystems

I tested this using the third command above, changing the volume argument to `--volume test-podman:/var:nocopy`, and it worked fine.

If podman is aiming to keep compatibility with docker, then I suppose this flag could be implemented.
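
As a rough idea of what honoring such an option involves, here is a small Go sketch. `parseVolumeSpec` is a hypothetical, simplified parser for the `name:dest[:opt,opt]` form used above; it is not how podman or docker actually parse `--volume`, and it ignores every option other than `nocopy`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVolumeSpec is a hypothetical, simplified parser for specs of the
// form "name:dest[:opt,opt]". It only recognizes the "nocopy" option.
func parseVolumeSpec(spec string) (name, dest string, noCopy bool, err error) {
	parts := strings.SplitN(spec, ":", 3)
	if len(parts) < 2 {
		return "", "", false, fmt.Errorf("invalid volume spec %q", spec)
	}
	name, dest = parts[0], parts[1]
	if len(parts) == 3 {
		for _, opt := range strings.Split(parts[2], ",") {
			if opt == "nocopy" {
				noCopy = true
			}
		}
	}
	return name, dest, noCopy, nil
}

func main() {
	name, dest, noCopy, err := parseVolumeSpec("test-podman:/var:nocopy")
	fmt.Println(name, dest, noCopy, err) // prints "test-podman /var true <nil>"
}
```

When `noCopy` is true, the copy up step (including the listing of the volume mountpoint) would be skipped entirely.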

rhatdan commented 2 years ago

SGTM Switching order and adding nocopy flag.

giuseppe commented 2 years ago

PR here: https://github.com/containers/podman/pull/14734