containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

tmpfs mount is not always world-writable anymore #20754

Closed xduugu closed 10 months ago

xduugu commented 10 months ago

Issue Description

After my Fedora CoreOS device updated itself yesterday from 38.20231027.3.2 to 39.20231101.3.0, one of my containers failed to start. It is a rootless container with a read-only filesystem that runs as a non-root user. The process requires a work directory, which is why I mount a tmpfs at that location. After the upgrade, however, the work directory was no longer writable for the user in the container.

I compared the package lists of the two releases, and the only Podman-relevant change was the crun upgrade: crun-1.10-1.fc38.aarch64 ⟶ 1.11-1.fc39.aarch64

Apparently, crun >= 1.11 uses the permissions of the underlying directory as mode for the tmpfs mount if no mode is specified (https://github.com/containers/crun/commit/3b874c2045a31ee049941947ddbc7114a09fd2c1).

The documentation states:

· tmpfs-mode: File mode of the tmpfs/ramfs in octal. (e.g. 700 or 0700.) Defaults to 1777 in Linux.

I guess this was the case for me before the crun update, but now the statement about the default does not apply anymore.

Possible solutions:

  1. Fix the documentation.
  2. Append mode=1777 to a tmpfs mount, if no other mode is specified.
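
Until one of these is decided, solution 2 can be approximated on the user side. The following is a minimal sketch, not Podman's implementation: tmpfs_spec_with_default_mode is a hypothetical helper name, and it assumes the PATH[:OPTIONS] form of --tmpfs with comma-separated options and a mount path that contains no colon.

```shell
# Hypothetical helper (not part of Podman): add mode=1777 to a --tmpfs
# spec of the form PATH[:OPTIONS] unless the caller already set a mode.
tmpfs_spec_with_default_mode() {
  spec=$1
  case "$spec" in
    *:*) path=${spec%%:*}; opts=${spec#*:} ;;  # split at the first colon
    *)   path=$spec; opts= ;;                  # no options given
  esac
  case ",$opts," in
    *,mode=*) ;;                               # explicit mode, keep it
    *) opts=${opts:+$opts,}mode=1777 ;;        # append the old default
  esac
  printf '%s:%s\n' "$path" "$opts"
}
```

It would be used as: podman run --userns=keep-id --tmpfs "$(tmpfs_spec_with_default_mode /home)" docker.io/library/alpine:latest touch /home/test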

Steps to reproduce the issue

  1. update to crun >= 1.11
  2. run podman run --userns=keep-id --tmpfs /home docker.io/library/alpine:latest touch /home/test
  3. get error message:

    touch: /home/test: Permission denied

Describe the results you received

The tmpfs mount is not writable for the non-root user.

Describe the results you expected

The tmpfs mount should be writable for the non-root user.

podman info output

host:
  arch: arm64
  buildahVersion: 1.32.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-3.fc39.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 99.79
    systemPercent: 0.09
    userPercent: 0.12
  cpus: 4
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: coreos
    version: "39"
  eventLogger: file
  freeLocks: 2044
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 589824
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 589824
      size: 65536
  kernel: 6.5.9-300.fc39.aarch64
  linkmode: dynamic
  logDriver: none
  memFree: 6860488704
  memTotal: 8204500992
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.8.0-1.fc39.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.8.0
    package: netavark-1.8.0-2.fc39.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.8.0
  ociRuntime:
    name: crun
    package: crun-1.11-1.fc39.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.11
      commit: 11f8d3dc9fc4bb8a0adcff5ba8bd340f24612701
      rundir: /run/user/1001/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231004.gf851084-1.fc39.aarch64
    version: |
      pasta 0^20231004.gf851084-1.fc39.aarch64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc39.aarch64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 0
  swapTotal: 0
  uptime: 36h 17m 18.00s (Approximately 1.50 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /var/home/podman/.config/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 4
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.12-2.fc39.aarch64
      Version: |-
        fusermount3 version: 3.16.1
        fuse-overlayfs: version 1.12
        FUSE library version 3.16.1
        using FUSE kernel interface version 7.38
  graphRoot: /var/home/podman/.local/share/containers/storage
  graphRootAllocated: 127438663680
  graphRootUsed: 5462306816
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 8
  runRoot: /run/user/1001/containers
  transientStore: true
  volumePath: /var/home/podman/.local/share/containers/storage/volumes
version:
  APIVersion: 4.7.0
  Built: 1695838660
  BuiltTime: Wed Sep 27 18:17:40 2023
  GitCommit: ""
  GoVersion: go1.21.1
  Os: linux
  OsArch: linux/arm64
  Version: 4.7.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No


Luap99 commented 10 months ago

@giuseppe I guess the crun change was intentional? If so we should update the docs.

xduugu commented 10 months ago

@Luap99 I think it was intentional. At least in the NEWS file, he points out

  • linux: append tmpfs mode if missing for mounts. This is the same behavior of runc.

I am undecided what to do here. On one hand, it sounds right to respect the underlying directory permissions, but on the other hand, I don't see any use case where you mount a tmpfs into a container and don't want it to be writable at least by the container user. However, I don't have a Red Hat account, so I cannot check what the issue was that was fixed by this commit.

I just think it's a bit unfortunate that the behavior depends on the container runtime in use. And what about Docker compatibility? IIRC, I never had to adjust the permissions of a mounted tmpfs to make it writable by the container user.

By the way, I fixed my issue by using the U mount option (chown unfortunately does not work for --tmpfs). Maybe I should always use U,mode=0700 as mount options for a tmpfs mount to be independent of the actual directory permissions in the image, because U is not enough if the underlying directory is not writable in the image:

podman run --userns=keep-id --tmpfs /var/empty:U docker.io/library/alpine:latest touch /var/empty/test

giuseppe commented 10 months ago

I also don't get the rationale behind the runc feature. Still, an issue was reported where a tmpfs directory created with crun has mode 0777 while runc honors the underlying directory, so in the end I decided to follow what runc does so that crun doesn't look less safe.

The documentation seems wrong anyway, since that is not the case with runc.

Luap99 commented 10 months ago

So the correct docs would be "Defaults to the permissions of the directory in the underlying image; if it does not exist, 1777 is used as the mode"?

giuseppe commented 10 months ago

that is what runc/crun do, but do we need to capture it in the Podman documentation? It might be different with a different runtime, so if we want to specify it, we need to say this is what happens with runc/crun, but other runtimes can use a different default
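
The rule discussed here (inherit the mode of the directory under the mount point, fall back to 1777 when it does not exist) can be modeled in a few lines of shell. This is only an illustrative sketch of the behavior described in this thread, not code from runc or crun: default_tmpfs_mode is a hypothetical name, and stat -c assumes GNU coreutils or BusyBox.

```shell
# Hypothetical model of the default described above: print the octal mode
# a tmpfs would get when mounted over DIR with no explicit mode option.
default_tmpfs_mode() {
  if [ -d "$1" ]; then
    stat -c '%a' "$1"   # mount point exists: reuse its permissions
  else
    echo 1777           # no underlying directory: world-writable + sticky
  fi
}
```

Running it against a directory in an unpacked image root filesystem gives the mode to expect; passing an explicit mode= option to --tmpfs bypasses the default entirely.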

xduugu commented 10 months ago

I just checked the docker documentation:

| tmpfs-mode | File mode of the tmpfs in octal. For instance, 700 or 0770. Defaults to 1777 or world-writable. |

Does Podman need to be compatible with Docker in this respect?

rhatdan commented 10 months ago

I think the correct behaviour is to match the underlying directory. And then allow the users to override it.

rhatdan commented 10 months ago

Docker seems to perform the same way.

sh-5.2# docker run alpine ls -ld /mnt
drwxr-xr-x    2 root     root          4096 Sep 28 11:18 /mnt
sh-5.2# docker run --tmpfs /mnt alpine ls -ld /mnt
drwxr-xr-x    2 root     root            40 Nov 25 17:02 /mnt

xduugu commented 10 months ago

@rhatdan You already closed the issue. Doesn't the documentation need an update? Maybe removing the part "Defaults to 1777 in Linux." is enough to indicate that Podman does not guarantee a specific directory mode.

rhatdan commented 10 months ago

Interested in opening a PR?

giuseppe commented 10 months ago

opened a PR: https://github.com/containers/podman/pull/20807

xduugu commented 10 months ago

Thanks! @giuseppe