containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Rootless container in a rootless container does not own `/sys/fs/cgroup` #21381

Closed adelton closed 3 months ago

adelton commented 8 months ago

Issue Description

I am trying to fully understand all the possible combinations of cgroups behaviour that can occur with podman, so I am running some possibly unusual combinations of tests.

When I have a privileged rootless podman container and I run a rootless podman container in it, /sys/fs/cgroup is mounted rw in the nested container but owned by nobody (that is, by root of the parent container), which leads to Permission denied.

Steps to reproduce the issue

  1. With podman-4.8.3-1.fc39.x86_64, run a privileged rootless quay.io/podman/stable container:
    host$ podman run --rm -ti --privileged -h container quay.io/podman/stable
    [root@container /]#
  2. Check that we run in a user namespace and what the cgroups situation is:
    [root@container /]# cat /proc/self/uid_map
         0       1000          1
         1     100000      65536
    [root@container /]# mount | grep cgroup
    cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)
    [root@container /]# mkdir /sys/fs/cgroup/test-123 && ls -dl /sys/fs/cgroup/test-123
    drwxr-xr-x. 2 root root 0 Jan 26 11:35 /sys/fs/cgroup/test-123
  3. To allow mounting a fresh sysfs (see https://github.com/containers/podman/issues/21376#issuecomment-1911836469), remove the default configuration that is in that quay.io/podman/stable image:
    [root@container /]# sha256sum /etc/containers/containers.conf 
    438bbcce18630f93ba7c3336cf46d95156ad38b2a63cea8db3ed865b398709c4  /etc/containers/containers.conf
    [root@container /]# rm -f /etc/containers/containers.conf
  4. As the podman user in that container, run a privileged container in that privileged container:
    [root@container /]# runuser -u podman -- podman run --rm -ti --privileged -h nested registry.fedoraproject.org/fedora
    Trying to pull registry.fedoraproject.org/fedora:latest...
    Getting image source signatures
    Copying blob 718a00fe3212 done   | 
    Copying config 368a084ba1 done   | 
    Writing manifest to image destination
    WARN[0009] Path "/run/secrets/etc-pki-entitlement" from "/etc/containers/mounts.conf" doesn't exist, skipping 
    WARN[0009] Path "/run/secrets/rhsm" from "/etc/containers/mounts.conf" doesn't exist, skipping 
  5. Check that we run in another (nested) user namespace:
    [root@nested /]# cat /proc/self/uid_map
         0       1000          1
         1          1        999
      1000       1001      64535
  6. Check what cgroups we have:
    [root@nested /]# mount | grep cgroup
    cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)
    [root@nested /]# ls -ld /sys/fs/cgroup
    drwxr-xr-x. 3 nobody nobody 0 Jan 26 11:35 /sys/fs/cgroup
    [root@nested /]# cat /proc/self/cgroup 
    0::/
    [root@nested /]# id
    uid=0(root) gid=0(root) groups=0(root)
  7. Try to create a new cgroup:
    [root@nested /]# mkdir /sys/fs/cgroup/test-456

Describe the results you received

[root@nested /]# mkdir /sys/fs/cgroup/test-456
mkdir: cannot create directory ‘/sys/fs/cgroup/test-456’: Permission denied
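
The `nobody` owner seen in step 6 is consistent with the nested uid_map from step 5: uid 0 of the outer container has no entry in that map, so the kernel reports it as the overflow uid (65534 by default, usually named nobody). The following is only a sketch to illustrate that lookup; `map_parent_uid` is a hypothetical helper, not part of podman, and it assumes the usual three-column uid_map format (uid inside, uid in parent, count).

```shell
#!/bin/sh
# Translate a uid of the parent user namespace into the uid seen inside
# the namespace described by a uid_map file.
# Prints 65534 (assumed default overflow uid) when the uid is not mapped.
map_parent_uid() {
    map_file=$1
    parent_uid=$2
    while read -r inside parent count; do
        # Skip blank or incomplete lines
        [ -n "$count" ] || continue
        if [ "$parent_uid" -ge "$parent" ] && [ "$parent_uid" -lt $((parent + count)) ]; then
            echo $((inside + parent_uid - parent))
            return 0
        fi
    done < "$map_file"
    echo 65534
}

# uid_map of the nested container, copied from step 5
map=$(mktemp)
cat > "$map" <<'EOF'
         0       1000          1
         1          1        999
      1000       1001      64535
EOF

map_parent_uid "$map" 0      # outer root, owner of /sys/fs/cgroup: prints 65534
map_parent_uid "$map" 1000   # outer uid 1000 (presumably the podman user): prints 0
rm -f "$map"
```

Under this reading, root in the nested container is uid 1000 of the outer container, which does not own /sys/fs/cgroup there, hence the Permission denied.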

Describe the results you expected

No error.

I wonder if this is the podman equivalent of CRI-O's https://github.com/cri-o/cri-o/issues/7623?

podman info output

[root@container /]# podman info
host:
  arch: amd64
  buildahVersion: 1.33.2
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-2.fc39.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: '
  cpuUtilization:
    idlePercent: 60.14
    systemPercent: 5.42
    userPercent: 34.44
  cpus: 8
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: container
    version: "39"
  eventLogger: file
  freeLocks: 2048
  hostname: container
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 6.6.11-200.fc39.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 11059781632
  memTotal: 33343328256
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.fc39.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: netavark-1.9.0-1.fc39.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.9.0
  ociRuntime:
    name: crun
    package: crun-1.12-1.fc39.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.12
      commit: ce429cb2e277d001c2179df1ac66a470f00802ae
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231230.gf091893-1.fc39.x86_64
    version: |
      pasta 0^20231230.gf091893-1.fc39.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc39.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 7192797184
  swapTotal: 8589930496
  uptime: 144h 59m 24.00s (Approximately 6.00 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /var/lib/shared
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.12-2.fc39.x86_64
      Version: |-
        fusermount3 version: 3.16.1
        fuse-overlayfs: version 1.12
        FUSE library version 3.16.1
        using FUSE kernel interface version 7.38
    overlay.mountopt: nodev,fsync=0
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 57940439040
  graphRootUsed: 34632925184
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.8.3
  Built: 1704291100
  BuiltTime: Wed Jan  3 14:11:40 2024
  GitCommit: ""
  GoVersion: go1.21.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.8.3

Podman in a container

Yes

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

None.

Additional information

This is deterministic.

Luap99 commented 3 months ago

@giuseppe Ideas?

giuseppe commented 3 months ago

You need to configure the cgroup for the nested container yourself, since there is no systemd inside the outer container to do it for us:

$ podman run  --rm -ti --privileged -h container quay.io/podman/stable
# mkdir /sys/fs/cgroup/init
# echo 1 > /sys/fs/cgroup/init/cgroup.procs
# chown -R podman:podman /sys/fs/cgroup/
# rm -f /etc/containers/containers.conf
# runuser -u podman -- podman run --rm -ti --privileged -h nested registry.fedoraproject.org/fedora ls -ld /sys/fs/cgroup
drwxr-xr-x. 2 root root 0 Jun 17 10:19 /sys/fs/cgroup
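
The three cgroup commands above could be wrapped in a small function to run before the nested podman is started. This is only a sketch: `delegate_cgroup` is a hypothetical name, and the defaults (/sys/fs/cgroup, podman:podman, PID 1) are simply the values used in the commands above.

```shell
#!/bin/sh
# Sketch of the manual cgroup delegation as a reusable function.
# delegate_cgroup is a hypothetical helper, not a podman command.
delegate_cgroup() {
    cgroot=${1:-/sys/fs/cgroup}   # cgroup2 mount point
    owner=${2:-podman:podman}     # user:group that will run the nested podman
    pid=${3:-1}                   # process to move out of the root cgroup

    # Create a child cgroup and move the process into it
    mkdir -p "$cgroot/init" || return 1
    echo "$pid" > "$cgroot/init/cgroup.procs" || return 1

    # Hand the tree over to the unprivileged user
    chown -R "$owner" "$cgroot" || return 1
}

# Intended use, as root inside the outer container:
#   delegate_cgroup /sys/fs/cgroup podman:podman
#   rm -f /etc/containers/containers.conf
#   runuser -u podman -- podman run --rm -ti --privileged -h nested registry.fedoraproject.org/fedora
```

The function only reproduces the steps shown above; it does not check that the target is really a cgroup2 mount, and it needs to run as root of the outer container for the chown to succeed.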