containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Unable to configure cgroup limits #22147

Closed: eetiez closed this issue 5 months ago

eetiez commented 6 months ago

Issue Description

While launching a rootless container in which a non-root process tries to set CPU limits via cgroupfs, I noticed that the process failed to start with the following error:

write error: No such file or directory

I reproduced the steps manually and could not find any way to configure CPU limits in a child cgroup via cgroupfs inside a rootless Podman container, even though /sys/fs/cgroup is writable.

Steps to reproduce the issue

  1. Make sure that SELinux does not prevent containers from managing cgroups:
    $ sudo setsebool -P container_manage_cgroup true
  2. Start a rootless container with /sys/fs/cgroup writable:
    $ podman run -it --rm --security-opt unmask=/sys/fs/cgroup ubuntu:22.04 bash
  3. Inside the container, create a non-root user called test:
    root@xxxxxxxxxxxx:/# useradd test -s /bin/bash
  4. Inside the container, create a child cgroup and chown its directory to the test user:
    root@xxxxxxxxxxxx:/# mkdir /sys/fs/cgroup/test && chown -R test:test /sys/fs/cgroup/test
  5. Inside the container, open a shell as the test user:
    root@xxxxxxxxxxxx:/# su test
  6. Inside the container, try to enable the cpu controller for the child cgroup:
    test@xxxxxxxxxxxx:/# echo "+cpu" >> /sys/fs/cgroup/test/cgroup.subtree_control

Describe the results you received

The steps above produce the following error:

bash: echo: write error: No such file or directory

Describe the results you expected

The cpu controller should be added to /sys/fs/cgroup/test/cgroup.subtree_control.

podman info output

host:
  arch: amd64
  buildahVersion: 1.33.5
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc39.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 91.84
    systemPercent: 1.62
    userPercent: 6.54
  cpus: 4
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: workstation
    version: "39"
  eventLogger: journald
  freeLocks: 2048
  hostname: fedora
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.7.9-200.fc39.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 6939414528
  memTotal: 12452999168
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.10.0-1.fc39.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.10.0
    package: netavark-1.10.3-1.fc39.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.10.3
  ociRuntime:
    name: crun
    package: crun-1.14.4-1.fc39.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.14.4
      commit: a220ca661ce078f2c37b38c92e66cf66c012d9c1
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240220.g1e6f92b-1.fc39.x86_64
    version: |
      pasta 0^20240220.g1e6f92b-1.fc39.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc39.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 1h 9m 28.00s (Approximately 0.04 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 118312927232
  graphRootUsed: 12345868288
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 4.9.3
  Built: 1708357294
  BuiltTime: Mon Feb 19 16:41:34 2024
  GitCommit: ""
  GoVersion: go1.21.7
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.3

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

No response

Additional information

The steps above produce the same error when the child cgroup is owned by root and the command that adds the controller is run as root.

giuseppe commented 6 months ago

A rootless user cannot configure that. You need to configure your system to delegate the cpu controller to an unprivileged user.

You need something like:

# mkdir -p /etc/systemd/system/user@.service.d
# cat > /etc/systemd/system/user@.service.d/delegate-cgroups.conf << EOF
[Service]
Delegate=cpu cpuset io memory pids
EOF
# systemctl daemon-reload

eetiez commented 6 months ago

Thanks for the answer. I'm however still unable to delegate the cpu controller to the user running the container.

Based on your answer, I tried to reproduce the container execution via systemd, as discussed here. I noticed that the file /sys/fs/cgroup/cgroup.subtree_control is empty, which could explain why I am unable to add the cpu controller to a child cgroup. Adding cpu via echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control works, but it doesn't help in a child cgroup: the same error as above still occurs.
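
For readers checking the same thing: in cgroup v2, cgroup.controllers lists the controllers available to a cgroup, while cgroup.subtree_control only lists the ones already enabled for its children, so an empty subtree_control does not by itself mean that delegation failed. The following is a minimal sketch of such a check; the helper name has_controllers is hypothetical and not part of Podman or systemd.

```shell
#!/bin/sh
# has_controllers FILE CTRL...
# Hypothetical helper: reports whether every CTRL appears in FILE,
# where FILE is a cgroup.controllers file (space-separated controller names).
has_controllers() {
    file=$1
    shift
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    missing=""
    for ctrl in "$@"; do
        # Split the list into one name per line and match whole names only,
        # so that "cpu" does not match "cpuset".
        tr ' ' '\n' < "$file" | grep -qx "$ctrl" || missing="$missing $ctrl"
    done
    if [ -n "$missing" ]; then
        echo "not delegated:$missing"
        return 1
    fi
    echo "all delegated"
}
```

On the host, this could be pointed at "/sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers" (after logging out and back in so that the drop-in takes effect); inside the container, at /sys/fs/cgroup/cgroup.controllers.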

eetiez commented 5 months ago

My mistake: everything was in fact working correctly. I wasn't looking at the right file: controller delegation must be checked in /sys/fs/cgroup/cgroup.controllers. All the needed controllers are available inside the container with the default Podman configuration on Fedora 39. The errors mentioned above were caused by my incorrect handling of cgroup v2: you cannot delegate a controller to a child cgroup while there are processes inside the root cgroup (check cgroup.procs). I am therefore closing this issue.
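
The constraint described in this last comment corresponds to what the cgroup v2 documentation calls the "no internal processes" rule. A minimal sketch of a sequence that respects it is shown below. It is an illustration under assumptions, not a method confirmed in this thread: the function name enable_child_controller and the leaf cgroup name init are hypothetical, and the caller is assumed to have write access to the cgroup files involved.

```shell
#!/bin/sh
# enable_child_controller ROOT CTRL CHILD
#   ROOT  : cgroup directory to operate on (e.g. /sys/fs/cgroup inside the container)
#   CTRL  : controller name, e.g. cpu
#   CHILD : name of the child cgroup, e.g. test
# "init" is a hypothetical name for the leaf cgroup that receives the
# processes currently sitting in ROOT.
enable_child_controller() {
    root=$1
    ctrl=$2
    child=$3

    # Availability is visible in cgroup.controllers, not in cgroup.subtree_control.
    if ! tr ' ' '\n' < "$root/cgroup.controllers" | grep -qx "$ctrl"; then
        echo "controller '$ctrl' is not available in $root" >&2
        return 1
    fi

    mkdir -p "$root/init" "$root/$child" || return 1

    # Empty ROOT by moving its processes into the leaf cgroup.
    # Some PIDs may already have exited (for example the cat below),
    # so individual failures are ignored.
    for pid in $(cat "$root/cgroup.procs"); do
        echo "$pid" > "$root/init/cgroup.procs" 2>/dev/null || true
    done

    # With no processes left in ROOT, the controller can be enabled for its children.
    echo "+$ctrl" > "$root/cgroup.subtree_control" || return 1

    # CHILD holds no processes yet, so it can pass the controller down as well.
    echo "+$ctrl" > "$root/$child/cgroup.subtree_control"
}
```

Inside the container from the reproduction steps, this would be called as root with: enable_child_controller /sys/fs/cgroup cpu test. Taking the cgroup directory as a parameter also makes it possible to try the function against a scratch directory first.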