containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Rootless fails to receive systemd slice shortly after boot #23026

Closed: Gelbpunkt closed this issue 2 weeks ago

Gelbpunkt commented 2 weeks ago

Issue Description

For a few months now (I really cannot remember when this started), I have observed the following when starting rootless containers (in my case via systemd units, though I can also reproduce it by running the commands manually):

Error: did not receive systemd slice as cgroup parent when using systemd to manage cgroups: invalid argument

This, however, does not happen for the first few systemd-unit-provisioned containers. Usually about 90% of my containers (roughly 20 in total) come up fine, and then this starts happening. Rootful containers are not affected, and adding --cgroup-manager=cgroupfs makes it "work", but I would still consider this a bug. It might be caused by systemd, but I have never touched anything systemd-related on this server.

Steps to reproduce the issue

  1. Have a system where this happens (sadly, I don't know what exactly causes it)
  2. Boot the machine
  3. Wait a while
  4. Run a container under a rootless user without setting --cgroup-manager=cgroupfs

Describe the results you received

The containers fail to start

Describe the results you expected

The containers should start fine

podman info output

host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 95.97
    systemPercent: 1.37
    userPercent: 2.66
  cpus: 48
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: server
    version: "40"
  eventLogger: journald
  freeLocks: 2037
  hostname: syndra
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1004
      size: 1
    - container_id: 1
      host_id: 589824
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1004
      size: 1
    - container_id: 1
      host_id: 589824
      size: 65536
  kernel: 6.8.11-300.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 121558302720
  memTotal: 236279029760
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.11.0-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.11.0
    package: netavark-1.11.0-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.11.0
  ociRuntime:
    name: crun
    package: crun-1.15-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/1004/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240510.g7288448-1.fc40.x86_64
    version: |
      pasta 0^20240510.g7288448-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1004/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 94h 11m 15.00s (Approximately 3.92 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /home/glitch/.config/containers/storage.conf
  containerStore:
    number: 8
    paused: 0
    running: 8
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/glitch/.local/share/containers/storage
  graphRootAllocated: 2000262529024
  graphRootUsed: 1068284645376
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 15
  runRoot: /run/user/1004/containers
  transientStore: false
  volumePath: /home/glitch/.local/share/containers/storage/volumes
version:
  APIVersion: 5.1.0
  Built: 1716940800
  BuiltTime: Wed May 29 02:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.3
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

This is a physical server running Fedora 40, upgraded over the last few years from (I think) Fedora 36. The machine is used almost exclusively for running containers and is otherwise a fairly stock Fedora install.

Additional information

I would be very happy to provide SSH access or any other means to help debug this issue to podman maintainers. You can email me e2e-encrypted with my GPG key at the email in my github profile or message me on Matrix, where I am @gelbpunkt:matrix.org.

giuseppe commented 2 weeks ago

please provide the full command you are running.

That error happens with --cgroup-parent=$PARENT and $PARENT is not a systemd slice (i.e. it doesn't have a .slice suffix).

$ podman run --rm --cgroup-parent foo alpine true
Error: did not receive systemd slice as cgroup parent when using systemd to manage cgroups: invalid argument
Gelbpunkt commented 2 weeks ago

please provide the full command you are running.

That error happens with --cgroup-parent=$PARENT and $PARENT is not a systemd slice (i.e. it doesn't have a .slice suffix).

$ podman run --rm --cgroup-parent foo alpine true
Error: did not receive systemd slice as cgroup parent when using systemd to manage cgroups: invalid argument

I can reproduce it with a simple podman run --rm -it alpine:edge (edit: incorrect, see next comment)

Gelbpunkt commented 2 weeks ago

please provide the full command you are running. That error happens with --cgroup-parent=$PARENT and $PARENT is not a systemd slice (i.e. it doesn't have a .slice suffix).

$ podman run --rm --cgroup-parent foo alpine true
Error: did not receive systemd slice as cgroup parent when using systemd to manage cgroups: invalid argument

I can reproduce it with a simple podman run --rm -it alpine:edge

Actually, no. It only seems to occur in one of my pods. I had just rebooted after making this post, so my last comment was from memory, but I was able to check again now that it has started occurring again:

[glitch@syndra ~]$ podman run --rm -it alpine:edge ash
/ #
[glitch@syndra ~]$ podman run --rm -it --pod glitch alpine:edge ash
Error: did not receive systemd slice as cgroup parent when using systemd to manage cgroups: invalid argument

Edit: Creating a new pod and then running a container with --pod test works. Only this one pod does not work as expected.

[jens@syndra ~]$ sudo cat /etc/systemd/system/glitch-pod.service
[Unit]
Description=Create glitch pod

[Service]
Type=oneshot
User=glitch
Group=glitch
ExecStartPre=-/usr/bin/podman pod stop glitch
ExecStartPre=-/usr/bin/podman pod rm -f glitch
ExecStart=/usr/bin/podman pod create --name glitch -p 127.0.0.1:8080:80 -p 127.0.0.1:4443:443 -p 29418:29418
ExecStop=/usr/bin/podman pod rm -f glitch
ExecReload=/usr/bin/podman pod stop glitch
Restart=no
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

This is how the pod in question is created.
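One way to see the difference directly would be to compare the cgroup parent of the broken pod with that of a fresh pod. This is a hypothetical check, not something run in the thread; it assumes the CgroupParent field exposed by podman pod inspect:

```
# Create a throwaway pod under the current (logged-in) user session
podman pod create --name test
podman pod inspect test --format '{{.CgroupParent}}'
# Compare against the pod created by the system unit
podman pod inspect glitch --format '{{.CgroupParent}}'
podman pod rm -f test
```

A pod created under an active user session should report a .slice parent (e.g. user.slice), while a pod created without a systemd user session would report a plain cgroupfs path, which is exactly what the error message complains about.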

giuseppe commented 2 weeks ago

[jens@syndra ~]$ sudo cat /etc/systemd/system/glitch-pod.service

[Unit]
Description=Create glitch pod

[Service]
Type=oneshot
User=glitch
Group=glitch
ExecStartPre=-/usr/bin/podman pod stop glitch
ExecStartPre=-/usr/bin/podman pod rm -f glitch
ExecStart=/usr/bin/podman pod create --name glitch -p 127.0.0.1:8080:80 -p 127.0.0.1:4443:443 -p 29418:29418
ExecStop=/usr/bin/podman pod rm -f glitch
ExecReload=/usr/bin/podman pod stop glitch
Restart=no
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

If you run with User=, there is no systemd user session active for that user, so Podman ends up using the cgroupfs backend.

You can instead install the .service file under $HOME/.config/systemd/user/glitch-pod.service and drop the User=/Group= settings.
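A sketch of that suggestion (the unit body is copied from the issue, with User=/Group= removed; the WantedBy=default.target and loginctl enable-linger steps are my assumptions about running user units at boot, not something stated in the thread):

```
# Run the unit as a *user* unit, so a systemd user session
# (and its .slice hierarchy) exists for podman to use.
mkdir -p "$HOME/.config/systemd/user"
# Same unit body as above, minus User=/Group= (implicit for user units);
# WantedBy=default.target is the user-session analogue of multi-user.target.
cat > "$HOME/.config/systemd/user/glitch-pod.service" <<'EOF'
[Unit]
Description=Create glitch pod

[Service]
Type=oneshot
ExecStartPre=-/usr/bin/podman pod stop glitch
ExecStartPre=-/usr/bin/podman pod rm -f glitch
ExecStart=/usr/bin/podman pod create --name glitch -p 127.0.0.1:8080:80 -p 127.0.0.1:4443:443 -p 29418:29418
ExecStop=/usr/bin/podman pod rm -f glitch
ExecReload=/usr/bin/podman pod stop glitch
Restart=no
RemainAfterExit=yes

[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
systemctl --user enable --now glitch-pod.service
# So the user session (and the pod) also starts at boot without a login:
sudo loginctl enable-linger glitch
```

Without lingering enabled, the user's systemd instance (and therefore the pod) only exists while that user is logged in.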

Gelbpunkt commented 2 weeks ago

if you run with User= there is no systemd user session active for that user, so it ends up using the cgroupfs backend.

Ahh, thanks so much for this hint! I'll move to user units then :)