containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Podman seems to trigger locking problems in xfs, locking up the whole machine #18610

Closed · oseiberts11 closed this issue 1 year ago

oseiberts11 commented 1 year ago

Issue Description

In https://github.com/containers/podman/issues/16062#issuecomment-1550239180 I was asked to open a new issue here, mainly to rule out a bug in Podman itself.

I will try to squeeze the information I have into this template.

https://github.com/containers/podman/issues/16062#issuecomment-1334397333 describes a locking bug inside Podman. We are seeing a deadlock in the xfs file system, triggered by invoking Podman, and I am wondering whether the two have the same underlying cause (although it smells like a kernel bug).

The issue occurs when starting a container for RabbitMQ with --user rabbitmq --userns=keep-id on newer kernels (we noticed it when upgrading Ubuntu from Bionic 18.04 to Focal 20.04), and only when the native overlayfs is used instead of fuse-overlayfs.

One thing that is sub-optimal is that RabbitMQ needs its home directory (/var/lib/rabbitmq) mounted in the container, but this is also where Podman stores all of the container files; so effectively the container's own storage is bind-mounted into the container.

Our current workaround is adding --storage-opt "overlay.mount_program=/usr/bin/fuse-overlayfs".
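
The same option can be made persistent for the rabbit user in its storage configuration file (according to the podman info output below, that is /var/lib/rabbitmq/.config/containers/storage.conf). A minimal sketch, assuming the default overlay driver:

[storage]
driver = "overlay"

[storage.options.overlay]
mount_program = "/usr/bin/fuse-overlayfs"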

The user rabbit in question has /var/lib/rabbitmq as its home directory, and that directory is also bind-mounted into the container. Its UID differs from the UID of the rabbitmq user that was presumably created inside the image by the rabbitmq package. This may or may not be a factor in the bug.
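
For completeness, a quick way to compare the two UIDs (this assumes the image from this report is pullable and contains the usual id utility):

# UID of the host user the unit runs as
id -u rabbit
# UID that --user rabbitmq resolves to inside the image
podman run --rm registry.syseleven.de/openstack/rabbitmq:jammy id -u rabbitmq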

The version of Podman we have is fairly old (3.4.2), but Ubuntu does not appear to package a newer version for 20.04, so we have not yet been able to test whether updating Podman helps.

Steps to reproduce the issue

Start rabbitmq via systemd with the following unit file:

[Unit]
Description=RabbitMQ Messaging Server
After=network.target network-online.target
Wants=network.target network-online.target

[Service]
Environment=PODMAN_SYSTEMD_UNIT=%n
Restart=on-failure
User=rabbit
Group=rabbit
TimeoutStartSec=120
ExecStartPre=-/usr/bin/timeout 30 /usr/bin/podman pull registry.syseleven.de/openstack/rabbitmq:jammy
ExecStart=/usr/bin/podman run --rm --network host \
           --conmon-pidfile /%t/%n/%n.pid --cidfile /%t/%n/%n.cid \
           --name %n \
           --mount type=bind,source=/etc/rabbitmq,target=/etc/rabbitmq \
           --mount type=bind,source=/etc/ssl,target=/etc/ssl \
           --mount type=bind,source=/etc/resolv.conf,target=/etc/resolv.conf \
           --mount type=bind,source=/var/lib/rabbitmq,target=/var/lib/rabbitmq \
           --mount type=bind,source=/var/log/rabbitmq,target=/var/log/rabbitmq \
           --user rabbitmq --userns=keep-id \
           registry.syseleven.de/openstack/rabbitmq:jammy /usr/sbin/rabbitmq-server

ExecStop=/usr/bin/podman exec %n /usr/sbin/rabbitmqctl stop
ExecStopPost=/usr/bin/podman rm --ignore -f %n
PIDFile=/%t/%n/%n.pid
RestartSec=60s
SyslogIdentifier=rabbitmq
LimitNOFILE=65536
RuntimeDirectory=%n

[Install]
WantedBy=multi-user.target
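
For reference, the unit is loaded and started in the usual way; this sketch assumes it is saved as /etc/systemd/system/rabbitmq.service:

systemctl daemon-reload
systemctl enable --now rabbitmq.service
# follow the unit's output while it starts
journalctl -u rabbitmq.service -f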

We pull a RabbitMQ image from our own container registry, but since the problem occurs before the container even really starts, I don't think the exact contents of the image matter much. It was built with this Dockerfile:

FROM ubuntu:focal

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get -y install rabbitmq-server

COPY rabbitmq-script-wrapper /usr/lib/rabbitmq/bin/rabbitmq-script-wrapper

RUN chmod +x /usr/lib/rabbitmq/bin/rabbitmq-script-wrapper
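
The image is then built (and pushed to our registry) in the usual way; a sketch, assuming rabbitmq-script-wrapper is present in the build context:

podman build -t registry.syseleven.de/openstack/rabbitmq:jammy .
podman push registry.syseleven.de/openstack/rabbitmq:jammy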

Describe the results you received

The machine became unusable, processes got stuck, logins became impossible.

According to the kernel log file, processes got stuck in xfs file system code and the kernel warned about hung tasks. Here is the first such process; more followed after it:

2023-05-10 15:37:39 in4 kernel[-] warning: [   69.953197] overlayfs: fs on '/var/lib/rabbitmq/.local/share/containers/storage/overlay/compat278498705/lower1' does not support file handles, falling back to xino=off.
2023-05-10 15:37:40 in4 kernel[-] warning: [   69.975395] overlayfs: fs on '/var/lib/rabbitmq/.local/share/containers/storage/overlay/metacopy-check184881131/l1' does not support file handles, falling back to xino=off.
2023-05-10 15:37:40 in4 kernel[-] warning: [   69.980772] overlayfs: fs on '/var/lib/rabbitmq/.local/share/containers/storage/overlay/opaque-bug-check964911957/l2' does not support file handles, falling back to xino=off.
2023-05-10 15:37:44 in4 kernel[-] warning: [   74.073926] overlayfs: fs on '/var/lib/rabbitmq/.local/share/containers/storage/overlay/l/PJRXH6F3G7CISWKDG472AW6PUO' does not support file handles, falling back to xino=off.
2023-05-10 15:40:35 in4 kernel[-] err: [  245.356952] INFO: task kworker/9:0:73 blocked for more than 120 seconds.
2023-05-10 15:40:35 in4 kernel[-] err: [  245.363676]       Not tainted 5.15.0-71-generic #78~20.04.1-Ubuntu
2023-05-10 15:40:35 in4 kernel[-] err: [  245.369880] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377744] task:kworker/9:0     state:D stack:    0 pid:   73 ppid:     2 flags:0x00004000
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377748] Workqueue: xfs-conv/dm-1 xfs_end_io [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377841] Call Trace:
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377844]  <TASK>
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377846]  __schedule+0x2cd/0x890
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377850]  ? xfs_buf_find.isra.0+0x331/0x780 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377915]  schedule+0x69/0x110
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377917]  schedule_timeout+0x206/0x2d0
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377919]  ? xfs_buf_get_map+0x52/0x420 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377983]  __down+0x84/0xf0
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377985]  down+0x53/0x70
2023-05-10 15:40:35 in4 kernel[-] info: [  245.377987]  xfs_buf_lock+0x32/0xc0 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378051]  xfs_buf_find.isra.0+0x34a/0x780 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378116]  xfs_buf_get_map+0x52/0x420 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378179]  xfs_buf_read_map+0x53/0x350 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378243]  ? xfs_read_agf+0xa6/0x130 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378295]  xfs_trans_read_buf_map+0x130/0x2a0 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378368]  ? xfs_read_agf+0xa6/0x130 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378421]  xfs_read_agf+0xa6/0x130 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378473]  xfs_alloc_read_agf+0x39/0x1b0 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378523]  ? xfs_perag_get+0x45/0xa0 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378574]  xfs_refcount_finish_one+0xee/0x350 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378636]  xfs_refcount_update_finish_item+0x4a/0xb0 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378706]  ? xfs_trans_get_cud+0x62/0x70 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378777]  xfs_defer_finish_noroll+0x1f4/0x600 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378835]  __xfs_trans_commit+0x17d/0x3b0 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378905]  xfs_trans_commit+0x10/0x20 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.378974]  xfs_reflink_end_cow_extent+0x230/0x2a0 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.379043]  xfs_reflink_end_cow+0x7c/0x130 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.379112]  xfs_end_ioend+0xfc/0x190 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.379185]  xfs_end_io+0xb0/0xe0 [xfs]
2023-05-10 15:40:35 in4 kernel[-] info: [  245.379244]  process_one_work+0x228/0x3d0
2023-05-10 15:40:35 in4 kernel[-] info: [  245.379246]  worker_thread+0x4d/0x3f0
2023-05-10 15:40:35 in4 kernel[-] info: [  245.379248]  ? process_one_work+0x3d0/0x3d0
2023-05-10 15:40:35 in4 kernel[-] info: [  245.379250]  kthread+0x127/0x150
2023-05-10 15:40:35 in4 kernel[-] info: [  245.379252]  ? set_kthread_struct+0x50/0x50
2023-05-10 15:40:35 in4 kernel[-] info: [  245.379254]  ret_from_fork+0x1f/0x30
2023-05-10 15:40:35 in4 kernel[-] info: [  245.379258]  </TASK>

After this, more and more processes get blocked as they try to access the file system. This makes the whole machine unusable.
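
If it helps the kernel-side investigation, the remaining blocked tasks can be dumped to the kernel log via SysRq (assuming a console is still reachable, e.g. serial or IPMI):

# enable all SysRq functions, then dump every task in uninterruptible (D) state
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
# the 120 second warning above comes from the hung-task detector
cat /proc/sys/kernel/hung_task_timeout_secs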

Describe the results you expected

I expected the program in the container image to start, but it never got to that point.

podman info output

$ sudo -H -u rabbit podman info
WARN[0000] Error validating CNI config file /var/lib/rabbitmq/.config/cni/net.d/87-podman.conflist: [failed to find plugin "bridge" in path [/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin] failed to find plugin "portmap" in path [/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin] failed to find plugin "firewall" in path [/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin] failed to find plugin "tuning" in path [/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin]]
WARN[0000] Error validating CNI config file /var/lib/rabbitmq/.config/cni/net.d/87-podman.conflist: [failed to find plugin "bridge" in path [/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin] failed to find plugin "portmap" in path [/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin] failed to find plugin "firewall" in path [/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin] failed to find plugin "tuning" in path [/usr/local/libexec/cni /usr/libexec/cni /usr/local/lib/cni /usr/lib/cni /opt/cni/bin]]
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: 'conmon: /usr/libexec/podman/conmon'
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.1.2, commit: '
  cpus: 48
  distribution:
    codename: focal
    distribution: ubuntu
    version: "20.04"
  eventLogger: journald
  hostname: infrafe300334
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1070
      size: 1
    - container_id: 1
      host_id: 300000
      size: 65534
    uidmap:
    - container_id: 0
      host_id: 1070
      size: 1
    - container_id: 1
      host_id: 300000
      size: 65534
  kernel: 5.15.0-71-generic
  linkmode: dynamic
  logDriver: journald
  memFree: 191597338624
  memTotal: 270065315840
  ociRuntime:
    name: crun
    package: 'crun: /usr/bin/crun'
    path: /usr/bin/crun
    version: |-
      crun version UNKNOWN
      commit: ea1fe3938eefa14eb707f1d22adff4db670645d6
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /tmp/podman-run-1070/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 150h 30m 50.65s (Approximately 6.25 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: cbk130441.cbk.sys11cloud.net:6000/docker.io
    MirrorByDigestOnly: false
    Mirrors:
    - Insecure: false
      Location: xxxxxxxxxxxxxxxxxd.net:6000/docker.io
    Prefix: docker.io
  quay.io:
    Blocked: false
    Insecure: false
    Location: xxxxxxxxxxxxxxxxxxd.net:6000/quay.io
    MirrorByDigestOnly: false
    Mirrors:
    - Insecure: false
      Location: xxxxxxxxxxxxxxxxxxd.net:6000/quay.io
    Prefix: quay.io
  registry.syseleven.de/openstack:
    Blocked: false
    Insecure: false
    Location: xxxxxxxxxxxxxxxxd.net:6000/registry.syseleven.de/openstack
    MirrorByDigestOnly: false
    Mirrors:
    - Insecure: false
      Location: xxxxxxxxxxxxxxxxxx.net:6000/registry.syseleven.de/openstack
    Prefix: registry.syseleven.de/openstack
store:
  configFile: /var/lib/rabbitmq/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: 'fuse-overlayfs: /usr/bin/fuse-overlayfs'
      Version: |-
        fusermount3 version: 3.9.0
        fuse-overlayfs: version 1.5
        FUSE library version 3.9.0
        using FUSE kernel interface version 7.31
  graphRoot: /var/lib/rabbitmq/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 1
  runRoot: /tmp/podman-run-1070/containers
  volumePath: /var/lib/rabbitmq/.local/share/containers/storage/volumes
version:
  APIVersion: 3.4.2
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.15.2
  OsArch: linux/amd64
  Version: 3.4.2

Possibly it shows `fuse-overlayfs` because on the current system we have already applied our workaround. If I run the command as another unprivileged user, I get the following for that part (a quick way to check which storage options are in effect is sketched after the output):

store:
  configFile: /home/oseibert/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/oseibert/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 0
  runRoot: /run/user/1051/containers
  volumePath: /home/oseibert/.local/share/containers/storage/volumes
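
A quick way to check which storage options are actually in effect for a given user, assuming jq is available:

sudo -H -u rabbit podman info --format json | jq '.store.graphDriverName, .store.graphOptions'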

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

This is on real iron (bare metal, not a VM).

Additional information

No response

Luap99 commented 1 year ago

This looks like something outside of our control; in any case, please test with the latest versions of both the kernel and Podman.

oseiberts11 commented 1 year ago

I tried to reproduce it in a VM (strategy: first reproduce with the same versions, then update things to confirm it works again), but this failed. Unfortunately the only Linux VM images I have available are based on an ext4 file system, and with ext4 the issue did not reproduce. I also tried adding an extra virtual disk, formatted with xfs, and using it as much as possible (mounted as the user's home directory), but the issue still did not reproduce. Does anyone know the best place to report xfs and/or overlayfs trouble? Maybe the stack trace I reported is enough to tell somebody what and where the problem is.
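
One detail that may matter for reproduction: the stack trace goes through xfs_reflink_end_cow, so the xfs file system in the repro VM should probably have reflink enabled (the mkfs.xfs default on recent releases). A sketch, assuming /dev/vdb is the spare virtual disk:

mkfs.xfs -m reflink=1 /dev/vdb
# use it as the rabbit user's home directory, as on the affected machines
mount /dev/vdb /var/lib/rabbitmq
xfs_info /var/lib/rabbitmq | grep reflink    # confirm reflink=1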

rhatdan commented 1 year ago

I would report it to the Linux kernel Bugzilla.

Luap99 commented 1 year ago

That depends on whether the xfs maintainers even use the kernel Bugzilla. You should refer to https://docs.kernel.org/admin-guide/reporting-issues.html. Either way, I think it is very likely that you will have to test the latest kernel.