containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Invalid argument error at container startup after reboot #22576

Closed DavidePrincipi closed 6 months ago

DavidePrincipi commented 6 months ago

Issue Description

This issue occurs randomly after a system reboot. One or more services running in rootless or rootful Podman containers fail to start with an "invalid argument" error. As reported in #21274, the error looks different from those described in the Troubleshooting Guide.

Steps to reproduce the issue

Not always reproducible; please see the results below.

Describe the results you received

In the system journal we find messages like these:

May 02 09:51:29 R1-pve.rocky9-pve.org podman[6068]: time="2024-05-02T09:51:29+02:00" level=warning msg="Unmounting container \"samba-dc\" while attempting to delete storage: unmounting \"/home/samba1/.local/share/containers/storage/overlay/c07c970255101ffff8fb38162c32beb7ef7884f8d>
May 02 09:51:29 R1-pve.rocky9-pve.org podman[6068]: Error: removing storage for container "samba-dc": unmounting "/home/samba1/.local/share/containers/storage/overlay/c07c970255101ffff8fb38162c32beb7ef7884f8d1a12f4cdd0da18adc1c4873/merged": invalid argument
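To check whether other services on a host hit the same failure, the journal can be scanned for this message. The following is a minimal sketch, not part of the original report: the regular expression is modelled only on the journal line quoted above (other Podman versions may word it differently), and the helper name `find_unmount_failures` is hypothetical.

```python
import re

# Pattern modelled on the single journal line quoted in this report.
# Assumption: the wording is stable enough to match; adjust as needed.
UNMOUNT_EINVAL = re.compile(
    r'Error: removing storage for container "(?P<container>[^"]+)": '
    r'unmounting "(?P<path>[^"]+)": invalid argument'
)


def find_unmount_failures(lines):
    """Return (container, path) pairs for every matching journal line."""
    hits = []
    for line in lines:
        match = UNMOUNT_EINVAL.search(line)
        if match:
            hits.append((match.group("container"), match.group("path")))
    return hits


# The error line from the journal excerpt above, copied verbatim.
sample = [
    'May 02 09:51:29 R1-pve.rocky9-pve.org podman[6068]: Error: removing '
    'storage for container "samba-dc": unmounting '
    '"/home/samba1/.local/share/containers/storage/overlay/'
    'c07c970255101ffff8fb38162c32beb7ef7884f8d1a12f4cdd0da18adc1c4873/merged"'
    ': invalid argument',
]

for container, path in find_unmount_failures(sample):
    print(f"{container}: failed to unmount {path}")
```

In practice the lines would come from the journal of the affected user or unit rather than from a hard-coded sample.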

Following instructions here https://github.com/containers/podman/issues/21274#issuecomment-1898275626, we got this strace output (for another container):

https://gist.github.com/stephdl/95afc3c028ebdd3d11d3014cd7efea81

Describe the results you expected

The container should start instead.

podman info output

[openldap1@r3-pve state]$ rpm -q podman 
podman-4.6.1-8.el9_3.x86_64

[openldap1@r3-pve state]$ podman version
Client:       Podman Engine
Version:      4.6.1
API Version:  4.6.1
Go Version:   go1.20.12
Built:        Wed Mar  6 11:08:41 2024
OS/Arch:      linux/amd64

[openldap1@r3-pve state]$ podman info
host:
  arch: amd64
  buildahVersion: 1.31.3
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: cebaba63f66de0e92cdc7e2a59f39c9208281158'
  cpuUtilization:
    idlePercent: 98.96
    systemPercent: 0.32
    userPercent: 0.72
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: '"rocky"'
    version: "9.3"
  eventLogger: file
  freeLocks: 2047
  hostname: r3-pve.rocky9-pve3.org
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1003
      size: 1
    - container_id: 1
      host_id: 296608
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1003
      size: 1
    - container_id: 1
      host_id: 296608
      size: 65536
  kernel: 5.14.0-362.24.1.el9_3.0.1.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 5071155200
  memTotal: 8057634816
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.7.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: netavark-1.7.0-2.el9_3.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: crun-1.8.7-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.7
      commit: 53a9996ce82d1ee818349bdcc64797a1fa0433c4
      rundir: /run/user/1003/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    path: /run/user/1003/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 6874460160
  swapTotal: 6874460160
  uptime: 0h 13m 50.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/openldap1/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/openldap1/.local/share/containers/storage
  graphRootAllocated: 19925041152
  graphRootUsed: 6092189696
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/user/1003/containers
  transientStore: false
  volumePath: /home/openldap1/.local/share/containers/storage/volumes
version:
  APIVersion: 4.6.1
  Built: 1709719721
  BuiltTime: Wed Mar  6 11:08:41 2024
  GitCommit: ""
  GoVersion: go1.20.12
  Os: linux
  OsArch: linux/amd64
  Version: 4.6.1

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

No

Additional environment details

Virtual machine under Proxmox.

Additional information

The issue appears at random times. Some systems (maybe slower than others) seem to hit it more frequently.

The bug was also reported here: https://github.com/NethServer/dev/issues/6916

After applying this workaround, the container starts:

https://github.com/containers/podman/issues/19491#issuecomment-1668129950

Luap99 commented 6 months ago

This works on newer versions.

DavidePrincipi commented 6 months ago

Could you help me to find the commit that fixes the issue? I need to track it both on Rocky Linux 9 and Debian 12.

Luap99 commented 6 months ago

you linked to it already https://github.com/containers/storage/pull/1687

DavidePrincipi commented 6 months ago

Thank you for confirming it!

IIUC, the fix appeared in containers/storage 1.49, which was then included in Podman 4.7:

https://github.com/containers/podman/commit/e092f887fe3b254a2d8919feba3a59681916d6f7
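For tracking which hosts still need the workaround, a version check along these lines may help. This is only a sketch under the assumption stated in the comment above (fix first shipped in Podman 4.7, itself marked "IIUC"); distribution packages can backport fixes, so the upstream version number alone is not conclusive. The names `parse_podman_version` and `has_storage_fix` are hypothetical.

```python
import re

# Assumption taken from the comment above ("IIUC"): the fix is in Podman 4.7.
# Distribution backports are NOT detected by this check.
FIXED_IN = (4, 7)


def parse_podman_version(text):
    """Extract (major, minor, patch) from the `Version:` line of `podman version` output."""
    match = re.search(r"^Version:\s+(\d+)\.(\d+)\.(\d+)", text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Version:' line found")
    return tuple(int(part) for part in match.groups())


def has_storage_fix(version):
    """True if the upstream version is at least FIXED_IN (major, minor)."""
    return version[:2] >= FIXED_IN


# Excerpt of the `podman version` output quoted in this report.
reported = """Client:       Podman Engine
Version:      4.6.1
API Version:  4.6.1
Go Version:   go1.20.12
"""

version = parse_podman_version(reported)
print(version, "has fix:", has_storage_fix(version))
```

With the output from this report, the parsed version is 4.6.1, which falls below 4.7 and is consistent with the affected system described here.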