containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Floating bug: podman + systemd + daemon-reload #19285

Closed AlexandrClick closed 1 year ago

AlexandrClick commented 1 year ago

Issue Description

Hello, we have around 300 systemd units that launch certain tasks in Podman containers. Sometimes, when we perform a daemon-reload, we encounter containers in a status of "Exited." The issue is that the --replace option doesn't work, and all subsequent attempts to start the containers immediately crash with the log message "Main process exited, code=killed, status=9/KILL." This can be resolved by manually removing the dead container.
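The manual workaround mentioned above can be scripted. The following is only a minimal sketch and does not address the underlying cause. It assumes that `podman ps -a --format json` prints a JSON array whose entries carry a `Names` list and a `State` string such as `exited`; verify this against the installed Podman version. The helper names `exited_container_names` and `remove_exited` are hypothetical.

```python
import json
import subprocess


def exited_container_names(ps_json: str) -> list[str]:
    """Return the names of containers whose state is "exited".

    ps_json is assumed to be the output of `podman ps -a --format json`
    (assumption: each entry has a "Names" list and a "State" string).
    """
    names = []
    # Treat empty output or a JSON null as "no containers".
    for entry in json.loads(ps_json or "[]") or []:
        if str(entry.get("State", "")).lower() == "exited":
            names.extend(entry.get("Names") or [])
    return names


def remove_exited(dry_run: bool = True) -> list[str]:
    """List exited containers and, unless dry_run is set, remove them."""
    out = subprocess.run(
        ["podman", "ps", "-a", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    names = exited_container_names(out)
    if names and not dry_run:
        # `podman rm` accepts one or more container names.
        subprocess.run(["podman", "rm", *names], check=True)
    return names
```

Calling `remove_exited()` with the default `dry_run=True` only reports which containers would be removed, which is the safer first step on a host with ~300 units.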

Steps to reproduce the issue

  1. Have multiple systemd units (~300) that launch containers in Podman.
  2. Perform systemctl daemon-reload.

Describe the results you received

Multiple containers end up in the "Exited" state (screenshot not preserved), and they cannot be started (screenshot not preserved).

Describe the results you expected

Nothing bad should happen after a daemon-reload.

podman info output

host:
  arch: amd64
  buildahVersion: 1.28.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon_100:2.1.2~0_amd64
    path: /usr/libexec/podman/conmon
    version: 'conmon version 2.1.2, commit: '
  cpuUtilization:
    idlePercent: 92.92
    systemPercent: 2.12
    userPercent: 4.96
  cpus: 64
  distribution:
    codename: focal
    distribution: ubuntu
    version: "20.04"
  eventLogger: journald
  hostname: php-2-eu.adsrv.wtf
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.0-53-generic
  linkmode: dynamic
  logDriver: journald
  memFree: 44921167872
  memTotal: 134814973952
  networkBackend: cni
  ociRuntime:
    name: crun
    package: crun_100:1.2-2_amd64
    path: /usr/bin/crun
    version: |-
      crun version UNKNOWN
      commit: ea1fe3938eefa14eb707f1d22adff4db670645d6
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: true
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 5735h 44m 37.00s (Approximately 238.96 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 43
    paused: 0
    running: 42
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 982723387392
  graphRootUsed: 551450161152
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 17
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 1669802019
  BuiltTime: Wed Nov 30 04:53:39 2022
  GitCommit: 814b7b003cc630bf6ab188274706c383f9fb9915-dirty
  GoVersion: go1.19.3
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

No

Additional environment details

OS: 5.15.0-53-generic #59~20.04.1-Ubuntu SMP Thu Oct 20 15:10:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Typical unit: (screenshot not preserved)

Additional information

Luap99 commented 1 year ago

> The issue is that the --replace option doesn't work, and all subsequent attempts to start the containers immediately crash with the log message "Main process exited, code=killed, status=9/KILL." This can be resolved by manually removing the dead container.

Please be specific: what does "--replace does not work" mean? What is the actual podman error message? It looks like something is killing the processes, but it is not clear why.

Based on your output, Exited (0) usually means the container exited successfully on its own.

Lastly, Type=oneshot is not a proper unit type for podman; you should stick to units generated with podman generate systemd or quadlet. There are many pitfalls when using podman in systemd, so it is best to stick to what we know works; otherwise you are on your own.

Also, systemctl daemon-reload on its own shouldn't do anything to units. It just reloads their definitions if they have changed, but it will not start or stop anything, so I don't see how it would be directly related to the issue.
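One way to check whether a daemon-reload really coincides with containers exiting is to capture the container states immediately before and after the reload and compare them. This is a minimal sketch, assuming that `podman ps -a --format json` prints a JSON array whose entries carry a `Names` list and a `State` string; the function names `states_by_name` and `stopped_since` are hypothetical.

```python
import json


def states_by_name(ps_json: str) -> dict[str, str]:
    """Map each container name to its lowercased state.

    ps_json is assumed to be the output of `podman ps -a --format json`
    (assumption: each entry has a "Names" list and a "State" string).
    """
    result = {}
    for entry in json.loads(ps_json or "[]") or []:
        state = str(entry.get("State", "")).lower()
        for name in entry.get("Names") or []:
            result[name] = state
    return result


def stopped_since(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return containers that were running before and are not running after.

    Containers that disappeared entirely between snapshots are included.
    """
    return sorted(
        name for name, state in before.items()
        if state == "running" and after.get(name) != "running"
    )
```

Taking one snapshot, running `systemctl daemon-reload`, taking a second snapshot, and passing both to `stopped_since` gives a text list of affected containers that can be pasted into the issue instead of a screenshot.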


PS: Please copy and paste the output into code blocks. Screenshots are harder to read and cannot be indexed, so we cannot search them.