Issue Description
I have a setup with multiple Quadlet files that manage long-running services, one-shot jobs, pods, and volumes. All of this runs on CentOS 9, with Podman in transient mode and a separate filesystem for container storage.
The system experiences unclean shutdowns quite frequently.
We recently encountered a bug where, after an unclean shutdown, the system appears to hang indefinitely at boot, waiting forever on a pod, volume, or one-shot container service generated by Quadlet. Our current workaround is to set appropriate timeouts and have systemd restart the affected services when a timeout is hit.
I have opened a PR that attempts to fix this issue: #22985
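A sketch of the timeout-and-restart workaround described above, expressed in a Quadlet file. Quadlet copies the [Service] section into the generated systemd unit, so the timeout and restart policy can live next to the resource definition. The file name data.volume and the timeout values are hypothetical and only illustrative. This matters especially for .volume files: Quadlet generates them as Type=oneshot units, and systemd disables the start timeout for oneshot units by default, so without an explicit TimeoutStartSec= a stuck start job can block boot indefinitely.

```ini
# /etc/containers/systemd/data.volume  (hypothetical file name)
[Volume]
# ...existing volume options stay unchanged...

[Service]
# Give up on the start job after 2 minutes instead of waiting forever
TimeoutStartSec=120
# A timed-out or failed start counts as a failure, so systemd retries the unit
Restart=on-failure
# Wait 5 seconds between retries
RestartSec=5
```

The same [Service] section can be added to the .pod and .container files. After editing, run systemctl daemon-reload so that the Quadlet generator regenerates the units.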
Steps to reproduce the issue
1. Install the Quadlet files linked to this issue in /etc/containers/systemd, reboot the system once, and wait for all the services to be started.
2. Hard-reboot the system (e.g. reboot -f).
3. Log in and run systemctl list-jobs to observe that either the pod or the volume service is blocking the boot.
podman info output
host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: 7ba5bd6c81ff2c10e07aee8c4281d12a2878fa12'
  cpuUtilization:
    idlePercent: 75.44
    systemPercent: 5.62
    userPercent: 18.93
  cpus: 12
  databaseBackend: sqlite
  distribution:
    distribution: centos
    version: "9"
  eventLogger: journald
  freeLocks: 2031
  hostname: HOSTNAME
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-430.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 6276349952
  memTotal: 16339382272
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: netavark-1.11.0-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.11.0
  ociRuntime:
    name: crun
    package: crun-1.15-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231204.gb86afe3-1.el9.x86_64
    version: |
      pasta 0^20231204.gb86afe3-1.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
      <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 8589930496
  swapTotal: 8589930496
  uptime: 3h 34m 39.00s (Approximately 0.12 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 8
    paused: 0
    running: 8
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 554240225280
  graphRootUsed: 18993541120
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 15
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.1.0
  Built: 1717411100
  BuiltTime: Mon Jun 3 10:38:20 2024
  GitCommit: ""
  GoVersion: go1.22.3 (Red Hat 1.22.3-2.el9)
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.0
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
No
Additional environment details
Podman 5.1.0 in transient mode on a CentOS 9-based system, with a separate filesystem for the container storage in /var/lib/containers. Unclean shutdowns are frequent.
Describe the results you received
System hangs forever during the boot phase
Describe the results you expected
Boot completes without hanging
Additional information
No response