Looks like it takes a while to delete the btrfs volume.
Can you try with the overlay driver?
It's been like this since before I switched to the btrfs driver - if anything, it was worse with the overlay driver (although I didn't gather any timing on this).
(Is there any way to switch drivers without having to podman system reset and save/restore all images?)
Or, is there any instrumentation I can add/debug options I can enable that will get the innards of podman/libpod to tell us how much time is taken in removing the btrfs subvolume, to rule this in or out?
I believe that subvolume removal should be instant, and whenever I've removed subvolumes manually on the same system, there's never been any noticeable delay.
You could experiment with overlay on a different store:
podman --storage-driver overlay --root /somewhere/else ...
and see how that performs.
You cannot switch drivers without recreating the images; what you could try is to copy them:
$ podman --root /new/storage pull "containers-storage:[btrfs@/old/path+/run/dir]IMAGE"
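For example (illustrative paths - substitute the real old graph root, run root, and image name):
$ sudo podman --root /new/storage --storage-driver overlay pull "containers-storage:[btrfs@/old/path+/run/dir]docker.io/library/alpine:latest"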
The 20 seconds is the delay used to wait for the exit code to appear in the DB. You can see that with --log-level debug.
So the real problem is likely the podman container cleanup command not working correctly, or something wrong in conmon maybe.
Setup:
$ sudo podman run -it --rm --log-level debug docker.io/alpine:latest ; date
INFO[0000] podman filtering at log level debug
DEBU[0000] Called run.PersistentPreRunE(podman run -it --rm --log-level debug docker.io/alpine:latest)
DEBU[0000] Using conmon: "/usr/libexec/podman/conmon"
DEBU[0000] Using graph driver btrfs
DEBU[0000] Using graph root /mnt/podman-storage
DEBU[0000] Using run root /var/run/podman
DEBU[0000] Using static dir /space/podman/static
DEBU[0000] Using tmp dir /var/run/libpod
DEBU[0000] Using volume path /space/podman/volumes
DEBU[0000] Using transient store: false
DEBU[0000] [graphdriver] trying provided driver "btrfs"
DEBU[0000] Initializing event backend file
DEBU[0000] Configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument
DEBU[0000] Configured OCI runtime ocijail initialization failed: no valid executable found for OCI runtime ocijail: invalid argument
DEBU[0000] Configured OCI runtime crun-wasm initialization failed: no valid executable found for OCI runtime crun-wasm: invalid argument
DEBU[0000] Configured OCI runtime runj initialization failed: no valid executable found for OCI runtime runj: invalid argument
DEBU[0000] Configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument
DEBU[0000] Configured OCI runtime youki initialization failed: no valid executable found for OCI runtime youki: invalid argument
DEBU[0000] Configured OCI runtime krun initialization failed: no valid executable found for OCI runtime krun: invalid argument
DEBU[0000] Using OCI runtime "/usr/bin/crun"
INFO[0000] Setting parallel job count to 25
DEBU[0000] Successfully loaded 1 networks
DEBU[0000] Pulling image docker.io/alpine:latest (policy: missing)
DEBU[0000] Looking up image "docker.io/alpine:latest" in local containers storage
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux [] }
DEBU[0000] Trying "docker.io/library/alpine:latest" ...
DEBU[0000] parsed reference into "[btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb"
DEBU[0000] Found image "docker.io/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage
DEBU[0000] Found image "docker.io/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage ([btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb)
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb"
DEBU[0000] Looking up image "docker.io/library/alpine:latest" in local containers storage
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux [] }
DEBU[0000] Trying "docker.io/library/alpine:latest" ...
DEBU[0000] parsed reference into "[btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb"
DEBU[0000] Found image "docker.io/library/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage
DEBU[0000] Found image "docker.io/library/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage ([btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb)
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb"
DEBU[0000] Looking up image "docker.io/alpine:latest" in local containers storage
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux [] }
DEBU[0000] Trying "docker.io/library/alpine:latest" ...
DEBU[0000] parsed reference into "[btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb"
DEBU[0000] Found image "docker.io/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage
DEBU[0000] Found image "docker.io/alpine:latest" as "docker.io/library/alpine:latest" in local containers storage ([btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb)
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb"
DEBU[0000] Inspecting image 7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb"
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb"
DEBU[0000] Inspecting image 7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb
DEBU[0000] Inspecting image 7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb
DEBU[0000] Inspecting image 7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb
DEBU[0000] Inspecting image 7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb
DEBU[0000] using systemd mode: false
DEBU[0000] No hostname set; container's hostname will default to runtime default
DEBU[0000] Loading seccomp profile from "/usr/share/containers/seccomp.json"
DEBU[0000] Allocated lock 27 for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938
DEBU[0000] parsed reference into "[btrfs@/mnt/podman-storage+/var/run/podman]@7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb"
DEBU[0000] exporting opaque data as blob "sha256:7e01a0d0a1dcd9e539f8e9bbd80106d59efbdf97293b3d38f5d7a34501526cdb"
DEBU[0000] Created container "509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938"
DEBU[0000] Container "509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938" has work directory "/mnt/podman-storage/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata"
DEBU[0000] Container "509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938" has run directory "/var/run/podman/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata"
DEBU[0000] Handling terminal attach
INFO[0000] Received shutdown.Stop(), terminating! PID=18606
DEBU[0000] Enabling signal proxying
DEBU[0000] Mounted container "509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938" at "/mnt/podman-storage/btrfs/subvolumes/5e0f0219a6a9ea6b9515a5517ece74918ee2aa48078a1acd9625f0aca195251e"
DEBU[0000] Made network namespace at /var/run/netns/netns-df26baeb-2b07-198b-6297-4e5edf5a92c1 for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938
DEBU[0000] Created root filesystem for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 at /mnt/podman-storage/btrfs/subvolumes/5e0f0219a6a9ea6b9515a5517ece74918ee2aa48078a1acd9625f0aca195251e
[DEBUG netavark::network::validation] "Validating network namespace..."
[DEBUG netavark::commands::setup] "Setting up..."
[INFO netavark::firewall] Using iptables firewall driver
[DEBUG netavark::network::bridge] Setup network podman
[DEBUG netavark::network::bridge] Container interface name: eth0 with IP addresses [172.18.95.73/16]
[DEBUG netavark::network::bridge] Bridge name: podman0 with IP addresses [172.18.0.1/16]
[DEBUG netavark::network::core_utils] Setting sysctl value for net.ipv4.ip_forward to 1
[DEBUG netavark::network::core_utils] Setting sysctl value for /proc/sys/net/ipv6/conf/eth0/autoconf to 0
[INFO netavark::network::netlink] Adding route (dest: 0.0.0.0/0 ,gw: 172.18.0.1, metric 100)
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-1D8721804F16F created on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK_ISOLATION_2 created on table filter
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK_ISOLATION_3 created on table filter
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK_FORWARD exists on table filter
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK_FORWARD exists on table filter
[DEBUG netavark::firewall::varktables::helpers] rule -d 172.18.0.0/16 -j ACCEPT created on table nat and chain NETAVARK-1D8721804F16F
[DEBUG netavark::firewall::varktables::helpers] rule ! -d 224.0.0.0/4 -j MASQUERADE created on table nat and chain NETAVARK-1D8721804F16F
[DEBUG netavark::firewall::varktables::helpers] rule -s 172.18.0.0/16 -j NETAVARK-1D8721804F16F created on table nat and chain POSTROUTING
[DEBUG netavark::firewall::varktables::helpers] rule -d 172.18.0.0/16 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT created on table filter and chain NETAVARK_FORWARD
[DEBUG netavark::firewall::varktables::helpers] rule -s 172.18.0.0/16 -j ACCEPT created on table filter and chain NETAVARK_FORWARD
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-SETMARK exists on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-SETMARK exists on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-MASQ exists on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-MASQ exists on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-DNAT exists on table nat
[DEBUG netavark::firewall::varktables::helpers] chain NETAVARK-HOSTPORT-DNAT exists on table nat
[DEBUG netavark::firewall::varktables::helpers] rule -j MARK --set-xmark 0x2000/0x2000 exists on table nat and chain NETAVARK-HOSTPORT-SETMARK
[DEBUG netavark::firewall::varktables::helpers] rule -j MASQUERADE -m comment --comment 'netavark portfw masq mark' -m mark --mark 0x2000/0x2000 exists on table nat and chain NETAVARK-HOSTPORT-MASQ
[DEBUG netavark::firewall::varktables::helpers] rule -j NETAVARK-HOSTPORT-DNAT -m addrtype --dst-type LOCAL exists on table nat and chain PREROUTING
[DEBUG netavark::firewall::varktables::helpers] rule -j NETAVARK-HOSTPORT-DNAT -m addrtype --dst-type LOCAL exists on table nat and chain OUTPUT
[DEBUG netavark::commands::setup] {
"podman": StatusBlock {
dns_search_domains: Some(
[],
),
dns_server_ips: Some(
[],
),
interfaces: Some(
{
"eth0": NetInterface {
mac_address: "fe:84:9e:e3:96:eb",
subnets: Some(
[
NetAddress {
gateway: Some(
172.18.0.1,
),
ipnet: 172.18.95.73/16,
},
],
),
},
},
),
},
}
[DEBUG netavark::commands::setup] "Setup complete"
DEBU[0000] /etc/system-fips does not exist on host, not mounting FIPS mode subscription
DEBU[0000] Setting Cgroup path for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 to /libpod_parent/libpod-509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938
DEBU[0000] reading hooks from /usr/share/containers/oci/hooks.d
DEBU[0000] Workdir "/" resolved to host path "/mnt/podman-storage/btrfs/subvolumes/5e0f0219a6a9ea6b9515a5517ece74918ee2aa48078a1acd9625f0aca195251e"
DEBU[0000] Created OCI spec for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 at /mnt/podman-storage/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata/config.json
DEBU[0000] /usr/libexec/podman/conmon messages will be logged to syslog
DEBU[0000] running conmon: /usr/libexec/podman/conmon args="[--api-version 1 -c 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 -u 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 -r /usr/bin/crun -b /mnt/podman-storage/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata -p /var/run/podman/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata/pidfile -n zen_hertz --exit-dir /var/run/libpod/exits --full-attach -l k8s-file:/mnt/podman-storage/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata/ctr.log --log-level debug --syslog -t --conmon-pidfile /var/run/podman/btrfs-containers/509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938/userdata/conmon.pid --exit-command /usr/bin/podman --exit-command-arg --root --exit-command-arg /mnt/podman-storage --exit-command-arg --runroot --exit-command-arg /var/run/podman --exit-command-arg --log-level --exit-command-arg debug --exit-command-arg --cgroup-manager --exit-command-arg cgroupfs --exit-command-arg --tmpdir --exit-command-arg /var/run/libpod --exit-command-arg --network-config-dir --exit-command-arg --exit-command-arg --network-backend --exit-command-arg netavark --exit-command-arg --volumepath --exit-command-arg /space/podman/volumes --exit-command-arg --db-backend --exit-command-arg sqlite --exit-command-arg --transient-store=false --exit-command-arg --runtime --exit-command-arg crun --exit-command-arg --storage-driver --exit-command-arg btrfs --exit-command-arg --events-backend --exit-command-arg file --exit-command-arg --syslog --exit-command-arg container --exit-command-arg cleanup --exit-command-arg --rm --exit-command-arg 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938]"
DEBU[0000] Received: 19209
INFO[0000] Got Conmon PID as 19207
DEBU[0000] Created container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 in OCI runtime
DEBU[0000] Adding nameserver(s) from network status of '[]'
DEBU[0000] Adding search domain(s) from network status of '[]'
DEBU[0000] Attaching to container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938
DEBU[0000] Starting container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 with command [/bin/sh]
DEBU[0000] Received a resize event: {Width:165 Height:46}
DEBU[0000] Started container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938
DEBU[0000] Notify sent successfully
Tear-down:
/ # date ; exit
Wed Sep 13 09:45:46 UTC 2023
DEBU[0028] Exceeded conmon timeout waiting for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 to exit
DEBU[0028] Checking if container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 should restart
DEBU[0028] Removing container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938
DEBU[0028] Cleaning up container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938
DEBU[0028] Tearing down network namespace at /var/run/netns/netns-df26baeb-2b07-198b-6297-4e5edf5a92c1 for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938
[DEBUG netavark::commands::teardown] "Tearing down.."
[INFO netavark::firewall] Using iptables firewall driver
[INFO netavark::network::bridge] removing bridge podman0
[DEBUG netavark::commands::teardown] "Teardown complete"
DEBU[0029] Successfully cleaned up container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938
DEBU[0029] Unmounted container "509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938"
DEBU[0029] Removing all exec sessions for container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938
DEBU[0029] Container 509e41f647ed96e58846135706613adbc0d01e0a784b78d912276faa98bdc938 storage is already unmounted, skipping...
DEBU[0029] Called run.PersistentPostRunE(podman run -it --rm --log-level debug docker.io/alpine:latest)
DEBU[0029] Shutting down engines
Wed 13 Sep 10:46:07 BST 2023
Unfortunately there's no per-debug-statement timing information, but Exceeded conmon timeout waiting for container looks suspicious.
Exceeded conmon timeout waiting for container
Yes, this is the line printed after 20s of waiting:
https://github.com/containers/podman/blob/e8c4d7966b8bf3b2f592e7dbe1bc7d5e556f826d/libpod/container_api.go#L600-L621
Can you run podman ps -a in another terminal while it is waiting, so we can see what state the container is in?
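For example, polling once a second in a second terminal while the first one is blocked (illustrative - repeatedly running plain podman ps -a works just as well):
$ watch -n 1 "sudo podman ps -a --format '{{.ID}} {{.Status}}'"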
You could experiment with overlay on a different store:
podman --storage-driver overlay --root /somewhere/else ...
and see how that performs.
Ah - you might be on to something here!
Running from tmpfs:
$ sudo podman run -it --rm --storage-driver overlay --root /tmp/podman-alt-root docker.io/library/alpine:latest ; date
/ # date ; exit
Wed Sep 13 09:53:52 UTC 2023
Wed 13 Sep 10:53:52 BST 2023
… running from NVMe, but using the overlay driver:
$ sudo podman run -it --rm --storage-driver overlay --root /mnt/nvme/podman-alt docker.io/library/alpine:latest ; date
/ # date ; exit
Wed Sep 13 09:56:05 UTC 2023
Wed 13 Sep 10:56:05 BST 2023
So the issue does seem to be related to the btrfs storage-driver.
… apologies, my recollection is that it had always been slow even before migrating to btrfs (which was a change I only made a few months ago).
Also, thanks for the image-copy command - I wasn't aware that was an option!
Can you run podman ps -a in another terminal while it is waiting, so we can see what state the container is in?
docker.io/alpine:latest /bin/sh 14 seconds ago Exited (0) 2 seconds ago
… so the state seems to have moved to Exited even though the instance is waiting on conmon (which only appears to happen with the btrfs driver).
None of the core maintainers work on the btrfs driver, so this is unlikely to be fixed unless community members step up.
Is it worth filing a bug against https://github.com/containers/conmon, or is that the same audience of maintainers/contributors as here - or indeed, does the above debugging information indicate that the issue is (triggered) within podman rather than conmon?
It's hard to see where the error is without being able to reproduce it.
I'd suggest you manually try running podman container cleanup while the cleanup hangs to see if you can catch any useful error, or use something like /usr/share/bcc/tools/execsnoop to detect what processes run when the container terminates.
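For example (assuming bcc is installed at the usual path):
# terminal 1: trace every process executed on the host
$ sudo /usr/share/bcc/tools/execsnoop
# terminal 2: trigger the slow tear-down and watch terminal 1 during the ~20s pause
$ sudo podman run -it --rm docker.io/alpine:latest ; date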
I've reproduced this issue on a couple of systems which are using the overlay driver, one amd64 and one arm64.
amd64:
$ sudo podman run -it --rm docker.io/alpine:latest ; date
Password:
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob 7264a8db6415 done
Copying config 7e01a0d0a1 done
Writing manifest to image destination
/ # date ; exit
Thu Sep 14 14:18:01 UTC 2023
Thu 14 Sep 15:18:22 BST 2023
Configuration:
host:
arch: amd64
buildahVersion: 1.31.2
cgroupControllers:
- cpuset
- cpu
- io
- memory
- hugetlb
- pids
- rdma
- misc
cgroupManager: cgroupfs
cgroupVersion: v2
conmon:
package: app-containers/conmon-2.1.8
path: /usr/bin/conmon
version: 'conmon version 2.1.8, commit: 00e08f4a9ca5420de733bf542b930ad58e1a7e7d'
cpuUtilization:
idlePercent: 96.75
systemPercent: 2.85
userPercent: 0.4
cpus: 4
databaseBackend: boltdb
distribution:
distribution: gentoo
version: "2.14"
eventLogger: file
freeLocks: 2032
hostname: apu2e
idMappings:
gidmap: null
uidmap: null
kernel: 6.1.4-gentoo
linkmode: dynamic
logDriver: k8s-file
memFree: 1112412160
memTotal: 4115214336
networkBackend: netavark
networkBackendInfo:
backend: netavark
dns:
package: app-containers/aardvark-dns-1.7.0
path: /usr/libexec/podman/aardvark-dns
version: aardvark-dns 1.7.0
package: app-containers/netavark-1.7.0
path: /usr/libexec/podman/netavark
version: netavark 1.7.0
ociRuntime:
name: crun
package: app-containers/crun-1.9
path: /usr/bin/crun
version: |-
crun version 1.9
commit: a538ac4ea1ff319bcfe2bf81cb5c6f687e2dc9d3
rundir: /var/run/crun
spec: 1.0.0
+SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
os: linux
pasta:
executable: ""
package: ""
version: ""
remoteSocket:
path: /run/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: false
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: ""
package: ""
version: ""
swapFree: 2684350464
swapTotal: 2684350464
uptime: 5306h 12m 48.00s (Approximately 221.08 days)
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
localhost:5000:
Blocked: false
Insecure: true
Location: localhost:5000
MirrorByDigestOnly: false
Mirrors: []
Prefix: localhost:5000
PullFromMirror: ""
search:
- docker.io
- docker.pkg.github.com
- quay.io
- public.ecr.aws
store:
configFile: /etc/containers/storage.conf
containerStore:
number: 6
paused: 0
running: 4
stopped: 2
graphDriverName: overlay
graphOptions:
overlay.ignore_chown_errors: "false"
graphRoot: /storage/containers/podman/storage
graphRootAllocated: 31526481920
graphRootUsed: 16571924480
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "true"
imageCopyTmpDir: /storage/containers/podman/tmp
imageStore:
number: 315
runRoot: /storage/containers/podman/run
transientStore: false
volumePath: /storage/containers/podman/volumes
version:
APIVersion: 4.6.2
Built: 1693466451
BuiltTime: Thu Aug 31 08:20:51 2023
GitCommit: 5db42e86862ef42c59304c38aa583732fd80f178
GoVersion: go1.20.7
Os: linux
OsArch: linux/amd64
Version: 4.6.2
Concurrent containers:
$ sudo podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e11efc411a22 localhost/service.sys-apps.irqbalance:1.9.2 --foreground --pi... 13 days ago Up 13 days openrc-irqbalance-1.9.2
a4b0d0ddda5b localhost/service.app-admin.metalog:20220214 -p /var/run/metal... 13 days ago Up 13 days openrc-metalog-20220214
929329c36baf localhost/service.net-misc.openntpd:6.2_p3-r2 -d -f /etc/opennt... 13 days ago Up 13 days openrc-openntpd-6.2_p3-r2
8fab39f502fa localhost/service.net-proxy.privoxy:3.0.33 --no-daemon --pid... 13 days ago Up 13 days openrc-privoxy-3.0.33
arm64:
$ sudo podman run -it --rm docker.io/alpine:latest ; date
Password:
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob 9fda8d8052c6 done
Copying config f6648c04cd done
Writing manifest to image destination
/ # date ; exit
Thu Sep 14 14:18:51 UTC 2023
Thu 14 Sep 15:19:11 BST 2023
Configuration:
host:
arch: arm64
buildahVersion: 1.31.2
cgroupControllers:
- cpuset
- cpu
- io
- memory
- pids
cgroupManager: cgroupfs
cgroupVersion: v2
conmon:
package: app-containers/conmon-2.1.7
path: /usr/libexec/podman/conmon
version: 'conmon version 2.1.7, commit: f633919178f6c8ee4fb41b848a056ec33f8d707d'
cpuUtilization:
idlePercent: 95.94
systemPercent: 3.32
userPercent: 0.74
cpus: 4
databaseBackend: sqlite
distribution:
distribution: gentoo
version: "2.14"
eventLogger: file
freeLocks: 2044
hostname: turnpike
idMappings:
gidmap: null
uidmap: null
kernel: 6.1.21-v8+
linkmode: dynamic
logDriver: k8s-file
memFree: 2312245248
memTotal: 8190758912
networkBackend: netavark
networkBackendInfo:
backend: netavark
dns:
package: app-containers/aardvark-dns-1.7.0
path: /usr/libexec/podman/aardvark-dns
version: aardvark-dns 1.7.0
package: app-containers/netavark-1.7.0
path: /usr/libexec/podman/netavark
version: netavark 1.7.0
ociRuntime:
name: crun
package: app-containers/crun-1.8.5
path: /usr/bin/crun
version: |-
crun version 1.8.5
commit: b6f80f766c9a89eb7b1440c0a70ab287434b17ed
rundir: /var/run/crun
spec: 1.0.0
+SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
os: linux
pasta:
executable: ""
package: ""
version: ""
remoteSocket:
path: /run/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: false
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: ""
package: ""
version: ""
swapFree: 8589406208
swapTotal: 8589930496
uptime: 92h 29m 14.00s (Approximately 3.83 days)
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
localhost:5000:
Blocked: false
Insecure: true
Location: localhost:5000
MirrorByDigestOnly: false
Mirrors: []
Prefix: localhost:5000
PullFromMirror: ""
search:
- docker.io
- docker.pkg.github.com
- quay.io
- public.ecr.aws
store:
configFile: /etc/containers/storage.conf
containerStore:
number: 4
paused: 0
running: 4
stopped: 0
graphDriverName: overlay
graphOptions:
overlay.mountopt: nodev
graphRoot: /storage/containers/podman/storage
graphRootAllocated: 67049664512
graphRootUsed: 17296240640
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "true"
Supports d_type: "true"
Using metacopy: "false"
imageCopyTmpDir: /storage/containers/podman/tmp
imageStore:
number: 238
runRoot: /storage/containers/podman/run.tmp
transientStore: false
volumePath: /storage/containers/podman/volumes
version:
APIVersion: 4.6.1
Built: 1692036008
BuiltTime: Mon Aug 14 19:00:08 2023
GitCommit: f3069b3ff48e30373c33b3f5976f15abf8cfee20
GoVersion: go1.20.5
Os: linux
OsArch: linux/arm64
Version: 4.6.1
Concurrent containers:
$ sudo podman ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
af3fa1acec15 localhost/service.app-admin.syslog-ng:4.2.0 --cfgfile /etc/sy... 2 days ago Up 2 days openrc-syslog-ng-4.2.0
93c3ee4b92c5 localhost/service.net-dns.bind:9.16.41 -u named -f 2 days ago Up 2 days openrc-bind-9.16.41
004d89162a52 localhost/service.net-misc.openntpd:6.2_p3-r2 -d -f /etc/opennt... 2 days ago Up 2 days openrc-openntpd-6.2_p3-r2
8c8cc3dce354 localhost/service.net-misc.dhcp:4.4.3_p1-r4 -cf /etc/dhcp/dhc... 2 days ago Up 2 days openrc-dhcp-4.4.3_p1-r4
… which is interesting, as on the original system the problem went away when temporarily changing to the overlay driver - which suggests that the problem is not with the btrfs (or overlay) driver, but nevertheless using a pristine storage directory makes it go away.
Also, both boltdb and sqlite database backends are affected.
Given this, could there be any link between total image size within podman storage and this delay?
Original system:
$ sudo podman info --debug | grep -F podman | awk '{ print $2 }' | xargs -rn 1 sudo du -hxs
1.5G /mnt/podman-storage
113M /space/podman/volumes
New amd64 system:
$ sudo podman info --debug | grep -F '/storage/' | awk '{ print $2 }' | xargs -rn 1 sudo du -hxs
16G /storage/containers/podman/storage
8.0K /storage/containers/podman/tmp
120K /storage/containers/podman/run
42M /storage/containers/podman/volumes
arm64 system:
16G /storage/containers/podman/storage
12K /storage/containers/podman/tmp
72K /storage/containers/podman/run.tmp
4.0K /storage/containers/podman/volumes
I am absolutely incapable of reproducing. Can you run /usr/share/bcc/tools/execsnoop in parallel and share which processes are being run? Does it also happen without --rm? If so, can you then share podman ps --all?
Without --rm, the issue is still there:
$ sudo podman run --net none docker.io/library/alpine:latest 'date'; date
Mon Sep 18 14:14:54 UTC 2023
Mon 18 Sep 15:15:15 BST 2023
Output of podman ps -a:
$ sudo podman ps -a | grep 'alpine.*date'
61eef5c90f21 docker.io/library/alpine:latest date 38 seconds ago Exited (0) 39 seconds ago upbeat_swanson
There's no significant delay removing this Exited container:
$ time sudo podman rm upbeat_swanson
upbeat_swanson
real 0m0.216s
user 0m0.002s
sys 0m0.003s
When the container fails to start, podman exits immediately (i.e. the problem is not seen):
$ sudo podman run --net none docker.io/library/alpine:latest 'exit'
Error: crun: executable file `exit` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found
… although in this case, the container persists with state Created. If I then remove this Created container, the result is also almost instant:
$ time sudo podman rm 1364702f0f2a
1364702f0f2a
real 0m0.063s
user 0m0.003s
sys 0m0.002s
I'm not familiar with 'execsnoop', but I'll take a look now...
Thanks for sharing!
I'm not familiar with 'execsnoop', but I'll take a look now...
It's part of bcc-tools and a veeeeery helpful script.
I feel I'm making progress - the bcc utilities require LLVM/clang, which is quite chunky, so I thought I'd try running it within a container.
With a default/unconfigured installation, as expected, podman within a container exits instantly.
However, if I do nothing more than import my configuration, change the driver to overlay, and create the referenced directories, then the problem again returns.
… so the problem only appears with some non-default setting in /etc/containers/ which I seem to have set.
I'll keep digging...
… it's something in containers.conf, which I have as follows:
[containers]
init_path = "/usr/libexec/podman/catatonit"
[network]
network_backend = "netavark"
default_subnet="172.18.0.0/16"
[engine]
cgroup_manager = "cgroupfs"
conmon_path = [
"/usr/libexec/podman/conmon",
"/usr/bin/conmon"
]
database_backend="sqlite"
env = ["TMPDIR=/space/podman/tmp"]
events_logger = "file"
events_logfile_path = "/var/log/podman/events.log"
events_logfile_max_size = 10485760
infra_command = "/usr/bin/pause"
infra_image = "gcr.io/google_containers/pause-amd64:3.2"
remote_uri= "unix://var/run/podman/podman.sock"
static_dir = "/space/podman/static"
tmp_dir = "/var/run/libpod"
volume_path = "/space/podman/volumes"
runtime = "crun"
runtime_supports_json = ["crun", "runc"]
runtime_supports_nocgroups = ["crun"]
runtime_supports_kvm = []
image_copy_tmp_dir = "/space/podman/tmp"
[engine.runtimes]
runc = [
"/usr/bin/runc"
]
crun = [
"/usr/bin/crun"
]
kata = []
Urgh - I've found the specific issue: if static_dir is set in /etc/containers/containers.conf, then there's a 20-second delay on container exit which goes away once the setting is removed.
… which I suspect means that some component has hard-coded the default (/var/lib/containers/storage/libpod?) and is breaking if the expected content isn't there?
Rough reproducer:
In a fresh container, install podman and then run:
# Ensure a signature policy exists on a fresh install, so that pulls work:
[ -s /etc/containers/policy.json.example ] && [ ! -s /etc/containers/policy.json ] && cp /etc/containers/policy.json.example /etc/containers/policy.json
# Remove all containers and images, then reset:
podman stop -a ;
podman rm -v -a ;
podman image rm -f -a ;
podman system reset -f ;
# Delete every data directory quoted in containers.conf (paths containing
# bin/, libexec/, or ending in .log are skipped; tr strips the surrounding
# quotes so rm/mkdir see real paths), reset, then re-create them empty:
[ -s /etc/containers/containers.conf ] && rm -r $( sed 's/#.*$//' /etc/containers/containers.conf | grep -o '"/[^"]\+"' | grep -v -e 'bin/' -e 'libexec/' -e '\.log"' | tr -d '"' ) ;
podman system reset -f ;
[ -s /etc/containers/containers.conf ] && mkdir -p $( sed 's/#.*$//' /etc/containers/containers.conf | grep -o '"/[^"]\+"' | grep -v -e 'bin/' -e 'libexec/' -e '\.log"' | tr -d '"' ) ;
# Pull a test image and time a trivial container run:
podman image pull docker.io/library/alpine:latest ;
podman run --net none docker.io/library/alpine:latest 'date' ; date
… then create /etc/containers/containers.conf containing:
[engine]
static_dir = "/tmp/podman/static"
… and re-run the above: the path used for static_dir doesn't matter - if it's missing, containers exit instantly; if it's present, there's a 20-second timeout.
(This is with crun - I'm not sure whether this is related. I've not tried with runc installed.)
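A minimal comparison of the two cases might look like this (a sketch, using the static_dir value from above):
# with static_dir = "/tmp/podman/static" present in /etc/containers/containers.conf:
$ time podman run --net none docker.io/library/alpine:latest 'date'   # ~20s before returning
# after removing the static_dir line (and with fresh storage):
$ time podman run --net none docker.io/library/alpine:latest 'date'   # returns almost instantly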
Furthermore, the specified static_dir contains only a bolt_state.db which hasn't changed since May.
It appears that the actual bolt_state.db is now in (graph root)/libpod regardless of the static_dir setting.
Is static_dir now vestigial?
Interestingly, a system reset may be needed to fix this, as commenting out static_dir on the host system (with running containers) and trying to run date in Alpine still gives a 20s delay.
I also have:
database_backend="sqlite"
… in /etc/containers/containers.conf, and yet I see that /var/lib/containers/cache (is this a hard-coded path which can't be configured?) contains:
# ls -l cache/
total 74096
-rw------- 1 root root 92598272 Sep 18 10:41 blob-info-cache-v1.boltdb
-rw-r--r-- 1 root root 32768 Sep 16 15:27 blob-info-cache-v1.sqlite
… why do I have a boltdb file which is more up to date (and much larger) than the sqlite one - I thought sqlite was a complete replacement?
Update: Removing static_dir from containers.conf isn't enough to resolve the issue on a non-fresh system: even after a reboot, the 20s delay persists.
(Although running with --transient-store also prevents the delay from occurring.)
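For example (illustrative - the same test case as above, with the transient store enabled via the global flag):
$ sudo podman --transient-store run --net none docker.io/library/alpine:latest 'date' ; date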
- If static_dir is set in /etc/containers/containers.conf, then there's a 20-second delay on container exit which goes away once the setting is removed.
I can reproduce as well with this setting. Great work tracking this down, @srcshelton! Why this happens needs some investigation.
… why do I have a boltdb file which is more up to date (and much larger) than the sqlite one - I thought sqlite was a complete replacement?
I assume you changed the db_backend, so the previous one will continue lingering around.
I made the observation that the cleanup process does not see (or read) the containers.conf file correctly: I configured mine in the $HOME directory and noticed that podman container cleanup doesn't pick up the configured value. I'll continue digging.
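One way to probe this (illustrative - CONTAINERS_CONF is the standard override variable; substitute the ID of the container left behind):
$ sudo CONTAINERS_CONF=/etc/containers/containers.conf podman --log-level debug container cleanup <container-id>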
I can raise a new bug report for this, but I suspect it might be the same root-cause: even with database_backend="sqlite" in /etc/containers/containers.conf, I'm still seeing /var/lib/containers/cache/blob-info-cache-v1.boltdb created and used, and /var/lib/containers/cache/blob-info-cache-v1.sqlite unchanged in the past 3 days, despite several reboots and container restarts.
Yes, there is something really strange going on. container cleanup doesn't seem to pick up the correct values from containers.conf. I'll be buried in meetings today but will take a look.
Cc: @mheon
Even with database_backend="sqlite" in /etc/containers/containers.conf, I'm still seeing /var/lib/containers/cache/blob-info-cache-v1.boltdb
This is expected. The blob-info-cache is something the image library uses; it is not controlled by the database_backend setting.
Issue Description
Whilst podman is able to start containers from even very large images fast enough that it feels interactively instant, shutting them down again takes what feels like a long time.
I wonder whether this is because I'm running the btrfs storage-driver, because I'm not running on a system managed by systemd (and so perhaps some call-back is failing?), or whether some default timeout is being adhered to even if the container processes have exited near the start of the timeout period?

Steps to reproduce the issue
Describe the results you received
There is a 20s pause between the container process exiting and the container itself having been fully torn-down.
N.B. This is not a new problem - this system has been upgraded from podman 2.x with the VFS storage-driver, but has always been slow to tear-down containers.

Describe the results you expected
In other scenarios, if the process within the container exits then the container is torn-down effectively instantly.
podman run has a --stop-timeout which apparently defaults to 10 seconds, but the stop delay is actually twice that - and in this case the container process has itself exited, so I'd imagine that any timeout dependent on processes within the container (as opposed to related host processes such as crun/runc) should not be adhered to?
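One quick way to check whether the tear-down delay tracks this flag (illustrative):
$ time sudo podman run --rm --stop-timeout 1 docker.io/alpine:latest true

podman info output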
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
Yes
Additional environment details
Host is running openrc, not systemd.

Additional information
The size of the image (within reason) does not seem to have a significant effect on the tear-down time - for a 1.08GB image, the result is:
… so the delay appears to be a consistent 20s.