Closed shieldwed closed 1 year ago
The first step is always to test with the latest podman and kernel. Reading the links you gave, this is a kernel bug that was fixed a while ago, so you need to update your kernel.
RHEL always has "older" kernels; they just backport many bug fixes, so it is fixed there. So you have to update your kernel or ask Ubuntu to backport the required fixes.
Issue Description
Restarting Podman containers or running containers with health checks leads to rising memory usage until the system gets stuck (after 4.4 hours, systemd started to terminate services due to watchdog timeouts). On a regular system with about 20 containers running, it happens over about 2 weeks, but it can be condensed down to hours (see steps below).
Collecting metrics on all processes, I noticed that each of them kept using more or less the same amount of memory, so user-space processes didn't seem to claim more memory over time. However, `/proc/meminfo` contains the metric `Percpu`, which was increasing over time by just about the same amount as available memory decreased.

Searching the web, I found https://access.redhat.com/solutions/6740861 (undisclosed information) and https://bugzilla.redhat.com/show_bug.cgi?id=2004037 (older kernel), which didn't help me particularly.
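To check that correlation directly, the two fields can be logged side by side. This is only a minimal sketch, assuming a kernel that exposes both `Percpu` and `MemAvailable` in `/proc/meminfo`; the function name `meminfo_pair` and its optional file argument are made up for illustration.

```bash
# Hypothetical helper: print Percpu and MemAvailable (both in kB) from a
# meminfo-style file. The file argument defaults to /proc/meminfo.
meminfo_pair() {
    awk '
        $1 == "Percpu:"       { percpu = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { printf "Percpu=%s kB MemAvailable=%s kB\n", percpu, avail }
    ' "${1:-/proc/meminfo}"
}

# Example: one sample every 3 minutes (interrupt with Ctrl-C).
# while :; do meminfo_pair; sleep 180; done
```

If `Percpu` grows by roughly the amount `MemAvailable` shrinks between samples, that supports the observation that the memory is not going to user-space processes.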
Steps to reproduce the issue
Steps to reproduce the issue:
Describe the results you received
Before starting any podman containers:
Once all containers are started:
Now watch the `Percpu` metric rising:
```bash
# while :; do grep Percpu /proc/meminfo; sleep 180; done
Percpu: 736320 kB
Percpu: 815040 kB
Percpu: 840960 kB
Percpu: 861120 kB
Percpu: 901440 kB
Percpu: 910080 kB
Percpu: 989760 kB
Percpu: 1076160 kB
Percpu: 1086720 kB
Percpu: 1120320 kB
Percpu: 1141440 kB
Percpu: 1228800 kB
Percpu: 1214400 kB
Percpu: 1335360 kB
Percpu: 1289280 kB
Percpu: 1272960 kB
Percpu: 1332480 kB
Percpu: 1394880 kB
Percpu: 1410240 kB
Percpu: 1416000 kB
Percpu: 1458240 kB
Percpu: 1487040 kB
Percpu: 1510080 kB
Percpu: 1521600 kB
Percpu: 1534080 kB
Percpu: 1536000 kB
Percpu: 1605120 kB
Percpu: 1618560 kB
Percpu: 1659840 kB
Percpu: 1657920 kB
Percpu: 1700160 kB
Percpu: 1746240 kB
Percpu: 1776960 kB
Percpu: 1794240 kB
Percpu: 1785600 kB
Percpu: 1794240 kB
Percpu: 1819200 kB
Percpu: 1830720 kB
Percpu: 1870080 kB
Percpu: 1923840 kB
Percpu: 1933440 kB
Percpu: 1883520 kB
Percpu: 1882560 kB
Percpu: 1939200 kB
Percpu: 1896000 kB
Percpu: 1959360 kB
Percpu: 2005440 kB
Percpu: 1954560 kB
Percpu: 1954560 kB
Percpu: 2066880 kB
Percpu: 2168640 kB
Percpu: 2145600 kB
Percpu: 2209920 kB
Percpu: 2233920 kB
Percpu: 2272320 kB
Percpu: 2233920 kB
Percpu: 2272320 kB
Percpu: 2309760 kB
Percpu: 2308800 kB
Percpu: 2339520 kB
Percpu: 2385600 kB
Percpu: 2402880 kB
Percpu: 2402880 kB
Percpu: 2402880 kB
Percpu: 2441280 kB
Percpu: 2462400 kB
Percpu: 2462400 kB
Percpu: 2462400 kB
Percpu: 2462400 kB
Percpu: 2462400 kB
Percpu: 2462400 kB
Percpu: 2446080 kB
Percpu: 2445120 kB
Percpu: 2441280 kB
Percpu: 2441280 kB
Percpu: 2441280 kB
Percpu: 2439360 kB
Percpu: 2439360 kB
Percpu: 2439360 kB
Percpu: 2439360 kB
Percpu: 2437440 kB
Percpu: 2437440 kB
Percpu: 2437440 kB
Percpu: 2437440 kB
Percpu: 2437440 kB
Percpu: 2437440 kB
Percpu: 2437440 kB
Percpu: 2437440 kB
```
Describe the results you expected
Memory usage should not rise over time merely by employing health checks or restarting containers.
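For scale, the `Percpu` samples logged above can be turned into a rough growth rate. This is a sketch of that arithmetic only, assuming the samples arrive on stdin in the `Percpu: N kB` format and were taken at a fixed interval (180 seconds in the loop above); the function name `percpu_growth` is illustrative.

```bash
# Hypothetical helper: read "Percpu: <n> kB" lines on stdin and report the
# growth between the first and last sample.
# $1 is the sampling interval in seconds (defaults to 180).
percpu_growth() {
    awk -v interval="${1:-180}" '
        $1 == "Percpu:" { if (n == 0) first = $2; last = $2; n++ }
        END {
            if (n < 2) { print "need at least two samples"; exit 1 }
            hours = (n - 1) * interval / 3600
            printf "%d kB over %.2f h (%.0f kB/h)\n", last - first, hours, (last - first) / hours
        }
    '
}

# Example, assuming the loop output was saved to a file named percpu.log:
# percpu_growth 180 < percpu.log
```

In the log above, the first sample is 736320 kB and the last is 2437440 kB, a difference of 1701120 kB that never returns to the baseline.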
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
No
Additional environment details
Additional information
Kernel stack trace while CPU was working with low free memory:
```
[15646.322414] sysrq: Show backtrace of all active CPUs
[15646.323161] NMI backtrace for cpu 1
[15646.323165] CPU: 1 PID: 3383669 Comm: bash Not tainted 5.15.0-72-generic #79-Ubuntu
[15646.323168] Hardware name: Nutanix AHV, BIOS 0.0.0 02/06/2015
[15646.323173] Call Trace:
[15646.323175]