Closed lespea closed 5 months ago
@mheon PTAL
To be certain: did this work with Podman 5.0?
I've used a setup similar to this for a while, yes. I'm not 100% sure when it started occurring, but I'm pretty sure it was right after 5.1.0. When I get home from work I can try rolling back to the previous version to double-check that's when the errors started.
The only relevant change I remember going into 5.1 is https://github.com/containers/podman/pull/22589/commits/4fd84190b8858bb3b20b994e76954e75ba31cf9c
So it may be worth reverting this and testing again.
@lespea Does this reproduce outside of Quadlet? Something like podman run --health-cmd=... --health-interval=... using the same values as your previous script?
I'm hopeful it doesn't because we have CI that should catch such things
This reproduces it:
podman run --name c1 --health-cmd true --health-interval 15s --health-start-period 30s --health-startup-cmd true --health-startup-interval 5s -d quay.io/libpod/testimage:20240123 sleep inf
Our CI doesn't check for leaked transient units. The healthchecks are running fine; it is just the cleanup that is failing to remove the timer.
Reverting your change makes it work again.
The problem is this code: https://github.com/containers/podman/blob/c510959826cdc55e6a75c40b104a9d1aa28e3632/libpod/healthcheck.go#L282-L297
Your new code only uses one field for the unit name, and createTimer overwrites the startup hc name with the real hc name. removeTransientFiles then removes only the real hc timer and thus leaks the startup hc timer.
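The overwrite can be illustrated with a small self-contained sketch. This is not Podman's code: the tracker type, its fields, and the timer names c1-startup and c1 are all invented for the example, and only the method names echo the ones mentioned above. It just shows how a single name field loses track of the first timer.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for the container state; it is not
// Podman's real type. It remembers only ONE unit name.
type tracker struct {
	unitName string          // single field, as in the buggy layout described above
	active   map[string]bool // simulated set of transient timers known to systemd
}

// createTimer registers a timer and records its name, overwriting any
// previously recorded name.
func (t *tracker) createTimer(name string) {
	t.active[name] = true
	t.unitName = name
}

// removeTransientFiles removes only the timer whose name is currently recorded.
func (t *tracker) removeTransientFiles() {
	delete(t.active, t.unitName)
}

// leakedAfterCleanup creates a startup timer and a regular timer, runs the
// cleanup once, and returns the timers that are still registered.
func leakedAfterCleanup() []string {
	t := &tracker{active: map[string]bool{}}
	t.createTimer("c1-startup") // startup healthcheck timer (made-up name)
	t.createTimer("c1")         // regular healthcheck timer overwrites unitName
	t.removeTransientFiles()    // removes "c1" only

	leaked := []string{}
	for name := range t.active {
		leaked = append(leaked, name)
	}
	return leaked
}

func main() {
	fmt.Println(leakedAfterCleanup()) // prints [c1-startup]
}
```

Tracking both names separately, or removing every timer that was created, would avoid the leak in this toy model; how the actual fix was done is not described in this thread.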
Ugh, sorry, this has been an absolutely insane week. Really appreciate the fast fix/release, and I can confirm that v5.1.1 works as expected!
Issue Description
With the latest update of podman (v5.1.0) I noticed that in my quadlet definitions the HealthInterval is not being followed but instead the HealthStartupInterval is. Moreover, the transient .timer files are being left behind whenever the service is stopped/restarted, causing many error logs to be generated since the container is no longer running but the healthcmd continues to be retried (in my case every few seconds for every container).
Quadlet def:
Transient logs persisting:
Example of a transient service/timer
Steps to reproduce the issue
HealthStartupCmd is run, sleep 5
2s the HealthCmd is being run, sleep 2
journalctl every 2 seconds there is an error log for the startup timer/service since those containers no longer exist
/var/run/systemd/transient/ to see the old timers/services
OnUnitInactiveSec=2s is in the timers, which is the interval for the startup health check, not the normal one
Describe the results you received
Timers removed on service reset/stop
Describe the results you expected
The initial health cmd runs its cmd/interval, then once healthy the normal cmd runs its cmd/interval. Also, the checks should be removed whenever the container is restarted/shut down.
podman info output
Podman in a container
No
Privileged Or Rootless
Privileged
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
No response