Closed martinpitt closed 10 months ago
A friendly reminder that this issue had no activity for 30 days.
bump
A friendly reminder that this issue had no activity for 30 days.
I think @mheon looked at this recently? Did you come to any conclusion?
The fact that we trigger the healthcheck immediately seems wrong to me; I would expect it to wait for at least the interval, and I do not know how we can fix that. However, reading the Docker docs about `--health-start-period`, they seem to suggest that the healthcheck is not actually delayed, but rather that all failures before this time are ignored.
I confirmed that our handling of start period is correct (we do ignore failures within the accepted number of seconds), but I am still not 100% sure that we're matching Docker here - I think they may be displaying that the HC is starting, even if it's successful? Need to double-check the code over there.
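The semantics described here (the check still runs from the start; failures inside the start period simply don't count) can be sketched as a small decision function. This is an illustrative model of the Docker/Podman behavior under discussion, not actual Podman code; all names are made up:

```python
def health_status(elapsed_s, start_period_s, check_passed, counted_failures, retries):
    """Illustrative model of start-period handling.

    The healthcheck command still runs from t=0. A failure while
    elapsed_s < start_period_s is ignored (status stays "starting"),
    while a success flips the status to "healthy" immediately, even
    inside the start period.
    """
    if check_passed:
        return "healthy"                 # success is never delayed
    if elapsed_s < start_period_s:
        return "starting"                # failure ignored during start period
    if counted_failures >= retries:
        return "unhealthy"               # retry budget exhausted after start period
    return "starting"
```

For example, under this model a failure at t=1s with a 5s start period yields `"starting"`, while the same failure at t=6s with `retries=1` yields `"unhealthy"`.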
Hi @martinpitt, I looked into this. The reason you only see an unhealthy status update after 30 seconds is that if the user doesn't set `--health-interval`, we default it to 30 seconds. So what is happening in your case is: the healthcheck command runs as soon as the container comes up and sees that it is unhealthy, but the container is still within the start-period time, so the status is not updated -- the start-period is the grace period for the container to fully bootstrap. Since you haven't specified `--health-interval`, the next healthcheck run only happens after 30 seconds, the default value.
I would recommend setting `--health-interval` to something like 2 seconds, so that the healthcheck command fires every 2 seconds for your use case. I tested it out and it works as expected:
```
podman run -d --rm --health-cmd=false --health-start-period=5s --health-retries=1 --health-interval=2s alpine sleep 1000

➜ podman git:(healthcheck) ✗ podman ps
CONTAINER ID  IMAGE                            COMMAND     CREATED        STATUS                   PORTS  NAMES
bbcfad6b4fc4  docker.io/library/alpine:latest  sleep 1000  4 seconds ago  Up 4 seconds (starting)         relaxed_boyd
➜ podman git:(healthcheck) ✗ podman ps
CONTAINER ID  IMAGE                            COMMAND     CREATED        STATUS                   PORTS  NAMES
bbcfad6b4fc4  docker.io/library/alpine:latest  sleep 1000  5 seconds ago  Up 5 seconds (starting)         relaxed_boyd
➜ podman git:(healthcheck) ✗ podman ps
CONTAINER ID  IMAGE                            COMMAND     CREATED        STATUS                    PORTS  NAMES
bbcfad6b4fc4  docker.io/library/alpine:latest  sleep 1000  6 seconds ago  Up 6 seconds (unhealthy)         relaxed_boyd
```
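The timestamps above line up with a simple schedule: with `--health-interval=2s` the check fires at roughly t=0, 2, 4, 6 s; failures before the 5 s start period are ignored, and with `--health-retries=1` the first failure after it (at t≈6 s) flips the container to unhealthy. A hypothetical simulation of that schedule (not Podman's actual scheduler):

```python
def first_unhealthy_at(interval_s, start_period_s, retries):
    """Return the time of the failing check that exhausts the retry
    budget, assuming the check runs at t=0 and every interval
    thereafter and always fails. Illustrative only."""
    failures = 0
    t = 0
    while True:
        if t >= start_period_s:      # failures inside the start period don't count
            failures += 1
            if failures >= retries:
                return t
        t += interval_s

print(first_unhealthy_at(interval_s=2, start_period_s=5, retries=1))  # 6
```

Under the same model, keeping the default 30 s interval gives `first_unhealthy_at(30, 5, 1) == 30`, which matches the 30-second delay reported in the issue.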
If the healthcheck command returns healthy, then we update the health status to healthy immediately, ignoring the start-period time, so that service managers don't have to wait out the full grace period given to a container to bootstrap. That is why you are seeing a healthy state almost immediately for a health check that succeeds. This matches Docker behavior as well.
This probably needs to be clarified in the docs; I will open a PR to update them.
@umohnani8 I'm aware of the 30s default of `--health-interval`. That seems to work fine (see the part that covers an unhealthy run). The bug report is about `--health-start-period`: it doesn't have an observable effect if the health-cmd succeeds; both `podman ps` and `podman events` immediately claim "healthy" -- it should either be "starting", or preferably, not run the health-check command at all until the start-period has passed. For a failing health check, it still runs the command immediately (instead of postponing it), but at least it results in "starting" instead of an immediate "unhealthy".
So it's inconsistent, it's not clear what it actually does, and IMHO it's also badly named (of course this depends on what it's actually supposed to do, but I can't see how it is a period).
@martinpitt As mentioned above, within that period all healthcheck failures are ignored, so it does seem to work, and this is what Docker does as well. That does not replace the interval at which healthchecks are executed. Given your interval is still 30 seconds, the second healthcheck is executed after 30s, and only then can it turn into the failed state.
We should expand the docs on that option but I think the code works as intended.
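The timeline being described can be made concrete. Assuming (as stated above) that the check fires immediately on start and then once per interval, the check times for the default 30 s interval are:

```python
def check_times(interval_s, horizon_s):
    """Times at which the healthcheck command runs within horizon_s,
    assuming it fires immediately on start and then every interval.
    Illustrative model only, not Podman's scheduler."""
    return list(range(0, horizon_s + 1, interval_s))

# With the default 30 s interval, the first check runs at t=0 (inside a
# 5 s start period, so a failure there is ignored) and the next only at
# t=30 -- the earliest point at which the container can turn unhealthy.
print(check_times(30, 60))  # [0, 30, 60]
```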
Issue Description
This was originally part of #19237, but split out by @mheon's request.
When running a container with a successful health check and monitoring it with `podman events` and `watch -n1 podman ps -a`, it seems to run the health check immediately (note the time stamps), and `podman ps` also shows "Up 1 seconds (healthy)".
With a failing health check (`--health-cmd=false`) it looks a little different. It also immediately runs the health check command (exec_died events), but claims "starting" after the first run, and only 30s later it moves to "unhealthy".
It looks exactly the same without `--health-start-period=5s`, so it seems that option has no observable effect.
Steps to reproduce the issue
see description
Describe the results you received
Health check command runs immediately after container starts
Describe the results you expected
Health check command runs after the given time span with `--health-start-period`.
Also, this option seems to be named unfortunately. "period" suggests something repeating, but it should only have an effect once, after container startup. The period is `--health-startup-interval`.
The manpage says
It does not directly say that it will delay the first health check, but what else would it do?
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
Happens on bare metal and in a VM; it also happens as root and as an unprivileged user.
Additional information
No response