bdw429s closed 7 months ago
I'm circling back to this today before I bump a release. The default healthcheck does capture the output of the curl --fail ${HEALTHCHECK_URI} command, but a Killed status on a container could also be an OOM kill, which usually happens when the container exceeds either the total host resources or a resource limit specified in a stack file or other config.
Below is an example of the formatted docker inspect output for the "Health" section of a failed container. Note that this container started failing with timeouts to the URL on port 8080, so you only see the cURL progress output. On a successful healthcheck you would see the actual response output, and on a 500 error you would see the error HTML output.
While it is possible to implement some sort of custom healthcheck.sh that looks at the status code and the response pattern, the downside is that on a timeout of the healthcheck (default 30s) you would get no output at all, because the custom parsing script would swallow any cURL output. As such, I think I'm going to hold off on trying that for now.
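For anyone who wants to experiment with the custom script idea anyway, here is a minimal sketch of what it might look like. It is not the image's actual healthcheck. The function name check_health and the max_time parameter are hypothetical, and the sketch assumes curl's own --max-time is set below Docker's healthcheck timeout so that curl gives up first and the script still gets a chance to print something.

```shell
#!/bin/sh
# Hypothetical healthcheck.sh sketch (an assumption, not the shipped healthcheck).
# check_health URI [MAX_TIME_SECONDS]
check_health() {
  uri="$1"
  # Assumed default of 25s, chosen to stay under the 30s healthcheck timeout.
  max_time="${2:-25}"

  # -sS hides the progress meter but keeps curl's error messages;
  # --fail turns HTTP errors (>= 400) into a non-zero exit code.
  output=$(curl --fail -sS --max-time "$max_time" "$uri" 2>&1) && status=0 || status=$?

  if [ "$status" -ne 0 ]; then
    # Report what curl said before signalling "unhealthy" with exit code 1.
    echo "healthcheck failed for $uri (curl exit code $status): $output"
    return 1
  fi

  # On success, pass the response body through unchanged.
  echo "$output"
  return 0
}
```

The script would end by calling check_health "${HEALTHCHECK_URI}". Whether this actually preserves output on a timeout depends on curl exiting before Docker stops the healthcheck process, so treat it as something to test rather than a confirmed fix.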
Going forward, if you need to debug a failed container, try using jq to pull back just the container state, which includes the health section:
docker inspect --format "{{json .State }}" <container name> | jq
That should give you information on exactly where the container failed. I'm also going to update the docs with this information.
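If the full state is too noisy, the output can be narrowed further. The sketch below keeps only the fields that help tell a failing healthcheck apart from an OOM kill. The helper name summarize_state is hypothetical; the field names (Status, ExitCode, OOMKilled, Health) are the ones docker inspect reports under .State, and Health will be null for containers without a healthcheck.

```shell
# Hypothetical helper: reads a container's .State JSON on stdin and keeps
# only the fields relevant to "why did this container die?".
summarize_state() {
  jq '{Status, ExitCode, OOMKilled, Health}'
}

# Usage (container name is a placeholder):
#   docker inspect --format '{{json .State}}' <container name> | summarize_state
```

An OOMKilled value of true points at memory limits, while an unhealthy Health.Status with a growing FailingStreak points at the healthcheck.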
Perhaps this isn't possible, but I'd like to have some sort of logging that happens any time the healthcheck URL fails in our containers. Occasionally I'll run across a user whose container just shuts down and they don't know why. Oftentimes this will be because their healthcheck was failing, but they didn't realize it. From what I've seen, Docker doesn't log much itself outside of just
And, of course, that could mean several things, since it really just means a SIGKILL was sent to the container. So it would be great to at least know if the healthcheck was failing at the same time.
I'd recommend seeing if we can move the healthcheck command to a separate shell script where we can capture
and then log them prior to returning that failing exit code. Then the container logs can have something like
or