Ortus-Solutions / docker-commandbox

Official CommandBox Docker Image for ColdFusion/CFML/Java applications
62 stars 41 forks source link

Logging on healthcheck fail #76

Closed bdw429s closed 7 months ago

bdw429s commented 11 months ago

Perhaps this isn't possible, but I'd like to have some sort of logging that happens any time the healthcheck URL fails in our containers. Occasionally I'll run across a user with the issue of their container just shutting down and they don't know why. Often times this will be because their healthcheck was failing, but they didn't realize it. From what I've seen, Docker doesn't log much itself outside of just

Killed

And, of course, that could mean several things since it really just means a sigkill was sent to the container. So it would be great to at least know if the health check was failing at the same time.

I'd recommend seeing if we can move the helthcheck command to a separate shell script were we can capture

and then log them prior to returning that failing exit code. Then the container logs can have something like

healtcheck URL: http://localhost URL FAILED, timing out after 30 seconds. 

or

healtcheck URL: http://localhost/DB_Check.cfm URL FAILED, returning [500 Internal Server Error] after 12 seconds.
Error Response: Database connection failed or whatever HTML came back, possibly truncated to a few thousand chars. 
jclausen commented 7 months ago

I'm circling back to this today before I bump a release. The default healthcheck does capture the output of the curl --fail ${HEALTHCHECK_URI} but with a Killed status of a container, that could also be an OOM kill - usually when the container exceeds either the total host resources or exceeds any resource limits specified in a stack file or other config.

Below is an example of the formatted docker inspect on a failed container "Health" section. Note that this container started failing with timeouts to the URL on port 8080. As such, you only see the cURL progress output. On a successful healthcheck you would see the actual output and on a 500 error, you would see the error HTML output.

While it is possible implement some sort of custom healthcheck.sh that looks at the status code and the pattern response, the downside of that is, on a timeout of the healthcheck ( default 30s ), you would get no output at all, because the custom script to parse would swallow any cURL output. As such, I think I'm going hold off on trying that for now.

Going forward, if you need to debug a failed container, try using JQ to pull only the health section back:

docker inspect --format "{{json .State }}" <container name> | jq

That should give you information on exactly where the container failed. I'm also going to update the docs with this information. Screenshot 2024-04-03 at 2 20 01 PM