aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[FARGATE] [request]: Print Container Healthchecks for Fargate #1114

Open dsalamancaMS opened 3 years ago

dsalamancaMS commented 3 years ago


Tell us about your request What do you want us to build?

A way to obtain the output of the container health checks on Fargate.

Which service(s) is this request for? Fargate

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

There is currently no way to see the output of failing container health checks: docker inspect is not possible with the Fargate launch type, and Docker does not redirect health check output to stdout or stderr.

Are you currently working around this issue?

Additional context Anything else we should know?


anagarjunr commented 3 years ago

Looking for this feature.

The Docker health check works on my local system, but when the same image runs on Fargate it fails, and I'm not sure if there is any way to know what is going wrong.

hostmaster commented 3 years ago

It becomes a severe issue when a service has many tasks. There is no way to figure out which task exactly fails a health check.

drewagentsync commented 1 year ago

I had this same request. As a workaround, to push the results of the container health check to my container logs, I use this health check command:

  curl -sf "http://127.0.0.1:8080/actuator/health/" >> /proc/1/fd/1

The redirection sends curl's output to the stdout of the first process in the container, so if your service doesn't run as PID 1 you'll need to target the correct PID. I have tested this with Spring Boot and it works.

I use this command with the --write-out switch to create a nice readable log instead of the normal -f output that isn't great...

--write-out '%{onerror}{"HealthCheckError": {"url": "%{url}", "total_time": "%{time_total}", "curl_exit_code": "%{exitcode}", "error_msg": "%{errormsg}"}}\n'
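Putting the two pieces above together, a minimal sketch of the full command might look like the following. The function name health_probe and its optional second argument are made up for illustration, and it assumes a curl build recent enough to support the %{onerror}, %{exitcode}, and %{errormsg} write-out variables.

```shell
# Hypothetical helper: run the curl probe and append its output to a target
# (default: the stdout of PID 1, as in the command above).
health_probe() {
  url="$1"
  target="${2:-/proc/1/fd/1}"
  # -s silences the progress meter, -f makes HTTP errors return a non-zero
  # exit code. Everything after %{onerror} is printed only when curl fails.
  curl -sf \
    --write-out '%{onerror}{"HealthCheckError": {"url": "%{url}", "total_time": "%{time_total}", "curl_exit_code": "%{exitcode}", "error_msg": "%{errormsg}"}}\n' \
    "$url" >> "$target"
}

# Example call (commented out so that sourcing this file has no side effects):
# health_probe "http://127.0.0.1:8080/actuator/health/"
```

The function returns curl's exit code, so it can be used directly as the health check command.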

jmeickle commented 1 year ago

In case anyone else runs across this very surprisingly still-not-fixed issue, here's what worked for me in Terraform to get healthcheck stdout/stderr, be portable to non-bash, and not cause health check recreating:

  healthcheck = {
    command = ["CMD-SHELL", "/probe.sh >> /proc/1/fd/1 2>&1"]
    startPeriod = 0
    interval = 5
    timeout = 2
    retries = 5
  }

justas200 commented 1 year ago

In case anyone else runs across this very surprisingly still-not-fixed issue, here's what worked for me in Terraform to get healthcheck stdout/stderr, be portable to non-bash, and not cause health check recreating:

  healthcheck = {
    command = ["CMD-SHELL", "/probe.sh >> /proc/1/fd/1 2>&1"]
    startPeriod = 0
    interval = 5
    timeout = 2
    retries = 5
  }

This approach works; however, it appears that the health check output is attached to the next log line pushed by the main process. That means that if my main process logs something only once an hour, all my health check output is sent as a single log entry once an hour :/ Is there any workaround for this?
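One possible explanation, not confirmed anywhere in this thread, is that the probe output does not end with a newline, so a line-based log driver holds it until the main process finishes the line. A minimal sketch of a wrapper that always terminates the probe output with a newline, assuming that is the cause (run_probe and the [healthcheck] prefix are made-up names):

```shell
# Hypothetical wrapper: run a probe command, append its output to a target
# as one newline-terminated entry, and pass the probe's exit code through.
run_probe() {
  target="$1"
  shift
  status=0
  # Command substitution strips trailing newlines from the probe output.
  out=$("$@" 2>&1) || status=$?
  # printf then adds exactly one newline at the end of the entry.
  printf '[healthcheck] exit=%s %s\n' "$status" "$out" >> "$target"
  return "$status"
}

# Example call (commented out so that sourcing this file has no side effects):
# run_probe /proc/1/fd/1 /probe.sh
```

If the probe prints several lines, they are still written as several lines; only the final newline is guaranteed.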

ArthurYidi commented 1 year ago

Writing to /proc/1/fd/1 can be tricky: the container's PID 1 user is not always root and may not have the right permissions, which leads to Permission denied.

A more robust solution might be to write a probe script that calls CloudWatch directly. Note that in this case it requires the AWS CLI, curl, and jq to be installed in the container. The task's IAM role also needs permission to put log events.

probe.sh

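  #!/bin/sh
  # The inner quotes are part of the value handed to the AWS CLI shorthand syntax.
  message="\"[healthcheck] message here\""

  # Take the task ID from the task ARN (arn:...:task/<cluster>/<task-id>).
  task_id=$(
    curl -s "$ECS_CONTAINER_METADATA_URI_V4/task" \
    | jq -r ".TaskARN" \
    | cut -d "/" -f 3
  )

  # Milliseconds since the epoch; %3N needs GNU date.
  now=$(date +%s%3N)

  # The log group and log stream must already exist.
  aws logs put-log-events \
    --log-group-name "group_name" \
    --log-stream-name "prefix/container_name/$task_id" \
    --log-events timestamp="$now",message="$message"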
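The cut -d "/" -f 3 step in probe.sh assumes the task ARN has the form arn:...:task/<cluster>/<task-id>. A sketch that takes the last "/"-separated segment instead, so it also works if the ARN has no cluster segment (the older ARN format, as far as I know). The function name and the ARN in the usage comment are made up:

```shell
# Hypothetical helper: print the last "/"-separated segment of a task ARN.
task_id_from_arn() {
  # ${1##*/} removes the longest prefix that ends in "/".
  printf '%s\n' "${1##*/}"
}

# Example call with a made-up ARN:
# task_id_from_arn "arn:aws:ecs:us-east-1:123456789012:task/my-cluster/0123456789abcdef"
```

In probe.sh this could replace the cut step, with the jq output passed in as the argument.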

task definition


healthCheck = {
  command  = ["CMD-SHELL", "probe.sh"]
  ...
}

andrewmslack commented 1 year ago

The redirection sends the output .. to the stdout of the first process in the container so if your service doesn't run on pid 1 you'll need to target the correct pid. I have tested this with spring boot and it works.

Quick question: how do you find out what the PID is? I'm in a similar situation, not seeing the redirected output, and wondering if a PID mismatch is the cause. Thanks.
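One way to look the PID up from inside the container, as a rough sketch: scan /proc for a process whose command name matches. This assumes a Linux /proc filesystem with the per-process comm file; find_pid_by_name is a made-up name, and "java" in the usage comment is only an example for a Spring Boot service.

```shell
# Hypothetical helper: print the PID of the first process (in directory
# listing order) whose command name equals the argument. Returns 1 if none.
find_pid_by_name() {
  name="$1"
  for dir in /proc/[0-9]*; do
    # A process can exit while we scan, so skip entries we cannot read.
    comm=$(cat "$dir/comm" 2>/dev/null) || continue
    if [ "$comm" = "$name" ]; then
      echo "${dir#/proc/}"
      return 0
    fi
  done
  return 1
}

# Example use in a health check (commented out to avoid side effects):
# /probe.sh >> "/proc/$(find_pid_by_name java)/fd/1" 2>&1
```

Note that the kernel truncates the name in comm to 15 characters, so long executable names need the truncated form.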

drewagentsync commented 1 year ago

Another problem we discovered with this approach is that we don't get a separate log message for each health check; instead, the output is prepended to the next log message, as mentioned by @justas200. When your service uses JSON logging, this breaks the parser in your log monitoring tool (e.g. Datadog) and the log messages end up run together.

We've given up on this approach and plan to explore different ones. 👎 It's OK in a pinch but the developer experience isn't the best.

It sure would be nice if AWS gave us a way to pass log messages from the health check script/command directly to the log driver.

@ArthurYidi's suggestion to send the container health check output to your chosen log tool is probably the approach we'll explore first.

sergio-toro commented 7 months ago

The healthcheck logs would save a lot of debugging time...

jumpinjan commented 4 months ago

+1

yves-vogl commented 2 weeks ago

+1 - stumbled upon this while trying to fix AWS Otel Container in ECS.