Open muzfuz opened 2 months ago
Thanks for this request. We’d like to check into the behavior you saw more thoroughly. Could you share your support case ID so we can look at your specific setup?
@kshivaz thank you for looking at this. The case ID is 171276418200173.
Thanks @muzfuz.
Unfortunately, this issue defeats the purpose of using Service Connect.
Tell us about your request
Service Connect does not support application health checks. This means it attempts to route traffic to containers before they're ready.
We would like Service Connect either to offer configurable health checks similar to those of ALBs, or to respect the Docker health checks that are configured in the task definition.
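For context, this is roughly what a container-level health check in an ECS task definition looks like. It is a minimal sketch written as a Python dict in the shape of a container definition. The container name, image, health endpoint, and timing values are hypothetical and not taken from our setup.

```python
# Sketch of an ECS container definition with a Docker health check.
# The name, image, endpoint, and all timing values are assumptions
# chosen for illustration only.
container_definition = {
    "name": "big-service",                    # hypothetical container name
    "image": "example/big-service:latest",    # hypothetical image
    "essential": True,
    "healthCheck": {
        # Command run inside the container; a non-zero exit marks it unhealthy.
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 10,      # seconds between checks
        "timeout": 5,        # seconds before a single check counts as failed
        "retries": 3,        # consecutive failures before the container is unhealthy
        "startPeriod": 60,   # grace period in seconds, sized for a slow-starting service
    },
}
```

The request is for Service Connect to wait for this check (or an equivalent configurable one) to pass before routing traffic to the task.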
Which service(s) is this request for?
Fargate, specifically the Service Connect options.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We run several "big" services which have a long startup time (10 to 60 seconds). These services communicate privately using Service Connect.
We noticed that we were getting served 503s during deploys or container restarts.
After some back and forth with AWS Support we were able to establish the following sequence of events:
I received the following guidance on this from AWS Support:
From our POV we would like one of two things to be true here.
Because Service Connect currently routes traffic to a task as soon as the Envoy sidecar becomes healthy, we have to retry aggressively in the client applications. This papers over the cracks, but requests can still fail.
Are you currently working around this issue?
Yes. A combination of aggressive retries and long Docker health checks has proven effective.
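The client-side half of this workaround can be sketched as follows. This is an illustrative retry loop with exponential backoff, not our production code. The function name `call_with_retries`, the `send` callable, and the default attempt count and delay are all assumptions made for the example.

```python
import time
from typing import Callable, Tuple


def call_with_retries(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-503 status or attempts run out.

    send is a hypothetical callable that performs one request and returns
    (status_code, body). The delay doubles after every 503.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 503:
            return status, body
        # Back off before the next try, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    # Every attempt returned 503; hand the last response back to the caller.
    return status, body
```

Here `send` would wrap a single HTTP call to the Service Connect endpoint. Once the attempts are exhausted the 503 is still returned to the caller, which is why this approach only reduces failures rather than eliminating them.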
We received the following guidance from AWS Support:
This solution "works" but is merely a sticking plaster - it can still lead to failed requests and needlessly extends deploy / restart times.