Open zbintliff opened 4 years ago
Thanks for the request! So if I understand correctly, the ask is specifically for tasks running as part of a service. You would like the ability to manually set an individual task in a service to a “draining” state, which will cleanly drain connections and deregister it from the load balancer target group, and then initiate task shutdown; and also ensure that a replacement task is started up in accordance with the minimumHealthyPercent and maximumPercent parameters. Is that correct?
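To make the replacement constraint concrete, here is a minimal sketch of how the two deployment parameters bound the number of running tasks. It assumes the rounding described in the ECS documentation (the minimum healthy count is rounded up, the maximum count is rounded down). The function names are hypothetical and are not part of any AWS SDK.

```python
def task_count_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (lower, upper) limits on running tasks for a service.

    Assumption: the lower limit rounds up and the upper limit rounds down.
    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    lower = -(-desired_count * minimum_healthy_percent // 100)  # ceiling division
    upper = desired_count * maximum_percent // 100              # floor division
    return lower, upper


def can_stop_before_replacing(running_count, desired_count,
                              minimum_healthy_percent, maximum_percent):
    """True if one task could be stopped first without dropping below the lower limit."""
    lower, _ = task_count_bounds(desired_count, minimum_healthy_percent, maximum_percent)
    return running_count - 1 >= lower


if __name__ == "__main__":
    # Hypothetical service: 200 tasks, minimumHealthyPercent=100, maximumPercent=200.
    print(task_count_bounds(200, 100, 200))
    print(can_stop_before_replacing(200, 200, 100, 200))
```

With a minimum healthy percent of 100, stopping a task first would dip below the lower limit, so under these settings a replacement would have to be started before the drained task is stopped.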
Exactly!
While we would love to have the application solve this at the health check level, sometimes the application is "healthy" but spending half its CPU cycles in garbage collection churn. This request has come up internally quite often lately. On one hand we have StopTask, a "harsh" action that definitely results in increased 5xx errors; on the other hand, redeploying 100+ tasks in a service because of one bad task is expensive.
While this feature request is still in the open state, is there any way to mitigate the issue for production operations? Thanks!
Still hanging out for this as well! Would appreciate any good workarounds.
Any news? AWS ECS is totally painful 😖 without such a simple feature.
@zbintliff Is this still necessary given https://github.com/aws/containers-roadmap/issues/708 having been resolved? It's not clear to me if there's any distinction between the two requests.
Tell us about your request Currently, when you call `aws ecs stop-task`, my understanding is that ECS tells the Agent to immediately send SIGKILL to the container. Ideally, we would like to mark a task to be killed and for ECS to act similarly to when a container instance drains. Those steps are:
Since this functionality is used for instance draining, I hope it is something I have overlooked or will be easy to adopt.
Which service(s) is this request for? Fargate, ECS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? We have pretty large services (200+ tasks) that have a local cache and many connections to downstream resources. Sometimes an app gets into a bad JVM state where GCs are happening more frequently than usual, and we want to safely mark one task for termination while staying above our `deploymentConfiguration` healthy percent. Currently, the only way to do so is either to drain the entire container instance or to use `--force-new-deployment`. A whole new deployment is an "expensive" process, and if we issue a `StopTask` we see an increase in 5xx errors because no connections are drained.
Are you currently working around this issue? Right now we are doing `aws ecs update-service --force-new-deployment`, but as I said it's "expensive" in time and resources.
Please let me know if you have any questions!
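For anyone looking for an interim mitigation, one possible approach is to drain a single task by hand: deregister it from the target group, wait for deregistration to finish, and only then stop it. The sketch below is untested against a real service and rests on several assumptions: the task uses the awsvpc network mode with an IP-type target group, `ecs` and `elbv2` are boto3 clients (for example `boto3.client("ecs")` and `boto3.client("elbv2")`), and ECS does not re-register the target in the meantime. The function names and the `reason` string are hypothetical.

```python
def task_private_ip(ecs, cluster, task_arn):
    """Look up the ENI private IPv4 address of an awsvpc-mode task."""
    response = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])
    for task in response["tasks"]:
        for attachment in task.get("attachments", []):
            for detail in attachment.get("details", []):
                if detail["name"] == "privateIPv4Address":
                    return detail["value"]
    raise LookupError(f"no private IP found for task {task_arn}")


def drain_and_stop_task(ecs, elbv2, cluster, task_arn, target_group_arn, port):
    """Deregister one task from its target group, wait, then stop it."""
    target = {"Id": task_private_ip(ecs, cluster, task_arn), "Port": port}

    # Stop new requests from being routed to the task.
    elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=[target])

    # Block until the load balancer reports the target as deregistered.
    waiter = elbv2.get_waiter("target_deregistered")
    waiter.wait(TargetGroupArn=target_group_arn, Targets=[target])

    # Only now stop the task; the service scheduler should start a replacement.
    ecs.stop_task(cluster=cluster, task=task_arn,
                  reason="manual drain of unhealthy task")
    return target
```

Note that the replacement task starts only after the old one stops, so the running count dips below the desired count for a while. This sketch does not enforce the `deploymentConfiguration` healthy percent the way the requested feature would.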