We're doing the same thing, with a periodic Lambda that combines a strategy similar to the garbe.io blog post with detection of tasks that are pending. We've continued to fine-tune the logic to strike a good balance between availability and cost, but it would be very convenient if ECS provided this functionality natively, or at least published metrics that allow scaling the cluster out (and especially in) based on actual service/task capacity.
Actually, it would be great if the cluster supported an "n + 1" configuration, always keeping at least one instance running for new tasks to be placed when no other instances have enough resources.
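For readers looking for a concrete starting point, here is a minimal sketch (not the commenter's actual code) of the kind of periodic Lambda described above: it estimates how many more copies of a reference task would still fit on the cluster's instances and publishes that as a custom CloudWatch metric that an ASG scaling policy can act on. The cluster name, task size, namespace, and metric name are all placeholder assumptions.

```python
# Sketch of a periodic Lambda that publishes a "how many more tasks fit" metric.
import boto3

ecs = boto3.client('ecs')
cloudwatch = boto3.client('cloudwatch')

CLUSTER = 'my-cluster'           # assumption: cluster name
TASK_CPU, TASK_MEM = 1024, 1024  # assumption: reference task size (CPU units, MiB)

def handler(event, context):
    instance_arns = ecs.list_container_instances(cluster=CLUSTER)['containerInstanceArns']
    schedulable = 0
    if instance_arns:
        instances = ecs.describe_container_instances(
            cluster=CLUSTER, containerInstances=instance_arns)['containerInstances']
        for inst in instances:
            remaining = {r['name']: r.get('integerValue', 0)
                         for r in inst['remainingResources']}
            # How many more copies of the reference task fit on this instance?
            schedulable += min(remaining.get('CPU', 0) // TASK_CPU,
                               remaining.get('MEMORY', 0) // TASK_MEM)
    cloudwatch.put_metric_data(
        Namespace='Custom/ECS',  # hypothetical namespace
        MetricData=[{'MetricName': 'SchedulableTasks',  # hypothetical metric name
                     'Dimensions': [{'Name': 'ClusterName', 'Value': CLUSTER}],
                     'Value': schedulable}])
```

A scale-out alarm can then fire when SchedulableTasks drops below the buffer you want to keep, and a scale-in alarm when it climbs well above it.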
@matthewcummings I would like to extend this by requiring a stand-by instance per AZ that the cluster is active in. The current behavior of ECS scheduling is, in my mind, quite dangerous when only a single node has space available: even with a placement strategy of spread across AZs, it will put ALL the tasks on the single instance with available space.
I just finished implementing this available container count scaling for our ECS clusters and would be happy to chat with someone from AWS if they've got questions. I was just now working on a public repo + blog post with my implementation.
UPDATE: Since AWS is working on a solution for this I'll probably just abandon the blog post. Here are some brief notes I had taken on the solution I've implemented: https://gist.github.com/jamiegs/296943b1b6ab4bdcd2a9d28e54bc3de0
It's good to see that this topic is getting awareness. I'm actually thinking of changing the metric I described in my blog post so that when its value increases, the cluster size should increase as well (something like a ContainerBufferFillRate). That would make it possible to use target tracking and would make the configuration easier.
We currently scale out and in on reservation. We are starting to run into scenarios where very large tasks (16 GB of memory) are no longer placed after a scale-in. There is enough total space in the cluster to fit the task, and it's below our 90% reservation threshold, but there is not enough space on any single node to place it.
Events are published, but the only way to know whether a pending task is due to lack of space versus a bad task definition is by parsing the service events for each service.
UPDATE: Since AWS is working on a solution for this
@jamiegs are they?
i'm planning/testing an ec2 cluster implementation that i would like to eventually autoscale, however everything i'm reading still suggests the type of workarounds described in posts linked from this issue - i can't find anything official.
@hlarsen well, I guess I assume they are, since they have this ticket to improve autoscaling under "Researching" on their roadmap.
ahh sorry, i missed it - just checking if you had any inside info =)
for anyone else who missed it, this is currently in the Research phase on the roadmap, so if you're trying to do this now it appears lambda-based cluster scaling is the way to go.
I just ran across https://github.com/aws/containers-roadmap/issues/121 which is similar to my request if not a duplicate. At the end of the day we all want a reliable way to ensure that there are always enough instances running to add additional tasks when they are needed.
You can work around this by using DAEMON tasks (instead of REPLICA) and doing all scaling at the ASG level (instead of application auto scaling). Works OK if you only have one service per cluster, but it is kind of an abuse of daemonsets.
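For illustration, the daemon-based workaround might look something like the sketch below, with placeholder names; every instance runs exactly one copy of the task, so capacity is driven entirely by the ASG's desired count.

```python
# Sketch of the DAEMON workaround: one task per instance, scaling done at the ASG level.
import boto3

ecs = boto3.client('ecs')
autoscaling = boto3.client('autoscaling')

# Run exactly one copy of the task on every instance in the cluster
# (cluster/service/task names are placeholders).
ecs.create_service(
    cluster='my-cluster',
    serviceName='my-service',
    taskDefinition='my-task:1',
    schedulingStrategy='DAEMON',   # instead of the default REPLICA
)

# "Application" scaling is then just ASG scaling.
autoscaling.set_desired_capacity(
    AutoScalingGroupName='my-ecs-asg',
    DesiredCapacity=5,
)
```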
Hi everyone, we are actively researching this and have a proposed solution in mind. This solution would work as follows:
Thoughts on this proposal? Please let us know!
This would be awesome!
Sign me up
I love it.
This is a much needed feature for ECS. Right now, users have to over-provision their cluster instances or implement custom engineering solutions using Lambdas/CloudWatch for scale-out and scale-in scenarios. Cluster autoscaling that is aware of the services/tasks is absolutely necessary. While this may not be applicable to Fargate, it is still needed for ECS use cases. I hope this gets prioritized and delivered; we have been waiting for this.
@coultn: I think your proposal will work just fine for clusters that start their tasks using ECS services. I have a few thoughts to keep in mind:
Maybe this is out of scope, but since you brought up the automated EC2 instance protection bit, I think that you should also take into consideration changes to the EC2 launch configuration (e.g. new AMI, instance type, etc.) to help make the management of ECS clusters easier. I link at the bottom of my comment to a CloudFormation template that does this for a cluster that runs batch jobs. For the clusters that run web applications, we wouldn't want the instance protection bit to get in the way when the fleet is rolled by autoscaling.
We have some QA clusters that run ECS services with development git branches of work that is currently in progress. These environments usually stick around for 8 hours after the last commit. Most of these environments hardly receive any traffic unless automated performance testing is in progress. Let's assume that we currently have X ECS services, and that all X of them have the same requirements from ECS (memory/CPU) for simplicity. Will the new CloudWatch metric tell us that we can start one more copy of a task for just one of those services? So if the metric says we can start one, and two ECS services try to scale out at the same time, will we encounter a stall scaling out the second service? Or will the new metric tell us whether we can start one copy of every ECS service that is currently configured? Hopefully it is the former, since scaling policies can be configured to handle the latter case if needed.
This proposal won't work for ECS scheduled tasks. We have a cluster that runs over 200 cron-style jobs as ECS scheduled tasks for a legacy application. It's a mix of small and large jobs, and our ECS cluster typically doubles the number of EC2 instances during parts of the day when more of the larger jobs are running. These jobs aren't set up as ECS services. Initially we used CloudWatch event rules to start the ECS tasks directly; however, a large number of jobs wouldn't start during some parts of the day because the run-task API call failed due to insufficient capacity in the cluster. To fix this, we still use CloudWatch event rules, but each rule sends a message to an SQS queue that a Lambda function is subscribed to. The function tries to start the task, and if that fails due to insufficient capacity, it increases the desired count of the autoscaling group and tries again later (see the sketch after this comment). The tasks are bin packed to help make scaling in easier. The jobs have a finite duration, so scaling in involves looking for empty instances, draining them, and then terminating them. I have a CloudFormation template that implements this use case at https://github.com/MoveInc/ecs-cloudformation-templates/blob/master/ECS-Batch-Cluster.template and it's fairly well commented at the top with more details, including how we handle launch configuration changes (for AMI updates, new EC2 instance types, etc).
I can beta test some of your proposed changes at my work if you're interested.
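A minimal sketch of the queue-and-retry pattern described above (the linked CloudFormation template is the authoritative version; the queue URL, ASG name, and delay are placeholders):

```python
# Lambda subscribed to an SQS queue: try RunTask, and on a capacity failure
# bump the ASG and requeue the job for a later attempt.
import json
import boto3

ecs = boto3.client('ecs')
sqs = boto3.client('sqs')
autoscaling = boto3.client('autoscaling')

QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/batch-jobs'  # placeholder
ASG_NAME = 'ecs-batch-asg'                                                 # placeholder

def handler(event, context):
    for record in event['Records']:          # SQS event source mapping
        job = json.loads(record['body'])
        resp = ecs.run_task(
            cluster=job['cluster'],
            taskDefinition=job['taskDefinition'],
            placementStrategy=[{'type': 'binpack', 'field': 'memory'}],
        )
        failures = resp.get('failures', [])
        # RunTask reports capacity problems with reasons like RESOURCE:MEMORY.
        if any('RESOURCE' in (f.get('reason') or '') for f in failures):
            asg = autoscaling.describe_auto_scaling_groups(
                AutoScalingGroupNames=[ASG_NAME])['AutoScalingGroups'][0]
            autoscaling.set_desired_capacity(
                AutoScalingGroupName=ASG_NAME,
                DesiredCapacity=asg['DesiredCapacity'] + 1)
            # Requeue the job so it is retried once the new instance registers.
            sqs.send_message(QueueUrl=QUEUE_URL,
                             MessageBody=record['body'],
                             DelaySeconds=120)
```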
@coultn Looks good. My organization implemented a system very similar to this. Do you have any more details regarding how your solution computes the new scaling metric? We currently base it on the largest container (cpu and memory) in the ECS cluster -- similar to this solution.
Can you please clarify:
The metric will only be less than 100% if you are guaranteed to have space for at least 1 more task of each service and RunTask group already running in your cluster.
Does this mean that if I have a cluster with 10 services, the new metric will be over 100% if it can't fit 1 additional task for each service combined (additive), i.e. a total overhead of the combined requirements of the 10 tasks? Or is it a "shared" overhead that essentially guarantees the service/task with the largest deployment can add one more?
Is the "full" metric CPU, Memory, connections, disk, something else? I feel like this type of metric makes sense for
Can someone explain how the metric would work for large, mixed workload, multi-ASG clusters? If that's an anti-pattern for ECS it would also be good to know where the product roadmap is headed.
I second @masneyb's third point. We use ECS in combination with the Jenkins ECS plug-in to start containers (tasks) for every Jenkins job. The ECS plug-in is smart enough to retry tasks that failed due to insufficient resources. But I don't see how this new metric could be of much help in this case, since it still only looks at the current resource usage and not the required resources. Setting a threshold < 100% is only a heuristic. Ideally - and I get that this is a more fundamental change - ECS would have a queue of pending tasks (like any other "traditional" queueing system) instead of immediately rejecting them. The length of the queue and its items' resource requirements could then easily be used to scale in and out.
This sounds good. Will the scale-out policy also take care of AZ spread? That is, will the scaling activity start a new instance based on the AZ spread its tasks require, or will it be random?
@coultn sounds good overall, with the exception of one thing (which I may be misunderstanding).
- ECS will automatically set up a target tracking scaling policy on your ECS cluster using this new metric. You can set a target value for the metric less than or equal to 100%. A target value of 100% means the cluster will only scale out if there is no more space in your cluster for at least one service or RunTask group. A target value of less than 100% means that the cluster will keep some space in reserve for additional tasks to run.
To me the statement in bold implies that when the cluster "fullness" metric is at 100% then there is still space for at least one more task, which is not what I would expect, especially since you are not allowed to set a target tracking metric of greater than 100%. What do you do if you actually want your cluster to be fully (efficiently) allocated?
As an example, let's say my cluster consists of 5 nodes, each with 2 vCPUs, running a single service where each task requires 1 vCPU of capacity.
My understanding of the current proposal is
My expectation of what the metric would be:
So ideally for me, at 10 tasks with 100% target tracking the ASG would be at steady state. If the ECS service tries to allocate an 11th task then the metric would go to 110% and target tracking would cause the ASG to start a 6th node. Now if I decide instead that I do want hot spare behavior, then I would set my target fullness to 90%.
To expound further on my use case, my intention would be to set target tracking at the ASG level to 100% allocation and then separately set target tracking at the ECS service level to a specific CPU utilization (30% for example). So rather than having a spare node not doing anything, I would have all nodes active, but with sufficient CPU capacity to handle temporary spikes. If traffic gradually starts to climb and average CPU usage goes above 30%, then ECS would attempt to start more tasks and the ASG would start more nodes, and while the new nodes are starting up, there is still sufficient CPU headroom.
I definitely think you guys should make it easy for end users to determine the appropriate percentage for having one, two or three hot spares, since the math won't always be as simple as my example. But I think 100% utilization should be an option, even if you don't think it should be the default. Perhaps in the console you could auto-calculate and pre-fill the "1 hot spare" percentage for users, or at least pre-calculate some examples.
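To make the arithmetic of the 5-node example above explicit, here is a small worked version following the commenter's expected semantics (which may differ from the final ECS definition): fullness is simply the desired task count divided by the number of tasks the cluster can hold.

```python
# Worked version of the 5-node / 2-vCPU / 1-vCPU-per-task example above,
# using the commenter's *expected* definition of the metric.
NODES = 5
VCPUS_PER_NODE = 2
TASK_VCPUS = 1

capacity_in_tasks = NODES * VCPUS_PER_NODE // TASK_VCPUS   # 10 tasks fit

def fullness(desired_tasks):
    return 100.0 * desired_tasks / capacity_in_tasks

print(fullness(10))  # 100.0 -> steady state with a 100% target
print(fullness(11))  # 110.0 -> target tracking adds a 6th node
print(fullness(9))   #  90.0 -> roughly the "one hot spare" target value
```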
Thanks for the comments/questions everyone! Some clarifications and additional details:
@coultn Would it be helpful to have an option for RunTask where it would keep trying until capacity is available? If this option were available with RunTask, then the new metric would scale appropriately to all tasks, both service tasks and RunTask tasks, and scheduled tasks
YES. Currently we're investigating K8S because of this and other reasons.
@coultn
Less than 100% means that each service or RunTask invocation has room for at least one more task.
What if I have 5 nodes and each node has 300 MB of memory remaining, which sums to 1.5 GB, but any extra service task will reserve 1.0 GB of memory, which can't be fulfilled on any existing node in the cluster? Will the metric show less than 100% or greater than 100%? Obviously, we need to scale the cluster out by an extra node to make sure there's always enough room on a single node to run an extra service task.
If there is not sufficient space on any instance for at least one additional service task, and the desired count is greater than the running count for that service, then the metric will be greater than 100%.
If there is not sufficient space on any instance for at least one additional service task, and the desired count is equal to the running count for that service, then the metric will be equal to 100%.
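A tiny illustration of why the answer hinges on per-instance headroom rather than the cluster-wide sum:

```python
# Placement is per instance, so summed free memory is not what matters.
# Five nodes with 300 MB free (1.5 GB total) still cannot place a task
# that reserves 1.0 GB on a single node.
free_memory_per_node_mb = [300, 300, 300, 300, 300]
task_memory_mb = 1024

total_free = sum(free_memory_per_node_mb)                 # 1500 MB
fits_somewhere = any(free >= task_memory_mb
                     for free in free_memory_per_node_mb)  # False

print(total_free, fits_somewhere)  # 1500 False -> the cluster must scale out
```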
Would it be helpful to have an option for RunTask where it would keep trying until capacity is available? If this option were available with RunTask, then the new metric would scale appropriately to all tasks, both service tasks and RunTask tasks, and scheduled tasks.
This would indeed be very helpful!
Appreciated your clarifying this issue. That would be super helpful!
- Equal to 100% means 'exactly full' for at least one service or RunTask invocation, and less than full for the rest. In other words, there is at least one service, or one set of tasks started with RunTask, for which the cluster has no additional capacity, but is exactly at capacity. If you set your target value to 100%, then the cluster might not scale until it completely runs out of resources for a service, and there are additional tasks that cannot be run.
- The metric will accommodate both single-service and multi-service clusters. It looks at the capacity across all services (and RunTask invocations) and computes the maximum value. The services and tasks do not need to have the same resource requirements or constraints.
I need some confirmation on my understanding. Is this metric defined per service or per cluster? For example, if I have 5 services and each of them is out of capacity and needs to scale out, and I set the metric target to less than 100%, would it add 5 more EC2 instances to the ECS cluster (one for each of the 5 services that need scaling), or intelligently add just enough EC2 instances to allow all 5 services to scale out completely? Thanks!
There is a single metric for each EC2 auto scaling group in a cluster. It is computed for each service and standalone RunTask invocation currently active in that auto scaling group; the actual metric is then taken as the maximum value across all of the computed values. So, for each auto scaling group you have a single metric that accounts for all services and tasks. In your example, with 5 services running, as long as at least one of those services needs more capacity, the cluster will scale out.
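In other words, the aggregation described here is a max over per-service values; a rough sketch with made-up numbers:

```python
# Each service (or RunTask group) in an ASG gets its own "fullness" value,
# and the ASG's metric is the maximum of those, so one starved service is
# enough to trigger a scale-out.
def asg_metric(per_service_fullness):
    return max(per_service_fullness.values())

print(asg_metric({'svc-a': 80.0, 'svc-b': 95.0, 'svc-c': 120.0}))  # 120.0 -> scale out
```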
Thanks for the confirmation, it makes sense. This would mean that I have the freedom to decide how many EC2 instances I want to scale up/down by using a ScalingPolicy? So my question is: would this be compatible with the ScalingPolicy feature, and hence usable in CloudFormation as well?
Along with the new metric, ECS will actually automatically set up a Target Tracking scaling policy on the auto scaling group on your behalf. You will be able to set the target value for the scaling policy. You will also be able to add other scaling policies in addition to the ECS-managed scaling policy.
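For context, a comparable target tracking policy with a custom metric can already be configured manually on an ASG today; the proposal is for ECS to manage an equivalent policy on your behalf. A sketch (the metric name and namespace here are placeholders, not the metric ECS will publish):

```python
# Manually attaching a target tracking policy to an ASG using a custom
# CloudWatch metric; names and values are placeholders.
import boto3

autoscaling = boto3.client('autoscaling')

autoscaling.put_scaling_policy(
    AutoScalingGroupName='my-ecs-asg',
    PolicyName='ecs-capacity-target-tracking',
    PolicyType='TargetTrackingScaling',
    TargetTrackingConfiguration={
        'CustomizedMetricSpecification': {
            'MetricName': 'ClusterFullness',   # hypothetical metric name
            'Namespace': 'Custom/ECS',         # hypothetical namespace
            'Dimensions': [{'Name': 'ClusterName', 'Value': 'my-cluster'}],
            'Statistic': 'Average',
        },
        'TargetValue': 100.0,   # <100 keeps headroom in reserve
        'DisableScaleIn': False,
    },
)
```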
@coultn , I have a couple of follow-on questions (thanks in advance):
1) How often would the new metric be published? Every minute?
2) Will it be possible to use the new metric without any target tracking being set up automatically?
My understanding is that target tracking needs the metric to be in breach for 5 minutes before it will take any action, and that's not configurable. (This is based on what I've seen of target tracking on ALB RequestCountPerTarget.) 5 minutes is too long for the use case I'm working on, where we want to scale out our latency-sensitive service ASAP; the trade-off of ending up with slightly too much capacity is currently acceptable. We're currently using a modified version of https://garbe.io/blog/2017/04/12/a-better-solution-to-ecs-autoscaling/ where the "tasks that fit" metric can go negative when the desired count of tasks won't fit in the available space, combined with step scaling.
Thanks so much for the information @coultn. The metric makes sense and will definitely make our lives easier. Earlier you mentioned that the metric will live on each ASG in the cluster. Can you clarify how that will work?
For example, let's say you have two ASGs with exactly the same AMI, instance size, available ENIs, etc., deployed with 10 nodes each for a total of twenty nodes. Can you help me understand what the metric will be for each ASG in the following scenarios:
1. Enough tasks are deployed to completely fill 10 nodes, spread evenly across both ASGs. I would expect the metric to be 50% for both ASGs.
2. Enough tasks are deployed to completely fill 10 nodes, but they are bin packed on nodes in the first ASG. With the previous discussions I'm not sure if each ASG will still be 50%, since the cluster is only half full, or if ASG one will be 100% (all 10 nodes taken) and ASG two will be 0% (no space taken).
Obviously, the above gets more complicated when bringing in placement constraints. You could argue the metric should be per Container Instance Attribute per Cluster, since you would then be able to scale the required resources better (I can explain this more if needed but don't want it to distract from the above question).
@rdawemsys Thanks for your feedback. The metric will likely be published once per minute (the same as other existing ECS metrics). In our current design, ECS will either publish the metric AND configure target tracking scaling, or not publish the metric. Keep in mind that you can configure additional scaling policies that work with target tracking.
Regarding the five minute delay you mentioned, that is not correct in general. The timing and latency of target tracking scaling alarms depends on the frequency of the metric being published. What is the maximum scaling latency that you would find acceptable?
@zbintliff There are some additional details that are probably too involved to get into here. I'll do my best to describe how things will work in a general sense, without having to write a 50-page treatise on scaling and placement :) The core concept is that each ASG in a cluster has its own value for the metric, based on the tasks and services that the ECS control plane wants to run in that ASG. So, to address your specific examples:
Great that makes sense! I know it can get complicated so thank you for the examples.
@coultn Thanks for your replies. In answer to your question:
Regarding the five minute delay you mentioned, that is not correct in general. The timing and latency of target tracking scaling alarms depends on the frequency of the metric being published. What is the maximum scaling latency that you would find acceptable?
We're running a latency-sensitive service in ECS, which has big spikes in traffic, and we need to scale out ECS pretty rapidly. Ideally within 1-2 minutes for scale out. Scale in can be slower.
We're currently using ECS average CPU for triggering scale-out, which seems to happen reliably within 1-2 minutes of CPU spiking. We're using ALB RequestCountPerTarget for scale-in, which is observed to have a ~5 minute lag for triggering an alarm.
I understand it may not be possible to scale out EC2 within that 1-2 minute timescale, which is why we're leaving some headroom on our ECS cluster, and then scaling out EC2 to maintain that headroom.
I should also add that we've introduced autoscaling relatively recently into this particular service, so we're gaining some understanding of how autoscaling and our traffic patterns interact. We're going to be tweaking things as we go along. It may turn out that our autoscaling rules do not need to be so aggressive -- we're being cautious with our roll-out to maintain service levels.
@coultn will this feature depend on Container Insights, and therefore require us to pay for Container Insights in order to take advantage of integrated autoscaling?
@talawahtech No, it will not require container insights.
It would be a great feature. Is there any estimate on when it will be available?
I have found this suggested mechanism for managing ECS + EC2 updates to be more complex than seems reasonable: https://aws.amazon.com/blogs/compute/automatically-update-instances-in-an-amazon-ecs-cluster-using-the-ami-id-parameter/
Will this feature have any impact to the necessity of that approach for safely deploying things like an AMI update together with an ECS task change?
Edit: It looks like this is a separate issue tracked at https://github.com/aws/containers-roadmap/issues/256
@jaredjstewart The feature outlined in this issue isn't specifically targeted at making AMI updates easier, although it may indirectly help with that. What would you like to see us do with AMI updates?
@coultn is there any forecast for releasing this feature?
I am really interested in using it.
Thank you!
+1 on the proposal. This is close to what we have implemented and would be great to have native integration with ECS. What we do currently is something like this:
@coultn In a nutshell, I would like ECS to manage the graceful termination of ec2 instances in an autoscaling group that backs an ECS cluster. (I.e. make sure to drain any tasks and deregister the instance from associated load balancers before proceeding with termination.)
This would include updates to the ec2 AMI, changes to different instance types, etc.
I believe this request is already captured at https://github.com/aws/containers-roadmap/issues/256.
Thanks again, Jared
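For reference, the interim workaround for the graceful termination Jared describes is usually an ASG lifecycle hook plus a function that drains the ECS container instance before allowing termination. A rough sketch, with placeholder names and with the wait/retry loop elided:

```python
# Lambda invoked by an ASG "terminate" lifecycle hook event: set the matching
# container instance to DRAINING, and complete the hook once it is empty.
import boto3

ecs = boto3.client('ecs')
autoscaling = boto3.client('autoscaling')

CLUSTER = 'my-cluster'  # placeholder

def handler(event, context):
    detail = event['detail']                  # EC2 Auto Scaling lifecycle event
    ec2_instance_id = detail['EC2InstanceId']

    # Find the ECS container instance backed by this EC2 instance.
    arns = ecs.list_container_instances(cluster=CLUSTER)['containerInstanceArns']
    instances = ecs.describe_container_instances(
        cluster=CLUSTER, containerInstances=arns)['containerInstances']
    match = next(i for i in instances if i['ec2InstanceId'] == ec2_instance_id)

    # Drain it so the service scheduler replaces its tasks elsewhere.
    ecs.update_container_instances_state(
        cluster=CLUSTER,
        containerInstances=[match['containerInstanceArn']],
        status='DRAINING')

    # In the real pattern you would re-check periodically until
    # runningTasksCount == 0, then let the ASG proceed with termination.
    if match['runningTasksCount'] == 0:
        autoscaling.complete_lifecycle_action(
            LifecycleHookName=detail['LifecycleHookName'],
            AutoScalingGroupName=detail['AutoScalingGroupName'],
            LifecycleActionToken=detail['LifecycleActionToken'],
            LifecycleActionResult='CONTINUE')
```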
In addition to the "fullness" metric, it might be nice if metrics for desired memory and CPU reservation were also available for the cluster, that is, the equivalent of the existing MemoryReservation and CPUReservation metrics but taking into consideration tasks that have not yet been placed as well. I would imagine that most of the work to calculate these metrics would already need to be done to calculate the fullness metric anyway.
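A quick sketch of how such a "desired reservation" metric could be computed, under the assumption that it simply counts not-yet-placed tasks alongside running ones:

```python
# Like MemoryReservation, but counting tasks the services *want* to run
# (including ones not yet placed), so pressure from pending tasks is visible.
def desired_memory_reservation(running_task_mem_mb, pending_task_mem_mb,
                               registered_mem_mb):
    desired = sum(running_task_mem_mb) + sum(pending_task_mem_mb)
    return 100.0 * desired / registered_mem_mb

# 3 running 1 GiB tasks + 2 unplaced 1 GiB tasks on 4 GiB of registered memory
print(desired_memory_reservation([1024] * 3, [1024] * 2, 4096))  # 125.0 -> scale out
```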
I haven't seen any announcement but look what is now in ECS console :)
@soukicz thanks for letting us know - are you able to create a provider? It fails without any specific reason for me...
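For anyone else experimenting with this, the console feature maps to the new capacity provider APIs; creating one programmatically looks roughly like the sketch below (ARNs and names are placeholders) and may surface a more specific error than the console does.

```python
# Creating a capacity provider backed by an existing ASG.
import boto3

ecs = boto3.client('ecs')

ecs.create_capacity_provider(
    name='my-capacity-provider',
    autoScalingGroupProvider={
        'autoScalingGroupArn': 'arn:aws:autoscaling:us-east-1:123456789012:'
                               'autoScalingGroup:uuid:autoScalingGroupName/my-ecs-asg',
        'managedScaling': {
            'status': 'ENABLED',
            'targetCapacity': 100,   # percent; <100 keeps headroom for new tasks
        },
        # Requires instance scale-in protection to be enabled on the ASG.
        'managedTerminationProtection': 'ENABLED',
    },
)
```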
Tell us about your request
Blog posts like this exist because it is difficult to coordinate service autoscaling with instance autoscaling: https://engineering.depop.com/ahead-of-time-scheduling-on-ecs-ec2-d4ef124b1d9e https://garbe.io/blog/2017/04/12/a-better-solution-to-ecs-autoscaling/ https://www.unicon.net/about/blogs/aws-ecs-auto-scaling
Which service(s) is this request for?
ECS and EC2
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
I would love for ECS to provide a simple/easy way to tell a supporting EC2 ASG to scale up when a task cannot be placed on its cluster. I'd also love to see this concern addressed: https://github.com/aws/containers-roadmap/issues/42
Are you currently working around this issue?
I'm doing something similar to this: https://garbe.io/blog/2017/04/12/a-better-solution-to-ecs-autoscaling/
Additional context
Please note that I love Lambda and Fargate, but sometimes regular old ECS is a better fit. FWIW, Google Cloud has had cluster autoscaling for a long time now: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler. Also, I haven't tried EKS yet, but cluster autoscaling would be super helpful there.