aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0

[Feature] Auto Scaling #1154

Closed kohidave closed 4 years ago

kohidave commented 4 years ago

Copilot Auto Scaling

This doc talks about introducing auto-scaling to Copilot services. Auto scaling is a feature which allows customers to automatically change the number of copies of their service (count) based on some metric, time interval, or alarm. In this design we’ll look at the types of scaling policies we want to help our customers with, how we can represent those policies in our manifest, and how we can technically implement these policies.

For the sake of simplicity, we'll assume our services are ECS/Fargate services. Unfortunately, at the time of writing this doc, we can't take advantage of Fargate Spot (it isn't available in CloudFormation yet), but we'll talk about how we can allow customers to "burst" using spot capacity in the future. EC2 services have a slew of complexities that we won't tackle here.

Goal

The goal of this design is to gather your (our customers'!) feedback on:

  1. The scaling scenarios we want to support, and which ones we don’t want to support
  2. The way we represent auto-scaling in the manifest
    1. Specifically two areas - updating the count definition, and adding a scaling section
  3. An attempt to fit Fargate Spot into the manifest, given how well it fits the scaling use case

As usual with any manifest design, a meta-goal is to provide a simple way for customers to tell us what they want to do, while still enabling more complex configurations through overrides or more complex types.

Types of Scaling Policies

There are three main types of scaling policies that our customers use, and we'll talk a little about when and why folks would use each type.

Target Tracking

Target Tracking is the latest and greatest type of auto scaling policy. The way it works is that a customer specifies a desired target they'd like a particular metric to stay at or below. An example would be that you want to keep the average CPU utilization at or below 70%. When the average CPU utilization rises above 70% for N datapoints, autoscaling will start increasing the number of tasks until the average CPU utilization falls back to or below 70%.

An interesting note is that the speed at which autoscaling creates new tasks is proportional to how far over your target threshold you are. For example, if your CPU rises to 80%, autoscaling will provision (desired count) * 80/70 tasks. It will then pause for a period (the scale-out cooldown period) and try again. Since autoscaling is powered by alarms under the hood, and alarms typically have a resolution of one minute, this limits how quickly you receive new data about your service and can scale out.
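
To make the arithmetic concrete: if the target is 70% CPU and a service running 10 desired tasks is averaging 80% CPU, target tracking would aim for roughly 10 * 80/70 ≈ 12 tasks, wait out the scale-out cooldown, and then re-evaluate against fresh datapoints.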

There are additional considerations when dealing with target tracking policies; see the Application Auto Scaling documentation for details.

Scheduled Scaling

Scheduled Scaling initiates scaling events at certain times. With scheduled scaling, you provide a date, rate, or cron expression to trigger a scaling event. When the scaling event is triggered, it can set the desired count of your service to be at least some number and at most another number. This means your scheduled events can trigger both scale-in (reducing the number of tasks) and scale-out (increasing the number of tasks). Practically, folks will need to create multiple scheduled scaling policies (one for scaling up and one for scaling back down again). An example:

Every 9am scale to 10 tasks

Every 9pm scale to 2 tasks
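
Under the hood, this would presumably be expressed as scheduled actions on the service's Application Auto Scaling scalable target. A rough CloudFormation sketch of the 9am/9pm example above (the resource names, cluster/service IDs, and IAM role are illustrative, and the cron expressions are in UTC):

ServiceScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: ecs
    ScalableDimension: ecs:service:DesiredCount
    ResourceId: service/my-cluster/frontend     # illustrative cluster/service names
    RoleARN: !GetAtt AutoScalingRole.Arn        # illustrative role
    MinCapacity: 2
    MaxCapacity: 10
    ScheduledActions:
      - ScheduledActionName: scale-out-9am
        Schedule: cron(0 9 * * ? *)             # every day at 9am
        ScalableTargetAction:
          MinCapacity: 10                       # force at least 10 tasks
          MaxCapacity: 10
      - ScheduledActionName: scale-in-9pm
        Schedule: cron(0 21 * * ? *)            # every day at 9pm
        ScalableTargetAction:
          MinCapacity: 2                        # force back down to 2 tasks
          MaxCapacity: 2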

Step Scaling

Step scaling allows you to scale up based on the magnitude of an alarm breach. You provide an alarm, and predefined ranges of how far the alarm metric has been breached determine how many tasks to add. You can run a scale-in version which does the opposite. It's a bit confusing to explain, so let me give you an example.

Assume we have an alarm for service throttling exceptions. It triggers when the average number of throttles for our service is over 50/min. We could have a step scaling policy like:

Throttle range            Number of tasks to add
50-60 throttles/min       1
60-100 throttles/min      5
100-1000 throttles/min    10

Just for completeness, we’ll say our cooldown period (the time between scaling actions) is 60 seconds.

In this example, as long as the alarm metric stays between 50 and 60 throttles/min, we'll keep spinning up 1 task per 60 seconds (the cooldown period). If the metric climbs even higher, say all of our requests are getting throttled, then we want to provision tasks faster (10 tasks per cooldown period).
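
For reference, here's a rough sketch of what the scale-out half of that policy could look like as an Application Auto Scaling step scaling policy in CloudFormation. The scaling target reference is hypothetical, and the step bounds are offsets from the alarm threshold of 50 throttles/min:

ThrottleStepScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: throttle-step-scaling
    PolicyType: StepScaling
    ScalingTargetId: !Ref ServiceScalingTarget   # hypothetical scalable target
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      Cooldown: 60                               # seconds between scaling actions
      MetricAggregationType: Average
      StepAdjustments:
        # Bounds are deltas from the alarm threshold (50 throttles/min).
        - MetricIntervalLowerBound: 0            # 50-60 throttles/min
          MetricIntervalUpperBound: 10
          ScalingAdjustment: 1
        - MetricIntervalLowerBound: 10           # 60-100 throttles/min
          MetricIntervalUpperBound: 50
          ScalingAdjustment: 5
        - MetricIntervalLowerBound: 50           # 100+ throttles/min
          ScalingAdjustment: 10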

You can do this with a negative version as well, for scaling in.

Many step scaling use cases can be solved with target tracking.

Manifest Design

In this design, I want to focus mostly on target tracking. We can do a deeper dive into scheduled and step scaling policies in a separate design. I suspect target tracking will be the most popular scaling approach.

As a refresher, our goals for the manifest design are to:

  1. Make it super easy to get a great solution
  2. Allow more complex solutions, even though the syntax might be more dense
  3. Allow the use of Fargate Spot

Simple Target Tracking

Target tracking is so common that we have a couple of built-in metrics around it. This example shows us overloading the count field to specify the min/max range our service can scale within, as well as some predefined scaling targets. In this example we show the three predefined scaling targets:

  1. Average CPU utilization
  2. Average Memory Utilization
  3. Concurrent requests per target
    1. This metric is only available for Load Balanced services - since the metric comes from the ALB.

In this example I only show one uncommented value, but you could specify all three. Each provided value will generate its own scaling policy.

Expected Usage: most common

name: frontend
type: Load Balanced Web Service
image:
  build: ./Dockerfile
  port: 80
http:
  path: '/'
cpu: 256
memory: 512
count:
  range: 1-100
  cpu: 70%
  # memory: 80%
  # requests: 1000 
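
Under the hood, the cpu, memory, and requests shorthands would presumably translate into an Application Auto Scaling scalable target plus target tracking policies using the predefined ECSServiceAverageCPUUtilization, ECSServiceAverageMemoryUtilization, and ALBRequestCountPerTarget metrics. A rough sketch of what the cpu: 70% example could generate (not the actual template; resource names and parameters are illustrative, and the IAM role is elided):

FrontendScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: ecs
    ScalableDimension: ecs:service:DesiredCount
    ResourceId: !Sub service/${ClusterName}/${ServiceName}  # illustrative
    MinCapacity: 1
    MaxCapacity: 100

FrontendCPUScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: frontend-cpu-target-tracking
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref FrontendScalingTarget
    TargetTrackingScalingPolicyConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
      TargetValue: 70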

For our default manifests we can keep the generated count: 1, but include a commented-out prod override section.

Other metrics we could build: we may want to add a convenience method for SQS queues, perhaps something like the below. (Our CDK patterns scale on SQS using step scaling.)

count:
  range: 1-100
  queue:
    name: my-awesome-queue
    depth: 20
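
The depth value here would presumably be driven by SQS's ApproximateNumberOfMessagesVisible CloudWatch metric (namespace AWS/SQS, dimension QueueName). If we implement it with target tracking rather than the step scaling our CDK patterns use, the relevant part of the policy configuration might look roughly like:

TargetTrackingScalingPolicyConfiguration:
  CustomizedMetricSpecification:
    MetricName: ApproximateNumberOfMessagesVisible
    Namespace: AWS/SQS
    Dimensions:
      - Name: QueueName
        Value: my-awesome-queue
    Statistic: Average
  TargetValue: 20   # the manifest's depth: 20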

We may want to add another convenience method for target response time (ALB only):

count:
  range: 1-100
  response-time: 500ms

Advanced Target Tracking

The above simple target tracking is often expressive enough for most folks, but you can specify more sophisticated target tracking policies like this:

name: frontend
type: Load Balanced Web Service
image:
  build: ./Dockerfile
  port: 80
http:
  path: '/'
cpu: 256
memory: 512
count:
  range: 1-100
  targets:
    - # This is effectively the same as the cpu: 70 shorthand,
      # but the customer can specify custom cooldowns and disable scale-in.
      metric: cpu
      value: 70
      scale-in-cooldown: 5m
      scale-out-cooldown: 10m
      disable-scale-in: false
    - # This metric is completely custom - it could be any
      # metric from CloudWatch.
      metric:
        name: CPUUtilization
        namespace: /aws/ecs/insights
        dimensions:
          ServiceName: frontend
        statistic: average
        unit: percent
      value: 70
      scale-in-cooldown: 600s
      scale-out-cooldown: 1000s
      disable-scale-in: true

One large callout with custom metrics is that they often require exact resource names. This will be difficult for us to facilitate without some sort of templating, such as adding !Addons MySQSQueue.ARN or something similar that can resolve addons outputs. I'll punt on designing this for now, but there are a bunch of options to look at here.
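
If we go this route, the custom metric block above would presumably map onto Application Auto Scaling's CustomizedMetricSpecification, which accepts roughly the same fields. A sketch in CloudFormation terms, with an illustrative scaling target reference; note the hard-coded ServiceName dimension, which is exactly the kind of resource name the templating question above is about:

CustomMetricScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: frontend-custom-metric-target-tracking
    PolicyType: TargetTrackingScaling
    ScalingTargetId: !Ref FrontendScalingTarget   # illustrative
    TargetTrackingScalingPolicyConfiguration:
      CustomizedMetricSpecification:
        MetricName: CPUUtilization
        Namespace: /aws/ecs/insights
        Dimensions:
          - Name: ServiceName
            Value: frontend
        Statistic: Average
        Unit: Percent
      TargetValue: 70
      ScaleInCooldown: 600
      ScaleOutCooldown: 1000
      DisableScaleIn: true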

Step Scaling

In general, we’ll assume that step scaling is more of an advanced feature so we’ll include less convenience methods around it. The real question here is about the alarm - where do customers generate it? They can add them via addons, but if the metric needs to reference the service at all, that won’t work. We’ll assume the alarm already exists in this example, but until we can figure out where/how to generate these alarms, we can’t effectively support step-scaling.

name: frontend
type: Load Balanced Web Service
image:
  build: ./Dockerfile
  port: 80
http:
  path: '/'
cpu: 256
memory: 512
count:
  range: 1-100
  step-scaling:
    - alarm: alarm/OUTPUT_ALARM
      # By default the scale-in steps are the opposite of the scale-out steps,
      # unless disable-scale-in is set.
      steps:
        - at: 70
          adjust: 5
        - at: 90
          adjust: 10
      scale-in-cooldown: 600
      scale-out-cooldown: 1000
      disable-scale-in: true

Scheduled Scaling

To trim the scope of this design down, we'll take a look at scheduled scaling separately.

Spot Capacity

Allowing customers to scale using Spot Capacity (or just utilize spot capacity at all) will help folks save money while still optimizing for availability. Since our scaling work expands the count section, we can take this opportunity to think about the way that customers can tell Copilot how they’d like to use spot.

Option 1 - separate ranges

In this example, customers can provide two ranges: one for dedicated Fargate tasks, and another for spot. This is nice because it allows customers to keep a number of dedicated tasks up but burst into spot. It also allows them to swap the ranges, and it lets customers opt into spot completely.

name: frontend
type: Load Balanced Web Service
...
count:
  range: 
    dedicated: 1-5
    spot: 5-25
  cpu: 70%
  # memory: 80%
  # requests: 1000 
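
One way to implement either option would be with ECS capacity providers, once Fargate Spot support lands in CloudFormation: capacity providers express a base count plus weights rather than explicit per-provider ranges, so the mapping would be approximate. A sketch of the service-level strategy (not a settled design; unrelated service properties elided):

Service:
  Type: AWS::ECS::Service
  Properties:
    # ... task definition, network configuration, etc. elided ...
    CapacityProviderStrategy:
      - CapacityProvider: FARGATE
        Base: 1          # always keep at least 1 task on dedicated Fargate
        Weight: 1
      - CapacityProvider: FARGATE_SPOT
        Weight: 4        # beyond the base, place ~4 of every 5 new tasks on Spot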

Option 2 - spot as percent

With this option, customers don't specify a particular range for spot; instead, they specify the percentage of capacity that should be spot. I'm not sure if this is how folks actually use Fargate Spot.

name: frontend
type: Load Balanced Web Service
...
count:
  range: 1-100
  spot: 85%
  cpu: 70%
  # memory: 80%
  # requests: 1000 

Cost Constraints

Another bit of feedback we've heard is to allow customers to specify a max cost instead of a range of tasks.

name: frontend
type: Load Balanced Web Service
...
count:
  cost: $55
  cpu: 70%
  # memory: 80%
  # requests: 1000 

This option is really interesting - it'd require some precomputing on our part, and shifting capacity between spot and dedicated. We'd have to make some decisions around the breakdown of spot/dedicated tasks, which might be difficult to optimize for. The benefit is that as folks change their Fargate task size (mem/cpu), the number of tasks provisioned would automatically scale back (this may also be surprising).
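
To make the precomputing concrete with a purely illustrative calculation (Fargate pricing varies by region and changes over time): at roughly $0.04048 per vCPU-hour and $0.004445 per GB-hour, a 0.25 vCPU / 0.5 GB task costs about $0.012/hour, or around $9/month, so a $55/month budget would cap the service at about 6 on-demand tasks - and doubling the task size would roughly halve that cap.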


There may be more options that I’m not thinking of here, so please let me know if you have any awesome ideas!

ctrlplusb commented 4 years ago

Gosh, I am loving the sound of this. Thank you to the entire team for taking on this project. Been dreaming of a solution like this for ages. 💜

resouer commented 4 years ago

It seems scaling would not be part of the tool? (and live in manifest instead?) https://github.com/aws/copilot-cli/issues/810

efekarakus commented 4 years ago

Our plan is to support autoscaling with a field in the Manifest (not through a command). We'll update the issue once we have a concrete design :)

kohidave commented 4 years ago

Howdy y'all! I've updated the top comment with a design proposal. I would love your feedback, so please let us know what you think.

I'm especially interested in what you think of our proposal of how we should represent scaling in the manifest and how we should help folks think about fargate spot!

Thank you.

hnrc commented 4 years ago

Beautiful! Kudos for an excellent write-up :clap:

Most of our use cases are covered by some combination of CPU/memory/requests target tracking.

In addition to that we also have quite a few cases where it would be sweet to trigger a scale up based on SQS depth. In our setup, those SQS queues are currently managed outside of Copilot which I assume means that they would fall into the more complicated custom CloudWatch metric category?

Also :bow: for taking Spot into consideration. The first option with separate ranges would definitely be good enough for us.

niros1 commented 3 years ago

Great, but why can't I find anything about this feature in the documentation?

efekarakus commented 3 years ago

Hi @niros1 ! Apologies that it wasn't easy to find in the docs, here is the link: https://aws.github.io/copilot-cli/docs/manifest/lb-web-service/#count does that help?

niros1 commented 3 years ago

Thanks a lot, I missed that.

Fodoj commented 3 years ago

Is it possible to set up auto scaling based on custom CloudWatch metrics via Copilot?

efekarakus commented 3 years ago

Hi @Fodoj !

We haven't implemented the "Advanced Target Tracking" or "Step Scaling" sections of the design above yet. Are you looking into using target tracking with a custom CloudWatch metric, or step scaling?

rushim1 commented 1 year ago

Hi @efekarakus, is there an estimated timeline for when the step scaling section will be implemented? I have all the alarms created in CloudWatch. Is there an alternative to attaching step scaling to each service through the UI?

rickychew77 commented 1 year ago

Hi there, we are also looking for step scaling so we can control scaling more effectively. At the time of writing, it seems like only target tracking scaling is supported in manifest.yml.

dannyrandall commented 1 year ago

@rushim1 @rickychew77 thanks for sharing your interest in step scaling! I just created a dedicated issue (#5241) to track that feature request. If you could comment/:+1: over there, that would be great to help us prioritize.