buildkite / elastic-ci-stack-for-aws

An auto-scaling cluster of build agents running in your own AWS VPC
https://buildkite.com/docs/quickstart/elastic-ci-stack-aws
MIT License

Use asg warm pools for faster buildkite job starts #822

Open nitrocode opened 3 years ago

nitrocode commented 3 years ago

https://aws.amazon.com/about-aws/whats-new/2021/04/amazon-ec2-auto-scaling-introduces-warm-pools-accelerate-scale-out-while-saving-money/

https://aws.amazon.com/blogs/compute/scaling-your-applications-faster-with-ec2-auto-scaling-warm-pools/

If we could keep X instances warmed up, we could start jobs much faster without having to set the min count on the ASG to something non-zero.

It should be added to CloudFormation soon: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-group.html

chloeruka commented 3 years ago

Woah! You're quick. Yeah, this feature looks great, especially for Windows instances that take a while to boot. According to the announcement it's not yet available in CloudFormation, but when it is we'll take a look into it.

nitrocode commented 3 years ago

The only param that would require an additional input to use the warm pool would be its min size. We use an ASG min size of 0 and max size of 10, so a warm pool min size of 3 seems reasonable.

The CloudFormation docs have been released.

We could add the following:

Parameters:
  WarmPoolMinSize:
    Description: Minimum number of instances in warm pool
    Type: Number
    Default: 0

Conditions:
  # Only create the warm pool when a non-zero minimum size is requested
  UseWarmPool: !Not [ !Equals [ !Ref WarmPoolMinSize, 0 ] ]

Resources:
  WarmPool:
    Type: AWS::AutoScaling::WarmPool
    Condition: UseWarmPool
    Properties:
      AutoScalingGroupName: !Ref AgentAutoScaleGroup
      MinSize: !Ref WarmPoolMinSize
      # Keep pre-initialized instances stopped until they are needed
      PoolState: Stopped

What do you folks think?
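
One detail worth checking in the snippet above: as far as I understand the AWS docs, when `MaxGroupPreparedCapacity` is not set, the warm pool is sized as the group's max size minus its desired capacity, with `MinSize` acting only as a floor. With an ASG max of 10 and a desired capacity of 0, that could mean up to 10 stopped instances rather than 3. A minimal sketch of capping it follows; `WarmPoolMaxPreparedCapacity` is a hypothetical parameter name that is not part of the proposal, and `AgentAutoScaleGroup` is assumed to be defined elsewhere in the template.

```yaml
Parameters:
  WarmPoolMinSize:
    Description: Minimum number of instances in warm pool
    Type: Number
    Default: 0
  # Hypothetical parameter, added only for this sketch
  WarmPoolMaxPreparedCapacity:
    Description: Upper bound on in-service plus warm pool instances
    Type: Number
    Default: 3

Conditions:
  # Only create the warm pool when a non-zero minimum size is requested
  UseWarmPool: !Not [ !Equals [ !Ref WarmPoolMinSize, 0 ] ]

Resources:
  WarmPool:
    Type: AWS::AutoScaling::WarmPool
    Condition: UseWarmPool
    Properties:
      # AgentAutoScaleGroup is assumed to exist elsewhere in the template
      AutoScalingGroupName: !Ref AgentAutoScaleGroup
      MinSize: !Ref WarmPoolMinSize
      # Counts in-service instances as well as warm pool instances
      MaxGroupPreparedCapacity: !Ref WarmPoolMaxPreparedCapacity
      PoolState: Stopped
```

Because the cap also counts instances already in service, the pool falls back to `MinSize` once the group's desired capacity exceeds it.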

theonlysinjin commented 3 years ago

This is fantastic. I've been using the CloudWatch metrics to scale the ASG when there are no idle agents (i.e. all are busy now), to help beat the agent start time. Though it seems our agents now start (with the update) in just under 2 minutes, which is pretty good.
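
For readers wondering what that kind of workaround might look like, here is a rough sketch, not the commenter's actual setup. It assumes agent metrics are published to CloudWatch under a `Buildkite` namespace with an `IdleAgentCount` metric and a `Queue` dimension; those names, the `default` queue, and all thresholds are assumptions to adjust. `AgentAutoScaleGroup` is assumed to be defined elsewhere in the template.

```yaml
Resources:
  # Adds one instance each time the alarm below fires
  ScaleOutWhenNoIdleAgents:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref AgentAutoScaleGroup
      PolicyType: SimpleScaling
      AdjustmentType: ChangeInCapacity
      ScalingAdjustment: 1
      Cooldown: "120"

  # Fires when no agent has been idle for two consecutive minutes
  NoIdleAgentsAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: No idle Buildkite agents on the queue
      Namespace: Buildkite          # assumed metric namespace
      MetricName: IdleAgentCount    # assumed metric name
      Dimensions:
        - Name: Queue               # assumed dimension name
          Value: default            # assumed queue name
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 2
      ComparisonOperator: LessThanThreshold
      Threshold: 1
      TreatMissingData: notBreaching
      AlarmActions:
        - !Ref ScaleOutWhenNoIdleAgents
```

Note that an idle count of zero also occurs when no agents are running at all, so with an ASG min size of 0 this alarm alone would keep adding instances up to the group's max size; a real setup would likely also check that some agents are busy.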

dieend commented 3 years ago

Warm Pool is now available in CloudFormation https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-autoscaling-warmpool.html

Right now I can't manually add a warm pool to the ASG created from the current template, because the template configures a MixedInstancesPolicy:

https://github.com/buildkite/elastic-ci-stack-for-aws/blob/7b3d02cf2de7cdfdcb1e08b7275371529e2a4e56/templates/aws-stack.yml#L1038-L1060

That MixedInstancesPolicy configuration would have to be removed if we'd like to use a warm pool.
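
To illustrate the kind of change this implies, here is a minimal sketch of an ASG that points at a single launch template directly instead of using `MixedInstancesPolicy`. It is not the stack's actual definition: the `Subnets` parameter and the `AgentLaunchTemplate` resource name are assumptions, and the sizes are taken from the example earlier in the thread. Whether AWS still rejects warm pools on groups with a mixed instances policy should be checked against the current docs.

```yaml
Parameters:
  # Assumed parameter name for this sketch
  Subnets:
    Description: Subnets the agents launch into
    Type: List<AWS::EC2::Subnet::Id>

Resources:
  AgentAutoScaleGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      VPCZoneIdentifier: !Ref Subnets
      MinSize: "0"
      MaxSize: "10"
      # A single launch template replaces the MixedInstancesPolicy block.
      # AgentLaunchTemplate is assumed to be an AWS::EC2::LaunchTemplate
      # defined elsewhere in the template.
      LaunchTemplate:
        LaunchTemplateId: !Ref AgentLaunchTemplate
        Version: !GetAtt AgentLaunchTemplate.LatestVersionNumber
```

The trade-off is that a group defined this way launches a single instance type with on-demand purchasing only, so it gives up the instance-type and Spot mixing that the existing policy provides.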

aiven-amartin commented 2 months ago

Any progress/caveats on this issue? Would be really nice to have this option rather than trying to bake git checkouts/other slow workspace parts into the AMI itself.

alex-shakouri-ai commented 1 month ago

This would be helpful on our side as well! It would help ensure we have warm agents ready so jobs can start faster!

wolfeidau commented 1 month ago

@alex-shakouri-ai We had a look at this and it would be hard to just plug into the existing model.

Currently we aren't looking at re-architecting the auto scaling.