buildkite / elastic-ci-stack-for-aws

An auto-scaling cluster of build agents running in your own AWS VPC
https://buildkite.com/docs/quickstart/elastic-ci-stack-aws
MIT License

Enable AzRebalance and Capacity Rebalance processes #944

Open keithduncan opened 3 years ago

keithduncan commented 3 years ago

We currently suspend the AZRebalance process in our Auto Scaling group to eliminate one source of ASG-initiated instance termination.

I think there are benefits to re-enabling this and to adding support for Capacity Rebalancing. A pre-emptive rebalance, especially a capacity rebalance when using Spot Instances, means the stack's instances are less likely to hit the hard two-minute Spot interruption shutdown.
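As a rough sketch of what this would mean at the Auto Scaling API level (not how the stack's CloudFormation template would do it), the two changes could look like the following. The function name `enable_rebalancing` and the group name are hypothetical; the client is assumed to be a boto3 Auto Scaling client.

```python
def enable_rebalancing(autoscaling_client, group_name):
    """Resume AZRebalance and turn on Capacity Rebalancing for one ASG.

    `autoscaling_client` is assumed to be a boto3 Auto Scaling client,
    e.g. boto3.client("autoscaling"); `group_name` is a placeholder.
    """
    # Undo the suspension so the ASG may rebalance instances across AZs again.
    autoscaling_client.resume_processes(
        AutoScalingGroupName=group_name,
        ScalingProcesses=["AZRebalance"],
    )
    # Let the ASG act on EC2 rebalance recommendations for Spot Instances
    # instead of waiting for the interruption notice.
    autoscaling_client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        CapacityRebalance=True,
    )
```

Usage would be along the lines of `enable_rebalancing(boto3.client("autoscaling"), "my-agent-asg")`, where the group name is made up for illustration.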

freewil commented 2 years ago

I'd be interested in seeing AZRebalance re-enabled. I'm currently working on a project to attach EBS volumes from a pool (with caches) to build machines at boot. With AZRebalance disabled, it is tricky to maintain an EBS pool of the proper size, since it takes 5-10 minutes to initialize an EBS volume and the volume needs to be in the same AZ as the EC2 instance. With AZRebalance enabled, I suspect it would be much easier to maintain the EBS pool across AZs, as the EC2 instances would be much more likely to be balanced across AZs.
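To illustrate the sizing problem, here is a minimal sketch of the per-AZ bookkeeping, assuming one pooled volume per instance. The function name `volumes_to_create` and the AZ names are hypothetical and not taken from any real tooling.

```python
from collections import Counter


def volumes_to_create(instance_azs, available_volume_azs):
    """Return {az: count} of pooled volumes still missing in each AZ.

    instance_azs: one AZ name per running (or expected) instance.
    available_volume_azs: one AZ name per initialized, unattached volume.
    Assumes each instance needs exactly one volume from its own AZ.
    """
    needed = Counter(instance_azs)
    have = Counter(available_volume_azs)
    shortfall = {}
    for az in sorted(needed):
        missing = needed[az] - have[az]
        if missing > 0:
            shortfall[az] = missing
    return shortfall


# Four instances and four volumes in total, but the instances are skewed
# towards one AZ, so the pool is still one volume short there.
instances = ["us-east-1a", "us-east-1a", "us-east-1a", "us-east-1b"]
volumes = ["us-east-1a", "us-east-1a", "us-east-1b", "us-east-1b"]
print(volumes_to_create(instances, volumes))
```

With instances spread evenly across AZs, an evenly distributed pool of the same total size would have no shortfall, which is the benefit I'd expect from rebalancing.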