buildkite / elastic-ci-stack-for-aws

An auto-scaling cluster of build agents running in your own AWS VPC
https://buildkite.com/docs/quickstart/elastic-ci-stack-aws
MIT License
414 stars 265 forks

Autoscaling not functioning as expected on queues specified as tags #1291

Closed: gws closed this issue 4 months ago

gws commented 4 months ago

Describe the bug: When queues are specified as agent tags in addition to the BuildkiteQueue option, and the autoscaling group associated with the stack is scaled to 0, no agents spin up in response to a job that requests a queue specified only in the list of tags.

Steps to reproduce:

  1. Create a stack with BuildkiteQueue set to primary, with the additional tag queue=secondary
  2. Ensure the group is scaled to zero
  3. Request queue: secondary in your Buildkite pipeline, and kick off a build

Expected behavior: Requesting queue: secondary in the Buildkite pipeline causes an agent to spin up, since Buildkite documents multiple queues on one agent as working (assuming agents are around to pick up the build).

Actual behavior: The stack autoscaler does not spin up a new agent.

Stack parameters (please complete the following information):

Additional context: I can see how this might be interpreted as a feature request, but I landed on a bug report because the result surprised me given Buildkite's documented behavior.

Related: https://github.com/buildkite/elastic-ci-stack-for-aws/issues/797

moskyb commented 4 months ago

hiya @gws, currently the autoscaler is only configured to look at the queue defined by the BuildkiteQueue cfn param, and it ignores any queues set in BuildkiteAgentTags. Managing scaling for multiple queues is a major pain and would require a significant rewrite of the scaler, and it was never an intended use case for the elastic stack.

while multiple queues is a feature that's documented to work, we're planning to make Clusters generally available soon, which will remove the ability to use multiple queues in an agent when running in a cluster.
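
To illustrate the behavior described in this comment, here is a minimal Python sketch of a scaler that only consults the BuildkiteQueue value. It is not the actual scaler code: the `Stack` class, the `desired_agent_count` function, and the `scheduled_jobs` mapping are hypothetical names made up for this example, and the real scaler's capacity logic is assumed to be more involved.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stack:
    """Hypothetical model of the two stack settings discussed in this issue."""

    buildkite_queue: str  # stands in for the BuildkiteQueue parameter
    agent_tags: list[str] = field(default_factory=list)  # e.g. ["queue=secondary"]


def desired_agent_count(stack: Stack, scheduled_jobs: dict[str, int]) -> int:
    """Return how many agents this simplified scaler would ask for.

    Assumption for illustration: demand is read only from the queue named
    by buildkite_queue. Queues that appear only in agent_tags are never
    looked up, so jobs waiting on them cannot trigger a scale-up.
    """
    return scheduled_jobs.get(stack.buildkite_queue, 0)


# Reproduction from the issue: primary queue plus an extra queue tag.
stack = Stack(buildkite_queue="primary", agent_tags=["queue=secondary"])

# Jobs are waiting only on "secondary", so the group stays at zero.
only_secondary = desired_agent_count(stack, {"secondary": 3})

# Jobs on "primary" are counted; the "secondary" jobs are still ignored.
mixed = desired_agent_count(stack, {"primary": 2, "secondary": 3})
```

With these inputs, `only_secondary` is 0 and `mixed` is 2. Once an agent is running because of demand on primary, it can still pick up secondary jobs through its tags; the gap only shows when nothing on primary is driving the scale-up.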

gws commented 4 months ago

@moskyb Thank you! That's helpful for planning 👍🏻