buildkite / elastic-ci-stack-for-aws

An auto-scaling cluster of build agents running in your own AWS VPC
https://buildkite.com/docs/quickstart/elastic-ci-stack-aws
MIT License

No tags on SQS queues #708

Closed sj26 closed 3 years ago

sj26 commented 4 years ago

The elastic stack creates SQS queues as part of its lifecycle handling through lifecycled: https://github.com/buildkite/lifecycled/blob/74e330d38ba66da736591209b38fc218a15a7c8a/queue.go#L72-L78

But these queues are created without tags. Most resources created by the stack either have a tag reflecting the stack name: https://github.com/buildkite/elastic-ci-stack-for-aws/blob/c7b586ac3510cbe8d10e8d004c08dca03be09611/templates/aws-stack.yml#L523-L525

or the cost allocation tags: https://github.com/buildkite/elastic-ci-stack-for-aws/blob/c7b586ac3510cbe8d10e8d004c08dca03be09611/templates/aws-stack.yml#L65-L70 https://github.com/buildkite/elastic-ci-stack-for-aws/blob/c7b586ac3510cbe8d10e8d004c08dca03be09611/templates/aws-stack.yml#L701-L706

We should propagate these to the queues created by lifecycled so that customers can track these created resources against the stack for audit and cost purposes.

There's already a relevant issue on lifecycled as well: https://github.com/buildkite/lifecycled/issues/17
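
To illustrate the tag propagation proposed above, here is a minimal sketch in Python rather than lifecycled's Go. It only builds the request arguments and does not call AWS. The function names and the `StackName` tag key are hypothetical and are not taken from the stack template or from lifecycled; the cost allocation tag name and value stand in for whatever the stack parameters supply.

```python
from typing import Dict, Optional


def build_queue_tags(
    stack_name: str,
    cost_tag_name: Optional[str] = None,
    cost_tag_value: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the tags a lifecycle queue could carry.

    "StackName" is a hypothetical tag key used for illustration only.
    """
    tags = {"StackName": stack_name}
    # Only add the cost allocation tag when both halves are configured.
    if cost_tag_name and cost_tag_value:
        tags[cost_tag_name] = cost_tag_value
    return tags


def build_create_queue_kwargs(queue_name: str, tags: Dict[str, str]) -> Dict[str, object]:
    """Build keyword arguments for an SQS CreateQueue request with tags attached."""
    return {"QueueName": queue_name, "tags": dict(tags)}
```

Assuming boto3 is available, the result could be passed as `boto3.client("sqs").create_queue(**kwargs)`, which accepts a lowercase `tags` mapping. Tagging at creation time means the queue is attributable to the stack from the moment it exists.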

yob commented 3 years ago

Looking at the lifecycled code, it seems to create the SQS queues to catch and handle ASG-initiated scale-down terminations.

However, on master the ASG never initiates a scale-down: each instance self-terminates and reduces the ASG desired count by 1. Could we configure lifecycled to skip the SQS queue creation when we only need to monitor for spot instance termination?

sj26 commented 3 years ago

I had a quick look at whether we could remove it, and thought maybe we couldn't because of spot instances?

yob commented 3 years ago

I've never worked with lifecycled, but it looks like the impending spot termination notifications are implemented as a polling loop against the instance metadata:

https://github.com/buildkite/lifecycled/blob/74e330d38ba66da736591209b38fc218a15a7c8a/spot.go#L46-L80
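
For readers unfamiliar with the approach, a polling loop of this kind can be sketched roughly as follows. This is a Python illustration, not lifecycled's actual Go implementation. It assumes the instance metadata endpoint `spot/termination-time` returns 404 until a termination is scheduled, and that the instance permits unauthenticated (IMDSv1) metadata requests. The function names, interval, and `max_polls` parameter are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def fetch_termination_time(url: str = TERMINATION_URL, timeout: float = 2.0) -> Optional[str]:
    """Return the scheduled termination time, or None if no notice is pending."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip() or None
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no termination notice yet
            return None
        raise


def wait_for_spot_termination(
    fetch: Callable[[], Optional[str]] = fetch_termination_time,
    interval: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until a termination notice appears or max_polls is exhausted."""
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = fetch()
        if notice is not None:
            return notice
        polls += 1
        sleep(interval)
    return None
```

Because this loop only needs HTTP access to the metadata service, it does not depend on any SQS queue or SNS topic.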

paging @lox

lox commented 3 years ago

Correct, at this stage we only use lifecycled for gracefully handling spot termination notices. We can ditch all the SQS/SNS lifecycle event stuff (I actually thought I had already).

chloeruka commented 3 years ago

Closing this as we don't have SQS resources anymore. #135 #829