Closed haugenj closed 4 years ago
Hi @haugenj
Thanks for your PR!
Code looks good to me. Happy with escalator adding tags, makes sense.
However, would we be able to have this behind a feature gate (scoped under AWS), in order to preserve backwards compatibility? Particularly as this adds a new API call with a new policy rule, preferably defaulted to False
for now? This would ideally be accompanied with a note in the configuration docs.
Thanks for taking a look at this so quickly!
I don't believe the tags will break deployments if the IAM policy isn't updated; the error from the request is caught and logged but it doesn't force an early return or stop the nodegroup from being added. I've tested this on an EKS cluster with both setDesiredCapacity and createFleet scaling strategies and can confirm in both cases Escalator works as before. Perhaps logging with a 'warning' level is more appropriate than an 'error' level?
@awprice can you elaborate on the potential Terraform/tag removing issue you mentioned? I only have a passing knowledge of how Terraform works so I've scanned through their docs a little to try and get some context. This page makes me think that the Escalator tags could be removed if Terraform updates the ASG, so there would be some potential back and forth of adding/removing the tags if there's a lot of Escalator deployments mixed with Terraform ASG updates. That doesn't seem like it would cause any real problems, but let me know if I'm misunderstanding this or missing something.
Thanks for taking a look at this so quickly!
I don't believe the tags will break deployments if the IAM policy isn't updated; the error from the request is caught and logged but it doesn't force an early return or stop the nodegroup from being added. I've tested this on an EKS cluster with both setDesiredCapacity and createFleet scaling strategies and can confirm in both cases Escalator works as before. Perhaps logging with a 'warning' level is more appropriate than an 'error' level?
@awprice can you elaborate on the potential Terraform/tag removing issue you mentioned? I only have a passing knowledge of how Terraform works so I've scanned through their docs a little to try and get some context. This page makes me think that the Escalator tags could be removed if Terraform updates the ASG, so there would be some potential back and forth of adding/removing the tags if there's a lot of Escalator deployments mixed with Terraform ASG updates. That doesn't seem like it would cause any real problems, but let me know if I'm misunderstanding this or missing something.
You are right - this shouldn't break deployments if the IAM policy isn't updated as the error will be logged but not returned.
The Terraform issue is more of a concern, as I can see the addition of the tag causing issues with ASGs managed by Terraform. We (Atlassian) do exactly this and will result in a tags being added/removed quite frequently from our own ASGs.
One solution to this is to place the following Terraform config on the ASG resource, but to me fixing it in Terraform is not the right way to go about it.
lifecycle {
ignore_tags = [tags]
}
Is there another reason why this must be enabled by default and not easily disabled?
Gotcha. They definitely don't need to be enabled by default if there are real problems, I just personally lean towards less user configuration whenever possible and was hoping it would work in this case. I'll work on a revision with the requested changes, thanks!
Tags are pieces of metadata customers can use to organize resources. They're particularly helpful to customers for breaking down costs, so they can see how much they're spending per tag. Functionally, they don't affect the performance of a resource in any way.
This change checks the ASGs when the nodegroups are created to see if the escalator tag is present on the ASG. If the tag isn't present, a request is made to add the tag. The tag will propagate to any instances launched within the ASG, so instances created with either setDesiredCapacity or CreateFleet are tagged.
Additionally, when CreateFleet is used the request itself is tagged.
The tag names are arbitrary, but I had them follow the same style as the ones cluster autoscaler uses