Closed awprice closed 5 years ago
Are there any downsides or side effects to setting this to false? If not, would it make sense to expose it as an option in the node group config, with false as the default?
I don't think there are any downsides to setting this to false, as we already provide a safeguard in Escalator with the scale lock. The scale lock works the same way as the cooldown in Auto Scaling groups in that it prevents runaway scaling.
Cluster-autoscaler also sets this to false and doesn't have an option to change it - https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/auto_scaling_groups.go#L201
Including it in the node group config would require some extra thought. This is an AWS-specific setting, so we would need a way to store per-cloud-provider settings in the node group config.
We should set the `HonorCooldown` option to `false` when setting the desired capacity for the ASG.

We've seen cases where an instance is stuck in a Pending state and blocks the ASG from being updated. This can last for 30-40 minutes, until AWS terminates the instance because it is failing health checks. The instance is usually in this state due to an underlying hardware issue.
Changing this value to `false` will allow Escalator to continue operating even when there are nodes with issues in the ASG.
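
As a rough illustration of the proposed change, the sketch below builds a set-desired-capacity request with `HonorCooldown` explicitly set to `false`. It is not Escalator's actual code. The struct is a local stand-in that mirrors the `AutoScalingGroupName`, `DesiredCapacity` and `HonorCooldown` fields of `autoscaling.SetDesiredCapacityInput` in the AWS SDK for Go, so that the example runs without the SDK. The helper name `buildSetDesiredCapacityInput` and the ASG name `example-asg` are made up for the example.

```go
package main

import "fmt"

// setDesiredCapacityInput is a local stand-in (an assumption for this sketch)
// mirroring the fields of the AWS SDK for Go's
// autoscaling.SetDesiredCapacityInput that matter here.
type setDesiredCapacityInput struct {
	AutoScalingGroupName *string
	DesiredCapacity      *int64
	HonorCooldown        *bool
}

// buildSetDesiredCapacityInput is a hypothetical helper that always sets
// HonorCooldown to false, so the request does not ask the ASG to wait for
// its cooldown period before applying the new desired capacity.
func buildSetDesiredCapacityInput(asgName string, desired int64) *setDesiredCapacityInput {
	honorCooldown := false
	return &setDesiredCapacityInput{
		AutoScalingGroupName: &asgName,
		DesiredCapacity:      &desired,
		HonorCooldown:        &honorCooldown,
	}
}

func main() {
	in := buildSetDesiredCapacityInput("example-asg", 5)
	fmt.Printf("asg=%s desired=%d honorCooldown=%t\n",
		*in.AutoScalingGroupName, *in.DesiredCapacity, *in.HonorCooldown)
}
```

With the real SDK, the equivalent input would be passed to the Auto Scaling client's `SetDesiredCapacity` call. Runaway scaling would still be guarded against by Escalator's own scale lock, as noted in the comments.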