Closed jhuntwork closed 4 years ago
Yes, according to the ASG lifecycle hooks documentation, this should help.
When Amazon EC2 Auto Scaling responds to a scale-in event, it terminates one or more instances. These instances are detached from the Auto Scaling group and enter the Terminating state. If you added an autoscaling:EC2_INSTANCE_TERMINATING lifecycle hook to your Auto Scaling group, the instances move from the Terminating state to the Terminating:Wait state. After you complete the lifecycle action, the instances enter the Terminating:Proceed state. When the instances are fully terminated, they enter the Terminated state.
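To make the quoted flow concrete, here is a minimal sketch of what a handler might do while an instance sits in `Terminating:Wait`: drain the node, then complete the lifecycle action so the instance moves on to `Terminating:Proceed`. It assumes `asg_client` is a boto3 Auto Scaling client (for example `boto3.client("autoscaling")`). The names `drain_then_release` and `drain_node` are hypothetical and not part of any project discussed here.

```python
from typing import Callable


def drain_then_release(
    asg_client,
    drain_node: Callable[[str], None],
    *,
    group_name: str,
    hook_name: str,
    instance_id: str,
) -> None:
    """Drain a node, then let the ASG proceed with termination.

    asg_client is assumed to be a boto3 "autoscaling" client.
    drain_node is a hypothetical callback that evicts pods from the node
    backing instance_id (for example by shelling out to `kubectl drain`).
    """
    try:
        drain_node(instance_id)
    finally:
        # Complete the lifecycle action even if the drain raised, so the
        # instance is not left waiting until the hook's own timeout expires.
        asg_client.complete_lifecycle_action(
            AutoScalingGroupName=group_name,
            LifecycleHookName=hook_name,
            InstanceId=instance_id,
            LifecycleActionResult="CONTINUE",
        )
```

Completing the action in a `finally` block is a design choice for this sketch. A handler that would rather hold the instance after a failed drain could skip it and let the hook time out instead.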
I like this idea, thank you.
... and continue with other operations once it reaches the `Terminating:Wait` state.

As per the connection draining documentation (https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-conn-drain.html), ASGs will begin draining nodes from load balancers while in the `Terminating` state, and will even remove them from the load balancer before the node transitions to `Terminating:Wait`. This means that if you depend on pod evictions to move a critical service to another available node in the load balancer target group, the move will only happen after that node has already been drained and removed from the load balancer.

This effect is amplified whenever a timeout or deregistration delay value is set on the load balancer. The default value is 300 seconds, as per https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#deregistration-delay
By draining sooner, critical pods providing service through that load balancer can move to other nodes and maintain uptime while the node is being deregistered from the load balancer.
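As a rough illustration of "draining sooner", the sketch below selects instances as soon as their lifecycle state begins with `Terminating`, rather than waiting specifically for `Terminating:Wait`. It assumes `asg_client` is a boto3 Auto Scaling client and relies on the `AutoScalingInstances` / `LifecycleState` fields returned by `describe_auto_scaling_instances`. The function name `terminating_instances` is hypothetical.

```python
from typing import Iterable, List


def terminating_instances(asg_client, instance_ids: Iterable[str]) -> List[str]:
    """Return the IDs of instances that have entered any Terminating* state.

    asg_client is assumed to be a boto3 "autoscaling" client. Matching on
    the "Terminating" prefix covers Terminating, Terminating:Wait and
    Terminating:Proceed, but not Terminated.
    """
    response = asg_client.describe_auto_scaling_instances(
        InstanceIds=list(instance_ids)
    )
    return [
        instance["InstanceId"]
        for instance in response["AutoScalingInstances"]
        if instance["LifecycleState"].startswith("Terminating")
    ]
```

A caller could poll this and start the drain for each returned ID, so that pod evictions overlap with the load balancer's deregistration delay instead of starting after it.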