VirtusLab / kubedrainer

Simple Kubernetes Node Drainer
Apache License 2.0

Begin draining a node when it enters `Terminating` state #5

Closed: jhuntwork closed this issue 4 years ago.

jhuntwork commented 4 years ago

... and continue with the other operations once it reaches the `Terminating:Wait` state.
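A minimal sketch of the proposed ordering. It assumes the drainer can observe the instance's Auto Scaling lifecycle state on each poll (for example from the `LifecycleState` field returned by the Auto Scaling `DescribeAutoScalingInstances` API). The function `plan_actions` and the action strings are hypothetical illustrations, not kubedrainer's actual API.

```python
from typing import List

# Hypothetical action names, used only to illustrate the proposed ordering.
START_DRAIN = "start_drain"
FINISH_AND_COMPLETE = "wait_for_drain_then_complete_lifecycle_action"


def plan_actions(state: str, drain_started: bool) -> List[str]:
    """Return the actions to take for one observed lifecycle state.

    Proposed ordering: begin evicting pods as soon as the instance is
    'Terminating' (while the load balancer may still be draining it),
    and only finish up once the instance reaches 'Terminating:Wait'.
    """
    actions: List[str] = []
    if state in ("Terminating", "Terminating:Wait") and not drain_started:
        # Start the drain at the earliest terminating state we observe;
        # this also covers a poll that missed 'Terminating' entirely.
        actions.append(START_DRAIN)
    if state == "Terminating:Wait":
        # The lifecycle hook is now holding the instance: let the drain
        # finish, then complete the lifecycle action so termination proceeds.
        actions.append(FINISH_AND_COMPLETE)
    return actions
```

For example, `plan_actions("Terminating", False)` yields only the drain start, while a later `plan_actions("Terminating:Wait", True)` yields only the completion step.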

According to https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/config-conn-drain.html:

> If your instances are part of an Auto Scaling group and connection draining is enabled for your load balancer, Auto Scaling waits for the in-flight requests to complete, or for the maximum timeout to expire, before terminating instances due to a scaling event or health check replacement.

ASGs begin draining nodes from load balancers while they are in the `Terminating` state, and even remove them from the load balancer before they transition to `Terminating:Wait`. This means that if you depend on pod evictions to move a critical service to another available node in the load balancer's target group, the move happens only after the node has already been drained and removed from the load balancer.

The effect is amplified by any connection-draining timeout or deregistration delay set on the load balancer. The default deregistration delay is 300 seconds, as documented here: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-target-groups.html#deregistration-delay

Draining sooner lets critical pods that serve traffic through that load balancer move to other nodes and maintain uptime while the node is being deregistered from the load balancer.
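To put a number on "draining sooner", here is a simplified worst-case model. It assumes, as described in this issue, that the instance reaches `Terminating:Wait` only after it has left the load balancer, and that in-flight connections keep the target draining for the full deregistration delay. The function name is hypothetical; only the 300-second default comes from the AWS documentation linked above.

```python
DEFAULT_DEREGISTRATION_DELAY_S = 300  # target group default per the AWS docs


def worst_case_eviction_start_s(
    drain_on_terminating: bool,
    deregistration_delay_s: int = DEFAULT_DEREGISTRATION_DELAY_S,
) -> int:
    """Seconds after the instance enters 'Terminating' at which pod eviction begins.

    Worst-case assumption: deregistration takes the full delay, and the
    instance only reaches 'Terminating:Wait' after leaving the load balancer.
    """
    if drain_on_terminating:
        # Proposed behaviour: evict as soon as 'Terminating' is observed.
        return 0
    # Current behaviour: eviction waits for 'Terminating:Wait'.
    return deregistration_delay_s
```

With the defaults, eviction starts at 300 s under the current behaviour versus 0 s under the proposal; a shorter delay (say 30 s) shrinks the difference proportionally.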

pawelprazak commented 4 years ago

Yes, according to the ASG lifecycle hook documentation, this should help.

https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html#as-lifecycle-hooks

> When Amazon EC2 Auto Scaling responds to a scale-in event, it terminates one or more instances. These instances are detached from the Auto Scaling group and enter the Terminating state. If you added an autoscaling:EC2_INSTANCE_TERMINATING lifecycle hook to your Auto Scaling group, the instances move from the Terminating state to the Terminating:Wait state. After you complete the lifecycle action, the instances enter the Terminating:Proceed state. When the instances are fully terminated, they enter the Terminated state.
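The termination path in the quoted documentation can be written as a small state machine. This is a simplified sketch covering only scale-in termination; the direct `Terminating` to `Terminated` edge reflects our reading that instances skip the wait states when no `autoscaling:EC2_INSTANCE_TERMINATING` hook is configured. `TRANSITIONS` and `is_valid_path` are hypothetical names for illustration.

```python
from typing import Dict, List, Set

# Termination-path lifecycle states named in the AWS lifecycle-hook docs.
# Assumption: without a terminating hook, 'Terminating' goes straight to 'Terminated'.
TRANSITIONS: Dict[str, Set[str]] = {
    "InService": {"Terminating"},
    "Terminating": {"Terminating:Wait", "Terminated"},
    "Terminating:Wait": {"Terminating:Proceed"},
    "Terminating:Proceed": {"Terminated"},
    "Terminated": set(),
}


def is_valid_path(states: List[str]) -> bool:
    """Check that every consecutive pair of states is an allowed transition."""
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(states, states[1:])
    )
```

Under this model, the proposal amounts to acting on the `InService` to `Terminating` edge instead of waiting for the `Terminating` to `Terminating:Wait` edge.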

I like this idea, thank you.