fincd-aws opened this issue 4 years ago
There should be a k8s action when the cloud controller deregisters the node from all of the ELBs that we can cause or wait for, before completing the lifecycle hook.
I'll be honest: I don't really know what any of that means (I haven't used k8s since I published this tool). Can you elaborate?
The less sophisticated alternative would be accepting a maximum draining wait value that the operator would have to keep updated beyond the ELB default of 300 seconds.
Do we mean some sort of environment variable that the user can configure at install time, which tells the drainer to wait the specified amount of time?
Yes, some user-configurable wait between remove_all_pods() and asg.complete_lifecycle_action().
The ELB draining timeout defaults to 300 seconds, so we can default to that.
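A minimal sketch of what that could look like, assuming a `DRAIN_WAIT_SECONDS` environment variable (the variable name and function signatures here are illustrative assumptions, not the tool's actual code; `remove_all_pods()` and the lifecycle-hook completion are the two steps named above):

```python
import os
import time

# Assumed default: matches the ELB draining timeout of 300 seconds
# discussed above.
DEFAULT_DRAIN_WAIT = 300

def resolve_drain_wait(env=None):
    """Read the user-configurable wait, defaulting to 300 seconds."""
    env = os.environ if env is None else env
    return int(env.get("DRAIN_WAIT_SECONDS", DEFAULT_DRAIN_WAIT))

def drain_and_complete(asg_name, hook_name, instance_id, remove_all_pods):
    import boto3  # imported lazily; calling this requires AWS credentials

    remove_all_pods()
    # Give the ELB time to drain in-flight connections before the
    # instance is actually terminated.
    time.sleep(resolve_drain_wait())
    boto3.client("autoscaling").complete_lifecycle_action(
        AutoScalingGroupName=asg_name,
        LifecycleHookName=hook_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )
```

The fixed sleep is the "less sophisticated alternative" from the original comment: simple, but the operator has to keep it in sync with the actual ELB draining configuration.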
I guess this is caused by https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/1719, so not a particular problem of the node drainer itself.
As pointed out by the comment below from the kubernetes/autoscaler repo, this and other lifecycle hooks are very naive in assuming that the instance can be terminated once the node's non-DaemonSet pods are gone. kube-proxy may still be proxying connections to other nodes.
> There should be a k8s action when the cloud controller deregisters the node from all of the ELBs that we can cause or wait for, before completing the lifecycle hook. The less sophisticated alternative would be accepting a maximum draining wait value that the operator would have to keep updated beyond the ELB default of 300 seconds.
Originally posted by @bshelton229 in https://github.com/kubernetes/autoscaler/issues/1907#issuecomment-561945989
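The more sophisticated option quoted above (wait for actual ELB deregistration rather than a fixed timeout) could be sketched by polling target health until the instance is no longer draining. This is an assumption about how it might be done, not existing node-drainer code; in particular, the target group ARNs would have to be discovered or configured somehow:

```python
import time

def targets_draining(descriptions):
    """True if any target-health description still reports 'draining'."""
    return any(d["TargetHealth"]["State"] == "draining" for d in descriptions)

def wait_for_deregistration(instance_id, target_group_arns,
                            poll=10, timeout=600):
    """Poll ALB/NLB target groups until the instance stops draining.

    Returns True once no target group reports the instance as draining,
    or False if the timeout elapses (caller decides whether to complete
    the lifecycle hook anyway).
    """
    import boto3  # imported lazily; calling this requires AWS credentials

    elbv2 = boto3.client("elbv2")
    deadline = time.time() + timeout
    while time.time() < deadline:
        draining = False
        for arn in target_group_arns:
            health = elbv2.describe_target_health(
                TargetGroupArn=arn,
                Targets=[{"Id": instance_id}],
            )["TargetHealthDescriptions"]
            if targets_draining(health):
                draining = True
        if not draining:
            return True
        time.sleep(poll)
    return False
```

This still does not address the kube-proxy caveat above: even after this node's targets are deregistered, the node may be forwarding traffic for pods on other nodes, which no per-node check can rule out.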