aws-samples / amazon-k8s-node-drainer

Gracefully drain Kubernetes pods from EKS worker nodes during autoscaling scale-in events.

Signal Returned to ASG before pods terminate gracefully. #9

Closed tahoward closed 5 years ago

tahoward commented 5 years ago

With k8s 1.11, the eviction API endpoint appears to ignore graceful SIGTERM handling in pods, even with a generous termination grace period set. Haven't tested other k8s versions yet.

I've had better luck with the delete pod API endpoint, polling the pods until they are all terminated, although this does not respect any pod disruption budgets.

matteofigus commented 5 years ago

Hi @tahoward thanks for opening an issue.

Can you quantify generous termination grace period? Just to have an idea.

tahoward commented 5 years ago

Been using 60 seconds (default 30). I don't think the grace period matters as long as the time the pod's containers take to handle SIGTERM does not exceed it. Looking at the pod logs, a SIGTERM does initially get sent when an eviction is created via the API. However, the pod is killed well before the SIGTERM handler completes. This does not happen when using the delete pod API endpoint.

svozza commented 5 years ago

This appears to be a bug in the eviction API where if you don't provide delete_options you get the behaviour you're seeing. It looks like the fix has only made it into the 1.14 branch. There is a workaround detailed in the issue where it was reported. Can you change the code in the k8s_util.py file to this and see if it fixes the problem:

body = {
    'apiVersion': 'policy/v1beta1',
    'kind': 'Eviction',
    'delete_options': {},
    'metadata': {
        'name': pod.metadata.name,
        'namespace': pod.metadata.namespace
    }
}
api.create_namespaced_pod_eviction(pod.metadata.name + '-eviction', pod.metadata.namespace, body)

I suspect we may end up having to add some code to poll the pods to make sure that they have been deleted, but let's try this first.

svozza commented 5 years ago

From my own testing, this didn't seem to work. Interested to see what you find.

tahoward commented 5 years ago

I replicated your results. ATM I'm using the delete API with a 1-second poller, excluding pods owned by a DaemonSet.
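For readers following along, the delete-and-poll workaround could look roughly like the sketch below. It is not the code actually used in this thread. It assumes `api` is a `kubernetes.client.CoreV1Api` instance (older client releases also required a `body` argument on `delete_namespaced_pod`), and the function names, the `timeout_seconds` parameter, and the injectable `sleep` are made up for illustration.

```python
import time


def is_daemonset_pod(pod):
    """Return True if the pod is owned by a DaemonSet."""
    owners = pod.metadata.owner_references or []
    return any(ref.kind == 'DaemonSet' for ref in owners)


def pods_on_node(api, node_name):
    """List the non-DaemonSet pods currently scheduled on the node."""
    result = api.list_pod_for_all_namespaces(
        field_selector='spec.nodeName=' + node_name)
    return [p for p in result.items if not is_daemonset_pod(p)]


def delete_and_wait(api, node_name, poll_seconds=1, timeout_seconds=300,
                    sleep=time.sleep):
    """Delete every non-DaemonSet pod on the node, then poll until gone.

    Returns True if all targeted pods disappeared before the timeout.
    Note: unlike eviction, a plain delete ignores pod disruption budgets.
    """
    targets = pods_on_node(api, node_name)
    pending = {p.metadata.uid for p in targets}
    for pod in targets:
        # Uses the pod's own terminationGracePeriodSeconds by default.
        api.delete_namespaced_pod(pod.metadata.name, pod.metadata.namespace)

    waited = 0
    while pending and waited < timeout_seconds:
        sleep(poll_seconds)
        waited += poll_seconds
        alive = {p.metadata.uid for p in pods_on_node(api, node_name)}
        pending &= alive  # keep only the pods that still exist
    return not pending
```

Tracking pods by UID rather than by name means a replacement pod that reuses a name (for example from a StatefulSet) is not mistaken for the one still terminating.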

Also, IMO it'd be preferable to cancel an ASG node termination if a pod disruption budget is going to be violated. ATM, even if the eviction endpoint worked as intended, the pods covered by the violated disruption budget would still be forcefully removed on node termination.
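As a rough illustration of the disruption budget point: the eviction endpoint answers with HTTP 429 when an eviction would violate a budget, so a caller can detect that case and decide what to do instead of pressing on. The sketch below is only that, a sketch. It assumes `api` is a `kubernetes.client.CoreV1Api` instance whose errors carry a `status` attribute (as `kubernetes.client.rest.ApiException` does), and `evict_respecting_pdb`, `retries`, and `backoff_seconds` are hypothetical names.

```python
import time


def evict_respecting_pdb(api, pod, retries=5, backoff_seconds=5,
                         sleep=time.sleep):
    """Try to evict a pod, retrying while a disruption budget blocks it.

    Returns True once the eviction is accepted, or False if the budget
    still blocks it after all retries. Other API errors are re-raised.
    """
    body = {
        'apiVersion': 'policy/v1beta1',  # matches the thread's k8s version
        'kind': 'Eviction',
        'metadata': {
            'name': pod.metadata.name,
            'namespace': pod.metadata.namespace,
        },
    }
    for attempt in range(retries):
        try:
            api.create_namespaced_pod_eviction(
                pod.metadata.name, pod.metadata.namespace, body)
            return True
        except Exception as exc:
            # 429 Too Many Requests: the budget does not allow it right now.
            if getattr(exc, 'status', None) != 429:
                raise
        if attempt < retries - 1:
            sleep(backoff_seconds)
    return False
```

A `False` return leaves the decision to the caller, for example to stop draining and report the blocked pod rather than signal the ASG to continue.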

svozza commented 5 years ago

I think #10 might fix the issues you've encountered here, @tahoward.

svozza commented 5 years ago

I'm going to close this but feel free to reopen if it doesn't.