Hi @tahoward, thanks for opening an issue.
Can you quantify "generous termination grace period"? Just to get an idea.
I've been using 60 seconds (the default is 30). I don't think the exact grace period matters, as long as the time the pod's containers take to handle SIGTERM doesn't exceed it. Looking at the pod logs, a SIGTERM does initially get sent when an eviction is created via the API. However, the pod is killed well before the SIGTERM handler completes. This does not happen when using the delete pod API endpoint.
This appears to be a bug in the eviction API: if you don't provide `deleteOptions`, you get the behaviour you're seeing. It looks like the fix has only made it into the 1.14 branch. There is a workaround detailed in the issue where it was reported. Can you change the code in the `k8s_util.py` file to this and see if it fixes the problem:
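```python
# Assumes `api` is a kubernetes.client.CoreV1Api and `pod` is the V1Pod being evicted.
body = {
    'apiVersion': 'policy/v1beta1',
    'kind': 'Eviction',
    # A plain dict is sent as-is (no snake_case -> camelCase mapping),
    # so the API field name deleteOptions has to be used here.
    'deleteOptions': {},
    'metadata': {
        'name': pod.metadata.name,
        'namespace': pod.metadata.namespace
    }
}
# The eviction is a subresource of the pod (.../pods/{name}/eviction),
# so the name argument is the pod's own name.
api.create_namespaced_pod_eviction(pod.metadata.name, pod.metadata.namespace, body)
```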
I suspect we may end up having to add some code that polls the pods to make sure they have been deleted, but let's try this first.
From my own testing this didn't seem to work; I'm interested to see what you find.
I replicated your results. ATM I'm using the delete pod API with a 1-second poller, excluding pods owned by a DaemonSet.
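A minimal sketch of that delete-and-poll approach, assuming the Python kubernetes client. `delete_and_wait` and `is_daemonset_pod` are hypothetical helper names, and `api` is only assumed to behave like `kubernetes.client.CoreV1Api`: `delete_namespaced_pod(name, namespace)` plus `read_namespaced_pod(name, namespace)`, the latter raising an error whose `status` is 404 once the pod is gone. The sleep and clock functions are parameters so the loop can be exercised without a cluster.

```python
import time


def is_daemonset_pod(pod):
    """Hypothetical helper: True if a DaemonSet owns the pod (it would just be recreated)."""
    refs = pod.metadata.owner_references or []
    return any(ref.kind == "DaemonSet" for ref in refs)


def delete_and_wait(api, pods, poll_interval=1.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Hypothetical helper: delete non-DaemonSet pods, then poll until all are gone.

    `api` is assumed to look like kubernetes.client.CoreV1Api. Pod disruption
    budgets are NOT checked; this is a plain delete, not an eviction.
    """
    targets = [p for p in pods if not is_daemonset_pod(p)]
    for pod in targets:
        # Without an explicit grace period the pod's own
        # terminationGracePeriodSeconds applies, so SIGTERM handlers can finish.
        api.delete_namespaced_pod(pod.metadata.name, pod.metadata.namespace)

    # Track (namespace, name) -> UID so a recreated pod with the same name
    # is not mistaken for the one we deleted.
    pending = {(p.metadata.namespace, p.metadata.name): p.metadata.uid
               for p in targets}
    deadline = clock() + timeout
    while pending:
        for (namespace, name), uid in list(pending.items()):
            try:
                current = api.read_namespaced_pod(name, namespace)
            except Exception as exc:
                if getattr(exc, "status", None) != 404:
                    raise
                del pending[(namespace, name)]  # pod object no longer exists
                continue
            if current.metadata.uid != uid:
                del pending[(namespace, name)]  # old pod gone, a new one took its name
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"pods still terminating: {sorted(pending)}")
        sleep(poll_interval)
    return [p.metadata.name for p in targets]
```

Comparing UIDs rather than only names means a pod that a StatefulSet recreates under the same name still counts as gone. Like the plain delete it wraps, this does not consult pod disruption budgets.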
Also, IMO it'd be preferable to cancel an ASG node termination if a pod disruption budget is going to be violated. ATM, even if the eviction endpoint worked as intended, the pods covered by the violated disruption budget would be forcibly removed on node termination.
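One way to detect up front that a pod disruption budget would be violated is to count how many of the pods being removed each budget selects and compare that with the budget's `status.disruptionsAllowed`. This is a hypothetical pure-Python helper working on plain dicts rather than client objects; it only handles `matchLabels` (not `matchExpressions`), and it follows the `policy/v1beta1` rule that an empty selector selects no pods. How to act on the result (for example, holding off the ASG termination) is left open.

```python
def selects(match_labels, pod_labels):
    """Hypothetical matchLabels-only check; under policy/v1beta1 an empty selector selects no pods."""
    if not match_labels:
        return False
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def violated_budgets(pdbs, pods_to_remove):
    """Return "namespace/name" of every PDB whose disruptionsAllowed would be exceeded.

    pdbs: dicts with "name", "namespace", "match_labels", "disruptions_allowed"
    pods_to_remove: dicts with "namespace" and "labels"
    """
    violated = []
    for pdb in pdbs:
        # A PDB only covers pods in its own namespace.
        removed = sum(
            1 for pod in pods_to_remove
            if pod["namespace"] == pdb["namespace"]
            and selects(pdb["match_labels"], pod["labels"])
        )
        if removed > pdb["disruptions_allowed"]:
            violated.append(f'{pdb["namespace"]}/{pdb["name"]}')
    return sorted(violated)
```

Comparing against `disruptionsAllowed` all at once is conservative: the PDB controller recomputes that number as replacement pods become healthy, so a drain that evicts in batches could remove more pods overall without violating the budget.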
I think #10 might fix the issues you've encountered here, @tahoward.
I'm going to close this, but feel free to reopen if it doesn't.
Using k8s 1.11, the eviction API endpoint appears to ignore the termination grace period of pods that handle SIGTERM gracefully, even with a generous grace period set. Haven't tested other k8s versions yet.
I've had better luck with the delete pod API endpoint and polling the pods until they have all terminated, although this does not respect any pod disruption budgets.