aws-samples / amazon-k8s-node-drainer

Gracefully drain Kubernetes pods from EKS worker nodes during autoscaling scale-in events.

Issue with lambda- errors while evicting #27

Closed marcincuber closed 4 years ago

marcincuber commented 4 years ago

I am getting the following error. Any ideas on how to fix it? Currently, the lambda can't evict any pods.

[ERROR] 2020-05-07T11:29:40.860Z 3caa9c4f-db5d-4e2c-95eb-e15d59d51795 Unexpected error adding eviction for pod flux/memcached-65cbddbbdb-b8bxg

  | 2020-05-07T12:29:40.860+01:00 | Traceback (most recent call last):
  | 2020-05-07T12:29:40.860+01:00 | File "/var/task/k8s_utils.py", line 87, in evict_pods
  | 2020-05-07T12:29:40.860+01:00 | pod.metadata.name + '-eviction', pod.metadata.namespace, body)
  | 2020-05-07T12:29:40.860+01:00 | File "/var/task/kubernetes/client/apis/core_v1_api.py", line 6353, in create_namespaced_pod_eviction
  | 2020-05-07T12:29:40.860+01:00 | (data) = self.create_namespaced_pod_eviction_with_http_info(name, namespace, body, **kwargs)
  | 2020-05-07T12:29:40.860+01:00 | File "/var/task/kubernetes/client/apis/core_v1_api.py", line 6450, in create_namespaced_pod_eviction_with_http_info
  | 2020-05-07T12:29:40.860+01:00 | collection_formats=collection_formats)
  | 2020-05-07T12:29:40.860+01:00 | File "/var/task/kubernetes/client/api_client.py", line 334, in call_api
  | 2020-05-07T12:29:40.860+01:00 | _return_http_data_only, collection_formats, _preload_content, _request_timeout)
  | 2020-05-07T12:29:40.860+01:00 | File "/var/task/kubernetes/client/api_client.py", line 168, in __call_api
  | 2020-05-07T12:29:40.860+01:00 | _request_timeout=_request_timeout)
  | 2020-05-07T12:29:40.860+01:00 | File "/var/task/kubernetes/client/api_client.py", line 377, in request
  | 2020-05-07T12:29:40.860+01:00 | body=body)
  | 2020-05-07T12:29:40.860+01:00 | File "/var/task/kubernetes/client/rest.py", line 266, in POST
  | 2020-05-07T12:29:40.860+01:00 | body=body)
  | 2020-05-07T12:29:40.860+01:00 | File "/var/task/kubernetes/client/rest.py", line 222, in request
  | 2020-05-07T12:29:40.860+01:00 | raise ApiException(http_resp=r)
  | 2020-05-07T12:29:40.860+01:00 | kubernetes.client.rest.ApiException: (400)
  | 2020-05-07T12:29:40.860+01:00 | Reason: Bad Request
marcincuber commented 4 years ago

The problem is that the lambda doesn't support Kubernetes 1.16+.

svozza commented 4 years ago

I was just about to ask what version you were using. I'll look into what's required to support the new version, but I think this might be a better solution to the problem now:

https://github.com/aws/aws-node-termination-handler

marcincuber commented 4 years ago

Hi, I was using the latest code in this repo.

With regard to aws-node-termination-handler, I am already using it for spot interruptions. However, the feature that I need currently exists only as a feature request: https://github.com/aws/aws-node-termination-handler/issues/141.

Basically, the issue with the current code is that api.create_namespaced_pod_eviction(pod.metadata.name + '-eviction', pod.metadata.namespace, body) doesn't work on EKS 1.16. It should simply be api.create_namespaced_pod_eviction(pod.metadata.name, pod.metadata.namespace, body), with the pod's own name and no '-eviction' suffix.
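
For illustration, the corrected call might look roughly like the sketch below. It assumes `api` is a `kubernetes.client.CoreV1Api` instance (any object exposing the same `create_namespaced_pod_eviction` method would do) and that the client accepts a plain dict as the request body. The helper names `build_eviction_body` and `evict_pod`, and the `policy/v1beta1` body shape, are assumptions made for this example and are not taken from the repo.

```python
def build_eviction_body(pod_name, namespace):
    """Build an Eviction request body as a plain dict.

    Assumption: the policy/v1beta1 Eviction shape; adjust the apiVersion
    to whatever your cluster serves.
    """
    return {
        "apiVersion": "policy/v1beta1",
        "kind": "Eviction",
        "metadata": {"name": pod_name, "namespace": namespace},
    }


def evict_pod(api, pod_name, namespace):
    """Request eviction using the pod's own name, with no '-eviction' suffix."""
    body = build_eviction_body(pod_name, namespace)
    # First argument is the pod name itself, as suggested in this thread.
    return api.create_namespaced_pod_eviction(pod_name, namespace, body)
```

With a real client this would be called as `evict_pod(api, pod.metadata.name, pod.metadata.namespace)`; a 400 response would still surface as an `ApiException`, as in the traceback above.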

svozza commented 4 years ago

Oh that's interesting, I didn't realise it had that constraint. Thank you for bringing it to my attention.

marcincuber commented 4 years ago

@svozza if you have any influence on the priorities for feature requests around aws-node-termination-handler, then the one mentioned above is an absolute must for me.

mitom commented 4 years ago

Has there been a resolution to this? If not, could we re-open the issue? Currently this lambda handles events that the NTH doesn't, and it seems their current plan failed and is delayed, so this is still our best bet.

Would it be OK to just release the current version as 0.1.0, make the change @marcincuber suggested and release as 0.2.0 with a README clarifying it (sorry, I'm unclear whether this is breaking from k8s 1.15 or not)?

marcincuber commented 4 years ago

@mitom it is not a breaking change. The function just needs to check which version of Kubernetes is in use and then make the appropriate call based on that.
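
A version check of that kind could be sketched as follows. This is only an illustration under stated assumptions: the `(1, 16)` threshold and the legacy `'-eviction'` suffix come from this thread, the helper names `parse_minor` and `eviction_name` are hypothetical, and the `major`/`minor` strings are assumed to come from something like `kubernetes.client.VersionApi().get_code()`. Some clusters report the minor version with a trailing `+` (for example `16+`), so the sketch reads only the leading digits. The fix that was eventually merged may look different.

```python
import re


def parse_minor(minor):
    """Return the leading digits of a minor version string such as '16' or '16+'."""
    match = re.match(r"\d+", str(minor))
    if match is None:
        raise ValueError(f"unrecognised minor version: {minor!r}")
    return int(match.group())


def eviction_name(pod_name, major, minor):
    """Pick the name to pass to the eviction call for a given server version.

    Assumption from the thread: 1.16 and later expect the pod's own name,
    while earlier versions keep the legacy '-eviction' suffix.
    """
    if (int(major), parse_minor(minor)) >= (1, 16):
        return pod_name
    return pod_name + "-eviction"
```

For example, `eviction_name("memcached", "1", "16+")` returns the bare pod name, while `eviction_name("memcached", "1", "15")` keeps the suffix.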

Reopening the issue.

svozza commented 4 years ago

I'd be happy to accept a pull request if it's just a small change.

ctd commented 4 years ago

I believe this is fixed in https://github.com/aws-samples/amazon-k8s-node-drainer/pull/30 so closing this issue.