eksctl-io / eksctl

The official CLI for Amazon EKS
https://eksctl.io

[Bug] Cluster delete hangs with aws-ebs-csi-driver addon (unable to drain node) #5855

Closed: eti-tme-tim closed this issue 1 year ago

eti-tme-tim commented 1 year ago

What were you trying to accomplish?

Delete the cluster created by a YAML file that includes the AWS EBS CSI driver addon.

What happened?

Cluster creation works without error. PVC and PV creation via the EBS CSI driver on this cluster also work without error (a minimal example claim is sketched below, after the pod list). The following pods are running when it is time to delete the cluster:

kube-system   aws-node-5k294                        1/1     Running   0             14m
kube-system   aws-node-smbx4                        1/1     Running   0             14m
kube-system   aws-node-zzrzj                        1/1     Running   1 (12m ago)   14m
kube-system   coredns-5948f55769-sd54s              1/1     Running   0             24m
kube-system   coredns-5948f55769-v9rzn              1/1     Running   0             24m
kube-system   ebs-csi-controller-664869bc4d-jmxdm   6/6     Running   0             11m
kube-system   ebs-csi-controller-664869bc4d-k8qmb   6/6     Running   0             11m
kube-system   ebs-csi-node-g92pp                    3/3     Running   0             11m
kube-system   ebs-csi-node-gh5gr                    3/3     Running   0             11m
kube-system   ebs-csi-node-k6ldn                    3/3     Running   0             11m
kube-system   kube-proxy-2qgt2                      1/1     Running   0             14m
kube-system   kube-proxy-427v2                      1/1     Running   0             14m
kube-system   kube-proxy-rvdhv                      1/1     Running   0             14m
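
For context, the PVC/PV check mentioned above amounts to creating a claim backed by the EBS CSI driver; a minimal claim along these lines would exercise it (the claim name, storage class, and size are illustrative, not taken from the original setup):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-test-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2   # assumed default EKS storage class; with WaitForFirstConsumer binding, the PV is provisioned once a pod mounts the claim
  resources:
    requests:
      storage: 4Gi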

When I issue the delete command, the workflow hangs because the compute nodes cannot be drained successfully:

2022-11-01 12:05:29 [ℹ]  deleting EKS cluster "cluster1"
2022-11-01 12:05:30 [ℹ]  will drain 1 unmanaged nodegroup(s) in cluster "cluster1"
2022-11-01 12:05:30 [ℹ]  starting parallel draining, max in-flight of 1
2022-11-01 12:05:30 [ℹ]  cordon node "ip-192-168-17-134.us-east-2.compute.internal"
2022-11-01 12:05:30 [ℹ]  cordon node "ip-192-168-37-90.us-east-2.compute.internal"
2022-11-01 12:05:31 [ℹ]  cordon node "ip-192-168-79-66.us-east-2.compute.internal"
2022-11-01 12:06:45 [!]  1 pods are unevictable from node ip-192-168-17-134.us-east-2.compute.internal
2022-11-01 12:07:48 [!]  1 pods are unevictable from node ip-192-168-17-134.us-east-2.compute.internal
2022-11-01 12:08:51 [!]  1 pods are unevictable from node ip-192-168-17-134.us-east-2.compute.internal
2022-11-01 12:09:53 [!]  1 pods are unevictable from node ip-192-168-17-134.us-east-2.compute.internal
2022-11-01 12:10:56 [!]  1 pods are unevictable from node ip-192-168-17-134.us-east-2.compute.internal
2022-11-01 12:11:59 [!]  1 pods are unevictable from node ip-192-168-17-134.us-east-2.compute.internal
...

The pods running while the node drain hangs (and eventually times out) are:

NAMESPACE     NAME                                  READY   STATUS    RESTARTS        AGE
kube-system   aws-node-5k294                        1/1     Running   0               3h46m
kube-system   aws-node-smbx4                        1/1     Running   0               3h46m
kube-system   aws-node-zzrzj                        1/1     Running   1 (3h44m ago)   3h46m
kube-system   coredns-5948f55769-sd54s              1/1     Running   0               3h57m
kube-system   coredns-5948f55769-v9rzn              1/1     Running   0               3h57m
kube-system   ebs-csi-controller-664869bc4d-k8qmb   6/6     Running   3 (121m ago)    3h43m
kube-system   ebs-csi-controller-664869bc4d-vhz7q   0/6     Pending   0               3h32m
kube-system   ebs-csi-node-g92pp                    3/3     Running   0               3h44m
kube-system   ebs-csi-node-gh5gr                    3/3     Running   0               3h44m
kube-system   ebs-csi-node-k6ldn                    3/3     Running   0               3h44m
kube-system   kube-proxy-2qgt2                      1/1     Running   0               3h46m
kube-system   kube-proxy-427v2                      1/1     Running   0               3h46m
kube-system   kube-proxy-rvdhv                      1/1     Running   0               3h46m

I've tracked the issue down to the fact that the ebs-csi-controller deployment is not getting deleted properly. As soon as I manually delete it via:

kubectl delete deployment -n kube-system ebs-csi-controller

The workflow proceeds just fine and the cluster deletion completes successfully. I've also found that a timed-out, failed eksctl delete command can be re-run without issue once the ebs-csi-controller deployment has been deleted.
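
Put together, the workaround is roughly (assuming kubectl is still pointed at the stuck cluster):

kubectl delete deployment -n kube-system ebs-csi-controller
eksctl delete cluster -f cluster1.yaml   # re-run the timed-out delete; draining now completes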

How to reproduce it?

Commands:

eksctl create cluster -f cluster1.yaml
# Completes successfully. Deploying workloads (or not) does not affect the outcome.
sleep 600
# Just to give the cluster time to settle.
eksctl delete cluster -f cluster1.yaml
# Times out waiting for nodes to drain 

cluster1.yaml:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: cluster1
  region: us-east-2
  version: "1.23"

iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: ebs-csi-controller-sa
      namespace: kube-system
    attachPolicyARNs:
    - "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
    roleOnly: true
    roleName: AmazonEKS_EBS_CSI_DriverRole

addons:
- name: aws-ebs-csi-driver
  version: v1.11.4-eksbuild.1
  serviceAccountRoleARN: arn:aws:iam::ACCOUNTID:role/AmazonEKS_EBS_CSI_DriverRole

nodeGroups:
  - name: pool1
    instanceType: m5a.2xlarge
    desiredCapacity: 3

Anything else we need to know?

All client-side binaries and software were installed via Homebrew. And I'm really hoping I'm just doing something stupid and simple here. :)

Note: with the EKS 1.23 changes related to EBS CSI (namely, needing the service-account role), this is the first time we've had to use an addon. We had no cluster deletion issues with EKS 1.22, where we did not explicitly declare the addon and IAM role. It seems like the addon should be removed from the nodegroup before the nodes are drained, although I could envision other problems with that approach if those nodes weren't fully evacuated before the drain started, as in my case.

Versions

$ eksctl info
eksctl version: 0.116.0-dev+9db3d29ea.2022-10-28T12:53:00Z
kubectl version: v1.23.6
OS: darwin

TiberiuGC commented 1 year ago

I have an older cluster with an nginx deployment whose deletion hangs in the same state. @eti-tme-tim, thanks for finding the workaround.

eti-tme-tim commented 1 year ago

@TiberiuGC Any traction on this issue?

TiberiuGC commented 1 year ago

Hi @eti-tme-tim. I'm afraid I have no updates for now. We have reduced capacity at the moment and have needed to prioritise other work.

TiberiuGC commented 1 year ago

Hi @eti-tme-tim - it seems that this addon comes with a default Pod Disruption Budget policy, configured as follows:

eksctl % kubectl get pdb -A                                      
NAMESPACE     NAME                 MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
kube-system   ebs-csi-controller   N/A             1                 1                     11m  
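
That budget corresponds to roughly the following manifest (reconstructed from the kubectl output above; the selector label is an assumption, and the exact fields shipped by the addon may differ):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ebs-csi-controller
  namespace: kube-system
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: ebs-csi-controller   # assumed label; check the addon's manifest for the exact selector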

And since there are two ebs-csi-controller pods and this rule allows only one of them to be unavailable at a time, the second eviction fails with the error we're seeing (i.e. "1 pods are unevictable from node …"). The solution is to use the --disable-nodegroup-eviction flag when deleting the cluster; this bypasses the Pod Disruption Budget checks during draining.
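
In other words, for the reproduction steps above the delete command becomes:

eksctl delete cluster -f cluster1.yaml --disable-nodegroup-eviction

Note that this bypasses the PDB checks for all pods on the nodegroup, not just the addon's controller, so it is best suited to a cluster that is being torn down anyway, as here.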

I've opened a PR to document the behaviour.