eksctl-io / eksctl

The official CLI for Amazon EKS
https://eksctl.io

[Bug]: Cluster removal fails when addons are installed #6287

Open mbevc1 opened 1 year ago

mbevc1 commented 1 year ago

What were you trying to accomplish?

Deleting EKS cluster provisioned with addons.

What happened?

When deleting the cluster, the first step is cordoning and draining the nodes. The EBS addon runs a controller replica, which blocks the drain. We should probably remove the addons first, so that only workloads are running before the nodes are cordoned.

How to reproduce it?

Provision an EKS cluster with an addon (EBS), then delete it using eksctl delete cluster -f config.yaml

Logs

2023-02-13 14:36:56 [ℹ]  deleting EKS cluster "mb"
2023-02-13 14:36:57 [ℹ]  will drain 0 unmanaged nodegroup(s) in cluster "mb"
2023-02-13 14:36:57 [ℹ]  starting parallel draining, max in-flight of 1
2023-02-13 14:36:58 [ℹ]  cordon node "ip-192-168-29-32.eu-west-1.compute.internal"
2023-02-13 14:36:58 [ℹ]  cordon node "ip-192-168-46-245.eu-west-1.compute.internal"
2023-02-13 14:36:58 [ℹ]  cordon node "ip-192-168-92-221.eu-west-1.compute.internal"
2023-02-13 14:38:12 [!]  1 pods are unevictable from node ip-192-168-29-32.eu-west-1.compute.internal
2023-02-13 14:39:14 [!]  1 pods are unevictable from node ip-192-168-29-32.eu-west-1.compute.internal
2023-02-13 14:40:16 [!]  1 pods are unevictable from node ip-192-168-29-32.eu-west-1.compute.internal
2023-02-13 14:41:18 [!]  1 pods are unevictable from node ip-192-168-29-32.eu-west-1.compute.internal
2023-02-13 14:42:20 [!]  1 pods are unevictable from node ip-192-168-29-32.eu-west-1.compute.internal

Anything else we need to know?

Versions

$ eksctl info

eksctl version: 0.129.0
kubectl version: v1.26.1
OS: linux

TiberiuGC commented 1 year ago

Hi @mbevc1! As per our documentation on how to delete clusters here, Pod Disruption Budget policies are preventing the EBS addon from being properly removed. You should run your command with the --disable-nodegroup-eviction flag, i.e.

eksctl delete cluster -f cluster.yaml --disable-nodegroup-eviction

mbevc1 commented 1 year ago

Thanks @TiberiuGC for pointing out this doc note. I guess removing the addon first, or in parallel, could prevent this issue?

TiberiuGC commented 1 year ago

I think you're right, removing the addon first would probably work as well.
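
The "remove the addon first" order suggested above could be sketched roughly as follows. This is only an illustration and has not been confirmed in this thread: the cluster name "mb" is taken from the logs in the report, the config file name from the reproduction step, and the addon name aws-ebs-csi-driver is an assumption about which EBS addon was installed. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: delete the EBS addon before deleting the cluster.
# Assumptions (not confirmed in the thread):
#   CLUSTER - cluster name, "mb" as seen in the logs above
#   ADDON   - assumed addon name for the EBS CSI driver
#   CONFIG  - cluster config file used in the reproduction step
CLUSTER="${CLUSTER:-mb}"
ADDON="${ADDON:-aws-ebs-csi-driver}"
CONFIG="${CONFIG:-config.yaml}"

# With DRY_RUN=1 (the default) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Remove the addon first; only delete the cluster if that step succeeded.
delete_cluster_addon_first() {
  run eksctl delete addon --cluster "$CLUSTER" --name "$ADDON" &&
    run eksctl delete cluster -f "$CONFIG"
}

delete_cluster_addon_first
```

Whether the addon's pods are fully gone by the time the cluster deletion starts draining is not something this sketch checks, so a short wait or a manual check between the two steps may still be needed.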

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

mbevc1 commented 1 year ago

Any more decisions on this?

mbevc1 commented 1 year ago

We should keep this alive ;)

mbevc1 commented 1 year ago

keep-alive :)

dabcoder commented 1 year ago

> I think you're right, removing the addon first would probably work as well.

Just running into this issue as we speak, so I'd suggest adding a note about removing the add-on on this page: https://eksctl.io/usage/creating-and-managing-clusters/

mbevc1 commented 11 months ago

keep-alive again :)

AmitBenAmi commented 10 months ago

@mbevc1 have you tried to run with --disable-nodegroup-eviction?

mbevc1 commented 10 months ago

I haven't tried that one, but it looks like it could work. I reckon we still want graceful, built-in support for deprovisioning add-ons, though.

EmpeRoar commented 9 months ago

> Hi @mbevc1! As per our documentation on how to delete clusters here, Pod Disruption Budget policies are preventing the EBS addon from being properly removed. You should run your command with the --disable-nodegroup-eviction flag, i.e.
>
> eksctl delete cluster -f cluster.yaml --disable-nodegroup-eviction

This did not work for me. I'm still having the issue:

2023-12-17 04:59:01 [ℹ] deleting EKS cluster "jx-angular-clusterx"
2023-12-17 04:59:04 [ℹ] will drain 2 unmanaged nodegroup(s) in cluster "jx-angular-clusterx"
2023-12-17 04:59:04 [ℹ] starting parallel draining, max in-flight of 1
2023-12-17 04:59:08 [✔] drained all nodes: [ip-172-31-13-6.ec2.internal ip-172-31-38-140.ec2.internal]
2023-12-17 05:00:08 [!] 1 pods are unevictable from node ip-172-31-47-9.ec2.internal
2023-12-17 05:01:09 [!] 1 pods are unevictable from node ip-172-31-47-9.ec2.internal

mbevc1 commented 8 months ago

keep-alive

m0un10 commented 8 months ago

It's probably worth calling out that some addons get added by default for new clusters. So I don't think there are any cases now where the delete will actually work. At least --disable-nodegroup-eviction unblocks it, though! :)

yuxiang-zhang commented 8 months ago

> It's probably worth calling out that some addons get added by default for new clusters.

Indeed, the coredns addon now has a default Pod Disruption Budget (PDB), which prevents its pod from being evicted when the nodegroups are drained. As a workaround for now, we can use eksctl delete cluster --disable-nodegroup-eviction or eksctl delete nodegroup --disable-eviction.
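
For anyone landing here, a rough sketch of that workaround: first list the PDBs to see which one is blocking the drain, then delete the cluster with eviction disabled. The cluster name my-cluster is a placeholder, and the PDB listing is an extra diagnostic step that nobody in this thread has reported running. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: inspect PDBs, then delete the cluster without pod eviction.
# Assumption: CLUSTER is a placeholder name, replace it with your own.
CLUSTER="${CLUSTER:-my-cluster}"

# With DRY_RUN=1 (the default) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# List every Pod Disruption Budget in the cluster; a PDB that allows
# zero disruptions is a likely cause of "pods are unevictable".
list_pdbs() {
  run kubectl get pdb --all-namespaces
}

# Workaround from the comment above: skip eviction during cluster deletion.
delete_without_eviction() {
  run eksctl delete cluster --name "$CLUSTER" --disable-nodegroup-eviction
}

list_pdbs && delete_without_eviction
```

If the coredns PDB is the culprit, it should show up in the kube-system namespace in the first command's output.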