Open · t3mi opened this issue 3 years ago
Hi t3mi, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.
I might be just a bot, but I'm told my suggestions are normally quite good, as such: 1) If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster. 2) Please abide by the AKS repo Guidelines and Code of Conduct. 3) If you're having an issue, could it be described in the AKS Troubleshooting guides or AKS Diagnostics? 4) Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS. 5) Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue. 6) If you have a question, do take a look at our AKS FAQ. We place the most common ones there!
Triage required from @Azure/aks-pm
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
@paulgmiller can your team assist?
@marwanad can you assist?
@t3mi @miwithro @marwanad Sorry I missed this. We are considering this option for a specific node pool (I jokingly refer to it as the "--yolo" flag). We're a little concerned that we're giving people a gun to shoot themselves in the foot with. Is there a reason you prefer a flag over removing/modifying the PodDisruptionBudget in question? The worry is that disabling eviction ignores every deployment, while with a PodDisruptionBudget you can be more selective about important services.
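For readers hitting this today, a minimal sketch of the selective approach described above, assuming placeholder PDB and namespace names:

```bash
# List every PDB and its ALLOWED DISRUPTIONS column to find the blocker.
kubectl get pdb --all-namespaces

# Temporarily delete (or patch) only the restrictive PDB, then remove the
# node pool; the PDB can be re-applied afterwards. Names are placeholders.
kubectl delete pdb my-app-pdb -n my-namespace
```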
@paulgmiller OP needs it in this case for destroying a cluster with Terraform, I think - hence the ticket reference. At least in our case, we want to replace one nodepool with another through Terraform.
@paulgmiller we are using Terraform during deployment and in our CI, so we hit these errors during teardown of clusters with nodepools. Instead of adding an additional step to gather/remove all PodDisruptionBudget objects and slowing down CI tests, we would prefer an additional flag that force-removes nodepools without caring what's inside.
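For reference, the kind of extra teardown step being described here would look roughly like the following, assuming placeholder resource names:

```bash
# Blanket pre-step before teardown: remove every PDB so evictions cannot
# be blocked, then delete the node pool. All names are placeholders.
kubectl delete pdb --all --all-namespaces

az aks nodepool delete \
  --resource-group my-rg \
  --cluster-name my-cluster \
  --name extrapool
```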
We also opened an issue with Microsoft Azure support, TrackingID #2103170050002008. Unfortunately, while I was on leave it was closed as "not-an-issue". This feature is implemented by every big cloud provider. I am currently reopening the issue.
We will consider this feature for nodepool delete to start with.
aks nodepool delete --force
We have done some initial ideation on this feature and will be working to release it in the coming months.
I had to remember to remove the PodDisruptionBudget with ALLOWED DISRUPTIONS = 0 before destroying the node group. This requested feature would be very helpful.
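As an illustration (names are placeholders), a PDB ends up with ALLOWED DISRUPTIONS = 0 whenever minAvailable equals the number of matching ready pods, e.g. minAvailable: 1 on a single-replica app:

```bash
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1        # with a single-replica app, no disruption is ever allowed
  selector:
    matchLabels:
      app: my-app
EOF

# The ALLOWED DISRUPTIONS column reads 0, so every eviction attempted during
# the node pool drain is refused with a "Too many Requests" error.
kubectl get pdb my-app-pdb
```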
@palma @qpetraroia @justindavies any ETA on when aks nodepool delete --force will be ready...? :)
@kaarthis can you look into this one?
Yes, this is part of the discussion on the Nodepool API. I can look into this and report back.
Any news on this one? This issue is becoming a blocker for us. We want to be able to destroy our dev clusters whatever state they are in.
The documentation states that no drain is done on node pool removal; however, it seems that the eviction controller is called in some way. cf. https://learn.microsoft.com/en-us/azure/aks/resize-node-pool?tabs=azure-cli#remove-the-existing-node-pool
Any progress @kaarthis @palma @qpetraroia @justindavies @alvinli222?
We're also running into this via Terraform; I think an option to force delete makes sense.
Code="KubernetesAPICallFailed" Message="Drain node akswin00000q failed when evicting pod rendering-deployment-84cbc76c47-kccm9. Eviction failed with Too many Requests error. This is often caused by a restrictive Pod Disruption Budget (PDB) policy. See http://aka.ms/aks/debugdrainfailures. Original error: API call to Kubernetes API Server failed."
In our particular case, we're trying to delete the whole cluster, but Terraform is trying to delete the individual node pools first. We're working around this by just deleting the entire cluster, which appears to avoid the problem, but it is still something I'd love to see improved.
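For anyone else stuck here, the whole-cluster workaround mentioned above boils down to something like this sketch (resource names are placeholders):

```bash
# Deleting the managed cluster resource directly avoids the per-node-pool
# drain that trips over restrictive PDBs.
az aks delete --resource-group my-rg --name my-cluster --yes --no-wait
```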
+1
Hi everyone, we pushed this feature into Public Preview a while back, but some bugs were found. We are actively working on this now and hope to re-release to Public Preview ASAP with an API property flag and a corresponding CLI command. I have communicated this update to the Terraform team as well.
Hi! Any update on this? Without the ability to force drain nodes, the existing feature of being able to start/stop nodepools is not very useful.
What happened: In the case of a small node pool, or one with taints/tolerations that prevent re-scheduling of the pods configured with a pod disruption budget onto the bigger node pool, the following error occurs during node pool removal:
What you expected to happen: Node pool successfully removed.
How to reproduce it (as minimally and precisely as possible): Deploy a cluster with an additional node pool and taints on it. Deploy an application with a pod disruption budget inside that node pool and try to remove the node pool.
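A rough repro sketch following those steps (all names, taints and counts are illustrative, and the manifest file is a placeholder):

```bash
# 1. Add a small, tainted node pool to an existing cluster.
az aks nodepool add \
  --resource-group my-rg \
  --cluster-name my-cluster \
  --name taintedpool \
  --node-count 1 \
  --node-taints "dedicated=special:NoSchedule"

# 2. Deploy a single-replica app that tolerates the taint, together with a
#    PDB using minAvailable: 1 (placeholder manifest).
kubectl apply -f app-with-toleration-and-pdb.yaml

# 3. Deleting the pool now fails with the eviction error described in this issue.
az aks nodepool delete \
  --resource-group my-rg \
  --cluster-name my-cluster \
  --name taintedpool
```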
Anything else we need to know?: For reference, there is a --disable-eviction flag for kubectl to force drain a node.
Environment:
Kubernetes version (use kubectl version):
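A sketch of using that flag against a node in the pool (node name and pool label value are placeholders); --disable-eviction deletes pods directly instead of going through the eviction API, so PDBs are bypassed:

```bash
# Find the nodes in the pool (AKS labels nodes with agentpool=<pool name>).
kubectl get nodes -l agentpool=taintedpool

# Force-drain a node, bypassing PDB checks, before deleting the pool.
kubectl drain aks-taintedpool-12345678-vmss000000 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --disable-eviction
```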