
Azure Kubernetes Service
https://azure.github.io/AKS/

Extension manager cannot be installed if all node pools in cluster are tainted #3359

Closed jan-delaet closed 8 months ago

jan-delaet commented 1 year ago

What happened:

Enabling the GitOps (FluxV2) extension on an AKS cluster with tainted node pools does not work. When the extension is being provisioned (through either the Azure CLI or a Bicep template; I have tried both), you can see that two prerequisite deployments, named extension-operator and extension-agent, fail to be scheduled on any node because no toleration is in place for the aforementioned taints. We taint the system node pool with the CriticalAddonsOnly=true:NoSchedule taint following best practices, because we need to segregate critical system components from actual workloads.

What you expected to happen:

I expect the AKS extension manager to be able to be scheduled on tainted nodes, either out of the box by tolerating the CriticalAddonsOnly=true:NoSchedule taint or by allowing the user to configure the tolerations themselves.
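For reference, the toleration being asked for would look like this in a pod spec (standard Kubernetes syntax; the extension manager's actual manifest is managed by AKS, so this is only an illustration of the requested behavior):

```yaml
# Toleration that would let a pod schedule onto nodes tainted with
# CriticalAddonsOnly=true:NoSchedule. Goes under spec.tolerations
# (or spec.template.spec.tolerations in a Deployment).
tolerations:
  - key: CriticalAddonsOnly
    operator: Equal
    value: "true"
    effect: NoSchedule
```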

How to reproduce it (as minimally and precisely as possible):

az k8s-extension create \
--resource-group rg-aks-dev-weu-01 \
--cluster-name aks-dev-weu-01 \
--name flux \
--extension-type microsoft.flux \
--cluster-type managedClusters \
--auto-upgrade-minor-version true \
--config toleration-keys="CriticalAddonsOnly=true:NoSchedule"
kubectl get po -n kube-system -l app.kubernetes.io/name=extension-manager

NAME                                 READY   STATUS    RESTARTS   AGE
extension-agent-85f569cdb7-sqqsd     0/2     Pending   0          2d6h
extension-operator-6d867d765-dvhk7   0/2     Pending   0          2d6h
kubectl describe po -n kube-system extension-agent-85f569cdb7-sqqsd

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  18s   default-scheduler  0/2 nodes are available: 2 node(s) had untolerated taint {CriticalAddonsOnly: true}. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.

Anything else we need to know?:

If you have any node pool in your AKS cluster that has no taints on it at all, the extension manager and extension itself will get created successfully, but I don't see how this can be intended behaviour. I would consider all of these components to be part of the cluster infrastructure itself. They should not be running on random worker nodes.

If there is some way to get the extension manager to be scheduled on those tainted nodes, we would then also need to be able to do the same thing for the FluxV2 extension components itself.

Environment:

ghost commented 1 year ago

Hi jan-delaet, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
2. Please abide by the AKS repo Guidelines and Code of Conduct.
3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

ghost commented 1 year ago

Triage required from @Azure/aks-pm

ghost commented 1 year ago

Action required from @Azure/aks-pm

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

jan-delaet commented 1 year ago

Do you have any feedback on this?

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

philwelz commented 1 year ago

Had the same issue: the system node pool was tainted with CriticalAddonsOnly, and a user node pool with 0 nodes had auto-scaling enabled up to 2 nodes.

The extension manager and agent did not even scale up the user node pool.

ghost commented 1 year ago

Action required from @Azure/aks-pm

matthchr commented 1 year ago

I expect the AKS extension manager to be able to be scheduled on tainted nodes, either out of the box by tolerating the CriticalAddonsOnly=true:NoSchedule taint or by allowing the user to configure the tolerations themselves.

You're correct. The Extension Manager pod should have this toleration. That's a bug on our end. @bavneetsingh16 is working on fixing this.

we would then also need to be able to do the same thing for the FluxV2 extension components itself.

I think that FluxV2 has this toleration already, although I actually believe that it shouldn't. If you look at the description of when the CriticalAddonsOnly taint should be used:

You can enforce this behavior by creating a dedicated system node pool. Use the CriticalAddonsOnly=true:NoSchedule taint to prevent application pods from being scheduled on system node pools.

IMO, from the perspective of AKS/Kubernetes, Flux is an application pod. It should be scheduled on the application node pool. With that said, there are probably cases where the user really does want to ensure that a particular extension's pods tolerate this taint. We don't want to bake that into every extension, though, as some extensions may be critical for some users while noncritical for others. As you suggested, the ability to configure additional tolerations on the extension at deployment time seems a reasonable ask. @bavneetsingh16 can point you to a good place to file an issue tracking that ask (or we can track it via a separate issue in this repo if we prefer).

Create a vanilla AKS cluster with a single system node pool, no user node pools. Taint this node pool with the CriticalAddonsOnly=true:NoSchedule taint.

I know this was part of the step for minimal reproduction, and you're 100% correct about the issue you raised, but I just want to highlight (for other readers) that a cluster with only a single system node pool with CriticalAddonsOnly=true:NoSchedule is nonsensical. If there's no other node pool to schedule application pods to, the cluster isn't useful. On the other hand if all application pods tolerate CriticalAddonsOnly=true:NoSchedule then there's no reason to set the taint at all. Really this taint only makes sense on clusters with >1 pool.
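The scheduling behavior discussed above can be sketched in a few lines. This is a simplified model of Kubernetes taint/toleration matching (Equal/Exists operators, NoSchedule effect only; the real scheduler also handles NoExecute and PreferNoSchedule), illustrating why the untolerated taint keeps the extension pods Pending:

```python
# Simplified sketch of taint/toleration matching, not the actual scheduler code.

def tolerates(toleration: dict, taint: dict) -> bool:
    """True if a single toleration matches a single taint."""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with no key tolerates every taint.
        return toleration.get("key") in (None, taint["key"])
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint["value"])

def schedulable(pod_tolerations: list, node_taints: list) -> bool:
    """A pod can land on a node only if every NoSchedule taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint["effect"] == "NoSchedule"
    )

critical = {"key": "CriticalAddonsOnly", "value": "true", "effect": "NoSchedule"}

# extension-agent as reported: no tolerations, so it stays Pending.
print(schedulable([], [critical]))  # False

# With the requested toleration in place, it schedules.
fix = [{"key": "CriticalAddonsOnly", "operator": "Equal",
        "value": "true", "effect": "NoSchedule"}]
print(schedulable(fix, [critical]))  # True
```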

ghost commented 1 year ago

Action required from @Azure/aks-pm

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

pradeepjsr05 commented 1 year ago

We are adding built-in capability to schedule the AKS extension manager on CriticalAddonsOnly-tainted nodes. This work is in progress. We will also provide the ability to configure additional tolerations on the extension at deployment time. As of now, we don't have an exact ETA for this enhancement.

In the meantime, we suggest the mitigations below for customers to get unblocked:

jan-delaet commented 1 year ago

Hello, any update on this?

ghost commented 1 year ago

Action required from @Azure/aks-pm

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

ngbrown commented 1 year ago

I came across this issue as well. I have two node pools: one is the system node pool, and the other is a spot node pool. Both node pools end up having taints. The extension-agent and extension-operator deployments are the only ones in the cluster that no node is able to schedule.

Even though these are fully controlled by Microsoft, is there a way to patch in the appropriate toleration per cluster? This would also help with excessive resource requirements.

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

pradeepjsr05 commented 1 year ago

A code change has been rolled out to add the toleration for the CriticalAddonsOnly taint to the extension platform pods. Let us know if you still face this issue.

ngbrown commented 8 months ago

The change has been working for me.

matthchr commented 8 months ago

Going to go ahead and close this given the immediate fix seems to be working for folks. The Extensions team still has a backlog item to allow users to override the tolerations via the extensions CLI.

kc8421 commented 4 months ago

@matthchr, is there an item on the AKS roadmap, or somewhere else, to track this backlog item? It seems that running Flux on a cluster that has only spot-instance node pools does not work without the ability to override tolerations.

matthchr commented 4 months ago

@kc8421, the AKS team does not own the Flux extension; I believe @pradeepjsr05's team does.

He may know a better place to report the issue with Flux not working on spot nodes.

bavneetsingh16 commented 4 months ago

The Flux application is currently unable to operate on spot instances due to the unique taints associated with these instances. You can find more details about these taints in the Microsoft Azure documentation.

At present, there isn't a persistent method to update the tolerations for the Flux extension. However, as a temporary solution, you can manually edit the deployment for the Flux controllers and add the required toleration. Please be aware that if the extension is upgraded, the manually added toleration will be reset, potentially leading to provisioning issues with the controllers.
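The spot taint AKS applies is kubernetes.azure.com/scalesetpriority=spot:NoSchedule, so the temporary, upgrade-fragile edit described above would add a toleration along these lines to each Flux controller Deployment (a sketch of the manual workaround, not the extension's official configuration):

```yaml
# Added under spec.template.spec of each Flux controller Deployment.
# NOTE: an extension upgrade will revert this manual edit, which can
# leave the controllers unschedulable again.
tolerations:
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot
    effect: NoSchedule
```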