Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[BUG] No ability to provide tolerations for overlay-vpa-webhook-generation jobs #4501

Open NielsMoorenAH opened 1 month ago

NielsMoorenAH commented 1 month ago

Describe the bug

When running AKS VPA with a single system node pool, the overlay-vpa-webhook-generation jobs stay unscheduled if the system node pool is tainted, because there is no way to add tolerations to them.

To Reproduce

Steps to reproduce the behavior:

  1. Add a taint to your system node pool.
  2. Enable VPA for your cluster.
  3. Roll out VPA.
  4. Watch the overlay-vpa-cert-webhook-cleanup job stay unscheduled because it does not have the right tolerations.

Expected behavior

It should be possible to add tolerations to these jobs so that they can be scheduled on a cluster whose system node pool is tainted.

Screenshots image

xiazhan commented 3 weeks ago

@NielsMoorenAH The system node pool shouldn't be updated with custom taints. May I know how you added the taints to your nodepool?

image

NielsMoorenAH commented 3 weeks ago

Hi, as you can see in the picture, I only added the CriticalAddonsOnly=true:NoSchedule taint to the system node pool, as suggested by the documentation. This stops user apps from being deployed on the system nodes, isolating critical system pods from our application pods.
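
To illustrate why the jobs stay unscheduled, here is a minimal Python sketch of the standard Kubernetes taint/toleration matching rules, applied to the CriticalAddonsOnly=true:NoSchedule taint mentioned above. It is not AKS or scheduler code. The function names (`tolerates`, `is_schedulable`) and the constants are hypothetical, and the assumption that the generated jobs carry no tolerations at all is inferred from the behavior described in this issue, not confirmed from the job spec.

```python
from typing import Dict, List

Taint = Dict[str, str]
Toleration = Dict[str, str]


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if one toleration matches one taint (Kubernetes rules)."""
    operator = toleration.get("operator", "Equal")  # "Equal" is the default
    key = toleration.get("key", "")
    effect = toleration.get("effect", "")

    # An empty effect on the toleration matches every taint effect.
    if effect and effect != taint["effect"]:
        return False
    # An empty key is only valid with "Exists" and then matches every key.
    if not key:
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    # operator == "Equal": the values must be identical.
    return toleration.get("value", "") == taint.get("value", "")


def is_schedulable(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod fits a node only if every hard taint is tolerated."""
    hard_effects = ("NoSchedule", "NoExecute")  # PreferNoSchedule is only a preference
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint["effect"] in hard_effects
    )


# Taint from this thread, applied to the system node pool.
SYSTEM_POOL_TAINTS = [
    {"key": "CriticalAddonsOnly", "value": "true", "effect": "NoSchedule"}
]

# Assumption: the generated job pods have no tolerations.
JOB_TOLERATIONS_TODAY: List[Toleration] = []

# Hypothetical toleration that the issue asks to be configurable.
JOB_TOLERATIONS_REQUESTED = [{"key": "CriticalAddonsOnly", "operator": "Exists"}]

if __name__ == "__main__":
    print(is_schedulable(JOB_TOLERATIONS_TODAY, SYSTEM_POOL_TAINTS))
    print(is_schedulable(JOB_TOLERATIONS_REQUESTED, SYSTEM_POOL_TAINTS))
```

With only one (tainted) node pool in the cluster, a pod that fails this check has no node left to land on, which matches the unscheduled jobs reported here.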