Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[Feature] Expose `--startup-taints` (`--ignore-taints`) option in autoscaler profiles #3276

Open hterik opened 1 year ago

hterik commented 1 year ago

Is your feature request related to a problem? Please describe.

We need to start nodes with custom taints so that required DaemonSets can start before any other pods are scheduled onto the node. Today, however, putting the taint on the node pool excludes the pool from scale-up: the autoscaler's node template concludes that the pending pod can never run on the node because of the taint, even though it eventually can, once the DaemonSets have initialized the node.

Describe the solution you'd like

The Kubernetes cluster autoscaler has an option called `--ignore-taints` that enables the use case above. It would be good if it were exposed in the AKS autoscaler profile. https://learn.microsoft.com/en-us/azure-stack/aks-hci/work-with-autoscaler-profiles
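To illustrate the behaviour described above, here is a minimal Python sketch of a template-based fit check. It is a simplified model, not the actual cluster-autoscaler code: the `Taint` and `Pod` types, the `pod_fits_template` function, and the `example.com/node-initializing` taint key are all hypothetical names chosen for illustration, and tolerations are matched by key only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str = "NoSchedule"


@dataclass
class Pod:
    name: str
    # Simplification: a toleration is modelled as just the tolerated taint key.
    tolerated_keys: set = field(default_factory=set)


def pod_fits_template(pod, template_taints, ignored_keys=frozenset()):
    """Return True if the pod tolerates every template taint that is not ignored."""
    for taint in template_taints:
        if taint.key in ignored_keys:
            # Treated like a startup taint: assumed to go away once the node is ready.
            continue
        if taint.key not in pod.tolerated_keys:
            return False
    return True


# Hypothetical taint kept on the node pool until the DaemonSets have finished.
INIT_TAINT = Taint("example.com/node-initializing")
template = [INIT_TAINT]
worker = Pod("worker")

# Without ignoring the taint, the pool is never a scale-up candidate for the worker.
print(pod_fits_template(worker, template))  # False
# With the taint key ignored, the pool becomes a candidate again.
print(pod_fits_template(worker, template, ignored_keys={INIT_TAINT.key}))  # True
```

The point of the sketch is that the ignore list only affects the scale-up simulation; the worker pod still lacks the toleration, so on a real node it would wait until the taint is removed.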

Describe alternatives you've considered

As a workaround, there is also a taint-key prefix one can use: `ignore-taint.cluster-autoscaler.kubernetes.io/`
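A rough sketch of how a prefix-based rule like this could be applied when building a node template is shown below. Only the prefix string comes from this issue; the `strip_prefixed_taints` helper and both example taint keys are hypothetical, and this is not the autoscaler's own implementation.

```python
IGNORE_TAINT_PREFIX = "ignore-taint.cluster-autoscaler.kubernetes.io/"


def strip_prefixed_taints(taint_keys, prefix=IGNORE_TAINT_PREFIX):
    """Drop taint keys that start with the prefix; keep the rest in order."""
    return [key for key in taint_keys if not key.startswith(prefix)]


pool_taints = [
    IGNORE_TAINT_PREFIX + "dataset-not-ready",  # hypothetical suffix
    "example.com/dedicated",                    # hypothetical ordinary taint
]

# Only the taint without the prefix is left for the scale-up simulation.
print(strip_prefixed_taints(pool_taints))  # ['example.com/dedicated']
```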

carvido1 commented 1 year ago

Hello @hterik

To better understand your request: I don't quite follow what you are trying to achieve. If you need pods to be scheduled onto a node pool only after a DaemonSet pod has started, you could perhaps use an init container that curls a health-check endpoint on the DaemonSet pod.

Thanks in advance

hterik commented 1 year ago

Yes, the DaemonSet that fully initializes the node needs a toleration for the taint; other pods should not have the toleration. If initializing the DaemonSet takes very long, it may be better to schedule the pod on an old node, if such resources become available first. Otherwise the pod will be scheduled on the new node and wait a long time for the DaemonSet to start up completely. In our case it's not just starting the DaemonSet but also downloading and baking a huge dataset into a hostPath that worker pods use, which can take 10-60 minutes. See https://github.com/kubernetes/autoscaler/issues/5251 for a more detailed description.
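As a sketch of the setup described above, the snippet below builds a DaemonSet manifest whose pods tolerate the initialization taint, while ordinary workloads would simply omit the toleration. The taint key, DaemonSet name, and image are hypothetical placeholders, not values from our cluster.

```python
import json

INIT_TAINT_KEY = "example.com/node-initializing"  # hypothetical taint key


def daemonset_manifest(name, image, taint_key):
    """Build a minimal apps/v1 DaemonSet that tolerates the given NoSchedule taint."""
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Lets the initializer run on nodes that still carry the taint.
                    "tolerations": [
                        {"key": taint_key, "operator": "Exists", "effect": "NoSchedule"}
                    ],
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


manifest = daemonset_manifest("node-init", "example.com/node-init:latest", INIT_TAINT_KEY)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. Removing the taint once initialization is done is left out of this sketch.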

ghost commented 1 year ago

Action required from @Azure/aks-pm

ghost commented 1 year ago

Issue needing attention of @Azure/aks-leads

artificial-aidan commented 11 months ago

Any progress on this? Because AKS forcibly taints its spot nodes, being able to ignore taints when scaling up would be nice.

hterik commented 3 months ago

`ignore-taints` has been renamed to `startup-taints` in upstream cluster-autoscaler; see https://github.com/kubernetes/autoscaler/pull/6132 and https://github.com/kubernetes/autoscaler/pull/6218. The need to expose this option in AKS remains.
