lovettchris closed this issue 6 days ago
Solution 1: Pod anti-affinity
"In the following example Deployment for the Redis cache, the replicas get the label app=store. The podAntiAffinity rule tells the scheduler to avoid placing multiple replicas with the app=store label on a single node. This creates each cache in a separate node."
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#more-practical-use-cases
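A sketch of such a Deployment, following the linked example (the Redis image and names come from the docs page; adapt the label and topology key to your workload):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  replicas: 3
  selector:
    matchLabels:
      app: store
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never co-locate two pods carrying app=store
          # on the same node (topologyKey = hostname).
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
```

With `requiredDuringSchedulingIgnoredDuringExecution`, a replica that cannot be placed on its own node stays Pending, which is what triggers the cluster autoscaler to add a node rather than doubling up.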
Solution 2: Topology spread constraints
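A minimal sketch of the equivalent topology spread constraint (assuming the same app=store label as above): maxSkew: 1 with whenUnsatisfiable: DoNotSchedule keeps the per-node pod counts within 1 of each other instead of forbidding co-location outright.

```yaml
# Goes under the pod template's spec in the Deployment.
spec:
  topologySpreadConstraints:
  - maxSkew: 1                      # allowed difference in pod count between nodes
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule  # leave the pod Pending rather than skew
    labelSelector:
      matchLabels:
        app: store
```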
Cool thanks, I will give these a try and see what happens.
Pod anti-affinity is working perfectly; this will save me a lot of compute time otherwise lost to thrashing machines. Thanks so much!
Action required from @Azure/aks-pm
Issue needing attention of @Azure/aks-leads
Is your feature request related to a problem? Please describe.
I have a job that saturates the VM SKU I have chosen, so I know up front that I never want more than one of my pods per node when auto-scaling my cluster. Is there a way to configure the HorizontalPodAutoscaler to do this? Setting targetCPUUtilizationPercentage to a low value like 20% doesn't seem to work: the scheduler still puts 2 pods on a node, which is too much, and one of those pods crashes with an out-of-memory error, taking a lot of valuable processing time away from the first pod.

Describe the solution you'd like
I want to tell AKS to never put more than one of my pods per node. How can I do that? Is there another type of auto-scaler I should be using? Can I do this with a custom metric? Any samples available?
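For reference, the HPA I'm trying looks roughly like this (the Deployment name and replica bounds are illustrative, not from a real config):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: quantize-hpa      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quantize-job    # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 20
  # Low target in the hope of spreading pods out -- but the HPA only
  # controls replica count; it has no say in where pods are scheduled.
  targetCPUUtilizationPercentage: 20
```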
Describe alternatives you've considered
I've considered manually running
az aks scale --resource-group myResourceGroup --name myAKSCluster --node-count 20
to force the creation of 20 nodes, but how can I be sure AKS will utilize all 20 nodes before doubling up pods on a single node? Plus, I'd prefer to have auto-scaling.

Additional context
The job is a data-driven neural network quantization, a heavy-duty process that takes about 10 minutes per model. I want my cluster to scale horizontally so I can process about 20 models in parallel on 20 nodes and finish all of them in roughly 10 minutes, but today the system takes much longer because of all this thrashing.