atlassian / escalator

Escalator is a batch or job optimized horizontal autoscaler for Kubernetes
Apache License 2.0
662 stars, 59 forks

KUBE-5984 - ensure that scale-ups always occur when there are starved pods #225

Closed triorph closed 1 year ago

triorph commented 1 year ago

This fixes https://github.com/atlassian/escalator/issues/224

The main change here is to add a ScaleOnStarve option to the node group configuration. This true/false value configures an additional check on the nodeDelta calculated during the scaling step.

When we gather the RequestedPod, we also record the largest pending pods (one by CPU and one by memory). When we gather the node capacity, we also record the node with the most available capacity, i.e. the highest allocatable CPU/memory minus the CPU/memory used by its pods. If either of these largest pending pods requires more than that node has available, we have a "starved pod": a pod that cannot fit on any existing node. When a starved pod exists and ScaleOnStarve is enabled, we make sure the scaling algorithm's final result includes at least 1 scale-up.
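To make the check above concrete, here is a minimal Go sketch of the idea. It is not the code from this PR: the `Resources` type and the `fits`, `hasStarvedPod` and `applyScaleOnStarve` functions are hypothetical names made up for illustration, and it assumes that "at least 1 scale up" means raising a non-positive nodeDelta to 1.

```go
package main

import "fmt"

// Resources is a hypothetical CPU/memory pair used only for this sketch.
type Resources struct {
	MilliCPU int64 // CPU in millicores
	MemBytes int64 // memory in bytes
}

// fits reports whether a pod's requests fit within the free capacity given.
func fits(pod, free Resources) bool {
	return pod.MilliCPU <= free.MilliCPU && pod.MemBytes <= free.MemBytes
}

// hasStarvedPod reports whether the largest pending pod by CPU or the largest
// pending pod by memory is too big for the node with the most free capacity.
func hasStarvedPod(largestByCPU, largestByMem, largestFree Resources) bool {
	return !fits(largestByCPU, largestFree) || !fits(largestByMem, largestFree)
}

// applyScaleOnStarve forces a scale-up of at least 1 when the option is
// enabled and a starved pod exists (assumed interpretation of the PR text).
func applyScaleOnStarve(nodeDelta int, scaleOnStarve, starved bool) int {
	if scaleOnStarve && starved && nodeDelta < 1 {
		return 1
	}
	return nodeDelta
}

func main() {
	const gib = int64(1) << 30
	largestFree := Resources{MilliCPU: 2000, MemBytes: 4 * gib}
	largestByCPU := Resources{MilliCPU: 4000, MemBytes: 1 * gib} // too much CPU for any node
	largestByMem := Resources{MilliCPU: 500, MemBytes: 3 * gib}

	starved := hasStarvedPod(largestByCPU, largestByMem, largestFree)
	fmt.Println("starved:", starved)
	fmt.Println("nodeDelta:", applyScaleOnStarve(0, true, starved))
}
```

In this sketch the override only ever raises nodeDelta, so a larger scale-up already chosen by the normal percentage-based calculation is left untouched, and nothing changes when ScaleOnStarve is disabled.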

awprice commented 1 year ago

This solves #224

If you want this to close the issue, you'll need to use one of the keywords from here (e.g. closes, fixes, resolves) - https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword

awprice commented 1 year ago

Would be good to see in the description of this PR what the change does to fix the issue - I had to look through the code to see how it addresses the original issue

awprice commented 1 year ago

LGTM - thanks for the contribution!