KUBE-5984 - ensure that scale-ups always occur when there are starved pods

triorph commented 1 year ago

This fixes https://github.com/atlassian/escalator/issues/224

The main change here is to add a ScaleOnStarve option to the node group configuration. This boolean option enables an additional check on the nodeDelta calculated during the scaling step.
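A minimal sketch of how a boolean option like this could gate the nodeDelta check, assuming a trimmed-down config struct. `NodeGroupOptions` here, the `scale_on_starve` yaml key and the `applyScaleOnStarve` helper are all hypothetical names for illustration, not escalator's actual code:

```go
package main

import "fmt"

// NodeGroupOptions is a hypothetical, trimmed-down node group config used only
// for illustration; the real escalator struct has many more fields, and the
// yaml key below is an assumption.
type NodeGroupOptions struct {
	Name          string `yaml:"name"`
	ScaleOnStarve bool   `yaml:"scale_on_starve"` // assumed key name
}

// applyScaleOnStarve (hypothetical) applies the extra check to the delta that
// the normal scaling calculation produced: if the option is enabled and a
// starved pod was detected, the final delta is raised to at least +1.
// Otherwise the calculated delta is returned unchanged.
func applyScaleOnStarve(opts NodeGroupOptions, nodeDelta int, starved bool) int {
	if opts.ScaleOnStarve && starved && nodeDelta < 1 {
		return 1
	}
	return nodeDelta
}

func main() {
	opts := NodeGroupOptions{Name: "default", ScaleOnStarve: true}
	fmt.Println(applyScaleOnStarve(opts, 0, true))   // starved, no scale-up planned: forced to 1
	fmt.Println(applyScaleOnStarve(opts, 3, true))   // already scaling up: left at 3
	fmt.Println(applyScaleOnStarve(opts, -2, false)) // not starved: scale-down kept
}
```

Raising the delta to a floor of 1, rather than adding 1, leaves an already planned scale-up untouched while still overriding a no-op or scale-down decision.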

When we gather the requested pods (RequestedPod), we also record the largest pending pod by CPU and the largest pending pod by memory. When we gather node capacity, we also record the largest free capacity on any node (allocatable CPU/memory minus the CPU/memory requested by pods already on that node). If either of those pods requests more than that largest free capacity, no existing node can fit it, which indicates a "starved pod". When a starved pod exists and ScaleOnStarve is enabled, we make sure the final result of the scaling algorithm is a scale-up of at least 1 node.
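The starvation check described above can be sketched as follows. This is a hypothetical illustration, assuming CPU in millicores, memory in bytes, and that the CPU and memory maxima are tracked independently; `Resources`, `Node`, `largestPending`, `largestFree` and `hasStarvedPod` are made-up names, not escalator's API:

```go
package main

import "fmt"

// Resources holds CPU in millicores and memory in bytes (hypothetical type).
type Resources struct {
	MilliCPU int64
	MemBytes int64
}

// Node is a hypothetical view of a node: its allocatable capacity and the
// summed requests of the pods already scheduled on it.
type Node struct {
	Allocatable Resources
	Used        Resources
}

// largestPending returns the largest CPU request and the largest memory
// request among pending pods; the two maxima may come from different pods.
func largestPending(pods []Resources) (maxCPU, maxMem int64) {
	for _, p := range pods {
		if p.MilliCPU > maxCPU {
			maxCPU = p.MilliCPU
		}
		if p.MemBytes > maxMem {
			maxMem = p.MemBytes
		}
	}
	return maxCPU, maxMem
}

// largestFree returns the most free CPU and the most free memory found on any
// single node, where free = allocatable - used.
func largestFree(nodes []Node) (maxCPU, maxMem int64) {
	for _, n := range nodes {
		if free := n.Allocatable.MilliCPU - n.Used.MilliCPU; free > maxCPU {
			maxCPU = free
		}
		if free := n.Allocatable.MemBytes - n.Used.MemBytes; free > maxMem {
			maxMem = free
		}
	}
	return maxCPU, maxMem
}

// hasStarvedPod reports whether the largest pending CPU or memory request
// exceeds the largest free capacity on any node, i.e. no existing node fits it.
func hasStarvedPod(pending []Resources, nodes []Node) bool {
	podCPU, podMem := largestPending(pending)
	freeCPU, freeMem := largestFree(nodes)
	return podCPU > freeCPU || podMem > freeMem
}

func main() {
	const gi = int64(1) << 30
	nodes := []Node{
		{Allocatable: Resources{4000, 16 * gi}, Used: Resources{3000, 8 * gi}}, // free: 1000m, 8Gi
		{Allocatable: Resources{4000, 16 * gi}, Used: Resources{1000, 12 * gi}}, // free: 3000m, 4Gi
	}
	fmt.Println(hasStarvedPod([]Resources{{2000, 4 * gi}}, nodes)) // fits: false
	fmt.Println(hasStarvedPod([]Resources{{3500, 1 * gi}}, nodes)) // CPU too large: true
}
```

Because the CPU and memory maxima can come from different nodes, this per-dimension comparison only flags pods that exceed the best node on at least one dimension; a pod that needs, say, 3000m CPU and 8Gi together would not be flagged here even though neither node has both free. With no nodes at all, every pending pod with a non-zero request counts as starved.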

awprice commented 1 year ago

This solves #224

If you want this to close the issue, you'll need to use one of the keywords listed here: https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword, e.g. closes, fixes or resolves.

awprice commented 1 year ago

It would be good to see in the description of this PR what the change does to fix the issue. I had to look through the code to see how it addresses the original issue.

awprice commented 1 year ago

LGTM - thanks for the contribution!

atlassian / escalator

KUBE-5984 - ensure that scale-ups always occur when there are starved pods #225