Describe the bug
If a large pod is requested but its request does not push the total requested resources past the scale-up threshold, and no individual node has the resources required to schedule it, the pod may never be scheduled or may take a long time to be scheduled.
To Reproduce
Consider the following scenario, where each node has 1000m CPU allocatable and there are no pods pending scheduling. The state of the cluster looks like:
| Node Name | Sum of pod requested CPU |
|-----------|--------------------------|
| Node 1    | 850m                     |
| Node 2    | 850m                     |
| Node 3    | 850m                     |
| Node 4    | 850m                     |
| Node 5    | 850m                     |
In this state, the cluster's total requested usage is 4250m (85%). Suppose the scale-up threshold is 90% (4500m). If we then attempted to schedule a pod with a CPU request of 200m, it would push the total requested usage to 4450m (89%), which is not enough to trigger a scale-up. Since no node has more than 150m CPU available, this pod cannot be scheduled onto any node, and may remain pending indefinitely unless another pod is submitted to the cluster that pushes usage over the threshold.
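The arithmetic above can be sketched as follows (a minimal illustration of the scenario, not Escalator's actual code; all names are hypothetical):

```python
# Scenario from the table: 5 nodes, 1000m allocatable each, 850m requested each.
node_allocatable = 1000   # millicores per node
nodes = [850] * 5         # requested CPU per node
threshold = 0.90          # scale-up threshold (90%)

pod_request = 200
total_requested = sum(nodes) + pod_request         # 4250m + 200m = 4450m
total_allocatable = node_allocatable * len(nodes)  # 5000m
utilisation = total_requested / total_allocatable  # 0.89 -> below threshold

# Cluster-wide check does not fire...
triggers_scale_up = utilisation > threshold        # False
# ...yet no single node has 200m free (each has only 150m).
fits_on_some_node = any(
    node_allocatable - used >= pod_request for used in nodes
)                                                  # False
```

Both checks come back negative, so the pod is both unschedulable and unable to trigger a scale-up.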
Expected behavior
Escalator recognises that despite not being over the scale-up threshold, there is an unscheduled pod that is not able to be scheduled onto any of the currently available nodes, and triggers a scale-up.
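One way to express the expected behaviour is an additional condition alongside the threshold check: scale up when either the threshold is exceeded or some pending pod cannot fit on any single node. A hypothetical sketch (function and parameter names are illustrative, not Escalator's API):

```python
def should_scale_up(pending_pod_requests, node_free_cpu, utilisation, threshold):
    """Scale up when over the threshold, OR when some pending pod's
    request exceeds the free CPU on every individual node."""
    if utilisation > threshold:
        return True
    return any(
        all(free < pod for free in node_free_cpu)
        for pod in pending_pod_requests
    )
```

In the scenario above, `should_scale_up([200], [150] * 5, 0.89, 0.90)` would return `True`, even though utilisation alone (89% < 90%) would not trigger a scale-up.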
Screenshots or Logs
Kubernetes Cluster Version
v1.24.12
Escalator Version
v1.13.1 (or whatever version we are using internally)
Additional context