atlassian / escalator

Escalator is a batch or job optimized horizontal autoscaler for Kubernetes
Apache License 2.0

Escalator not untainting nodes when tainted nodes is larger than the min nodes #168

Closed. awprice closed this issue 4 years ago

awprice commented 4 years ago

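We've noticed that if the number of tainted nodes is larger than the minimum node count for the node group, Escalator brings up new nodes in the cloud provider rather than untainting the already tainted nodes. In the logs below, the node group has 2 tainted nodes, 0 untainted nodes, and min_nodes = 1, yet Escalator warns "There are no tainted nodes to untaint" and increases the cloud provider node group by 1.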

Logs:

time="2019-09-26T03:16:13Z" level=debug msg="**********[START NODEGROUP foo]**********"
time="2019-09-26T03:16:13Z" level=debug msg="auto discovered min_nodes = 1 for node group foo"
time="2019-09-26T03:16:13Z" level=debug msg="auto discovered max_nodes = 10 for node group foo"
time="2019-09-26T03:16:13Z" level=info msg="pods total: 11" nodegroup=foo
time="2019-09-26T03:16:13Z" level=info msg="nodes remaining total: 2" nodegroup=foo
time="2019-09-26T03:16:13Z" level=info msg="cordoned nodes remaining total: 0" nodegroup=foo
time="2019-09-26T03:16:13Z" level=info msg="nodes remaining untainted: 0" nodegroup=foo
time="2019-09-26T03:16:13Z" level=info msg="nodes remaining tainted: 2" nodegroup=foo
time="2019-09-26T03:16:13Z" level=info msg="Minimum Node: 1" nodegroup=foo
time="2019-09-26T03:16:13Z" level=info msg="Maximum Node: 10" nodegroup=foo
time="2019-09-26T03:16:13Z" level=warning msg="There are less untainted nodes than the minimum" nodegroup=foo
time="2019-09-26T03:16:13Z" level=warning msg="There are no tainted nodes to untaint" nodegroup=foo
time="2019-09-26T03:16:13Z" level=info msg="increasing cloud provider node group by 1" drymode=false nodegroup=foo
time="2019-09-26T03:16:13Z" level=debug msg="IncreaseSize: 1" asg=foo
time="2019-09-26T03:16:13Z" level=debug msg="SetDesiredCapacity: 3" asg=foo
time="2019-09-26T03:16:13Z" level=debug msg="CurrentSize: 2" asg=foo
time="2019-09-26T03:16:13Z" level=debug msg="CurrentTargetSize: 2" asg=foo
time="2019-09-26T03:16:13Z" level=debug msg="Locking scale lock"
time="2019-09-26T03:16:13Z" level=debug msg="Scaling took a total of 494.389056ms"
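For reference, the behaviour we would expect here can be sketched roughly as follows. This is only an illustration of the intended decision, not Escalator's actual code: `recoveryPlan` and `planMinimumRecovery` are hypothetical names, and the sketch assumes the only inputs are the untainted count, the tainted count, and the minimum for the node group.

```go
package main

import "fmt"

// recoveryPlan is a hypothetical type for illustration (not part of Escalator).
// It describes how to bring a node group back up to its minimum size.
type recoveryPlan struct {
	Untaint int // number of already tainted nodes to untaint
	ScaleUp int // number of new nodes to request from the cloud provider
}

// planMinimumRecovery is a hypothetical helper. It covers the shortfall
// below minNodes by untainting existing nodes first, and only requests
// new nodes for whatever shortfall remains.
func planMinimumRecovery(untainted, tainted, minNodes int) recoveryPlan {
	shortfall := minNodes - untainted
	if shortfall <= 0 {
		// Already at or above the minimum: nothing to do.
		return recoveryPlan{}
	}
	untaint := shortfall
	if tainted < untaint {
		untaint = tainted
	}
	return recoveryPlan{Untaint: untaint, ScaleUp: shortfall - untaint}
}

func main() {
	// Values taken from the logs above: 0 untainted, 2 tainted, min_nodes = 1.
	plan := planMinimumRecovery(0, 2, 1)
	fmt.Printf("untaint=%d scaleUp=%d\n", plan.Untaint, plan.ScaleUp)
}
```

With the values from the logs, this prints `untaint=1 scaleUp=0`, i.e. one tainted node is untainted and the ASG size is left unchanged.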

cc @Jacobious52