gardener / autoscaler

Customised fork of cluster-autoscaler to support machine-controller-manager
Apache License 2.0
16 stars 25 forks source link

Fix targeted scale-in logic #215

Closed rishabh-11 closed 1 year ago

rishabh-11 commented 1 year ago

What this PR does / why we need it: This PR updates the priority of machines with nodes without ToBeDeletedTaint to 3 before calling DeleteMachines to scale down the corresponding machine deployment.

Which issue(s) this PR fixes: Fixes #159

Special notes for your reviewer: This PR does not deal with one case of #159. It is as follows:-

  1. All Nodes in a node group: [N1, N2, N3, N4] At T1, CA marked [N2, N4] with ToBeDeleted taint, added priority=1, and reduced the machine deployment replicas by 2. Then CA restarts, which calls cleanUpIfRequired, which removes all the taints and resets the priority of N2 and N4 to 3. By this point, if MCM has not started the deletion process, then it is possible that N2 and N4 are not selected for deletion because MCM selects machines to delete based on priority annotation value.

Following test cases are added for the DeleteNodes method:-

  1. Delete single node happy path.
  2. Delete single placeholder node happy path.
  3. Delete a node with an error in updating machine priority to 1 due to timeout.
  4. Delete a node with updateMcD error timeout.
  5. Delete a node with updateMcD error resolving b4 timeout.
  6. McD is under rolling update.
  7. Machine is already in a terminating state. No scale-down should happen.
  8. Should not scale down the McD beyond the minimum.
  9. Should not delete node if it does not belong to the machine deployment.

Following test cases are added for the Refresh method:-

  1. Reset the priority of a machine having a node without ToBeDeletedTaint to 3.
  2. Don't reset the priority of a machine having a node with ToBeDeletedTaint.
  3. Failure to reset the priority of a machine

The base cluster requirements for IT are updated. The faking (for unit tests of mcm cloud provider implementation) is done for the client only. Listers are not faked

Release note:

A bug where MCM removed a machine other than the one , CA wanted , is resolved.
`machinepriority.machine.sapcloud.io` annotation on machine is now reset to 3 by autoscaler if the corresponding node doesn't have `ToBeDeletedByClusterAutoscaler` taint
Initial implementation for `Refresh()` method of `CloudProvider` interface done
unit tests framework introduced to test implemented methods of `Cloudprovider` and `Nodegroup` interface
gardener-robot commented 1 year ago

@unmarshall, @himanshu-kun You have pull request review open invite, please check