Activate Early backoff functionality

himanshu-kun commented 1 year ago

What this PR does / why we need it: Activates Early backoff for mcm cloud provider

Which issue(s) this PR fixes: Fixes #154

Special notes for your reviewer:

CORNER_CASE:

There is still a corner case which can block scale-up for a while. If a node-grp is scaled up in such a way that the same node-grp cannot be scaled down (e.g. scale-down is blocked by a rolling update), then the set of pods which triggered the scale-up for the node-grp will not be considered `unschedulable` by the autoscaler for `max-node-provision-time` (assuming the VM doesn't join in that time). This is because the autoscaler still considers the node in the node-grp (which we know won't join due to `ResourceExhausted`) as `Upcoming`. This can also be justified, because `ResourceExhausted` is a recoverable error, so the node can still join.

Docs: For now they are added in FAQ.md. Will move them to another folder, when refactoring CA docs overall.

Test results: 1) Early backoff from a single nodegrp working as expected

Manual Test Case 1

The out-of-quota nodegrp is scaled up first

```
I0927 12:32:38.650323 94730 klogx.go:87] Pod default/scale-up-pod-75b94d88b5-rdrfb is unschedulable
I0927 12:32:38.650333 94730 orchestrator.go:109] Upcoming 0 nodes
I0927 12:32:38.650759 94730 waste.go:55] Expanding Node Group shoot--i544024--early-bckf-worker-no-avail-z1 would waste 55.00% CPU, 99.39% Memory, 77.19% Blended
I0927 12:32:38.650773 94730 waste.go:55] Expanding Node Group shoot--i544024--early-bckf-worker-avail-z1 would waste 77.50% CPU, 99.69% Memory, 88.59% Blended
I0927 12:32:38.650783 94730 orchestrator.go:194] Best option to resize: shoot--i544024--early-bckf-worker-no-avail-z1
I0927 12:32:38.650791 94730 orchestrator.go:198] Estimated 1 nodes needed in shoot--i544024--early-bckf-worker-no-avail-z1
I0927 12:32:38.650820 94730 orchestrator.go:311] Final scale-up plan: [{shoot--i544024--early-bckf-worker-no-avail-z1 0->1 (max: 10)}]
I0927 12:32:38.650832 94730 orchestrator.go:583] Scale-up: setting group shoot--i544024--early-bckf-worker-no-avail-z1 size to 1
```

CA senses that the node won't come up due to `ResourceExhausted`, so it marks the nodegrp as backed off and removes the scaled-up machine

```
I0927 12:32:59.639238 94730 clusterstate.go:1059] Found 1 instances with errorCode OutOfResource.ResourceExhausted in nodeGroup shoot--i544024--early-bckf-worker-no-avail-z1
I0927 12:32:59.639254 94730 clusterstate.go:1077] Failed adding 1 nodes (1 unseen previously) to group shoot--i544024--early-bckf-worker-no-avail-z1 due to OutOfResource.ResourceExhausted; errorMessages=[]string{"Create machine \"shoot--i544024--early-bckf-worker-no-avail-z1-6485c-vg2qn\" failed: googleapi: Error 400: Invalid value for field 'resource.machineType': 'zones/asia-northeast1-b/machineTypes/g2-standard-4'. Machine type with name 'g2-standard-4' does not exist in zone 'asia-northeast1-b'., invalid"}
W0927 12:32:59.639304 94730 clusterstate.go:287] Disabling scale-up for node group shoot--i544024--early-bckf-worker-no-avail-z1 until 2023-09-27 12:37:59.638262 +0530 IST m=+932.738328537; errorClass=OutOfResource; errorCode=ResourceExhausted
I0927 12:32:59.639328 94730 static_autoscaler.go:405] 1 unregistered nodes present
I0927 12:32:59.639346 94730 static_autoscaler.go:806] Deleting 1 from shoot--i544024--early-bckf-worker-no-avail-z1 node group because of create errors
I0927 12:32:59.639344 94730 event_sink_logging_wrapper.go:48] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"kube-system", Name:"cluster-autoscaler-status", UID:"ed027710-58d0-46b9-9e59-66ccffa771fc", APIVersion:"v1", ResourceVersion:"79781", FieldPath:""}): type: 'Warning' reason: 'ScaleUpFailed' Failed adding 1 nodes to group shoot--i544024--early-bckf-worker-no-avail-z1 due to OutOfResource.ResourceExhausted; source errors: Create machine "shoot--i544024--early-bckf-worker-no-avail-z1-6485c-vg2qn" failed: googleapi: Error 400: Invalid value for field 'resource.machineType': 'zones/asia-northeast1-b/machineTypes/g2-standard-4'. Machine type with name 'g2-standard-4' does not exist in zone 'asia-northeast1-b'., invalid
I0927 12:32:59.856571 94730 mcm_manager.go:525] Machine shoot--i544024--early-bckf-worker-no-avail-z1-6485c-vg2qn of machineDeployment shoot--i544024--early-bckf-worker-no-avail-z1 marked with priority 1 successfully
I0927 12:32:59.856593 94730 mcm_manager.go:527] Expected to remove following {machineRef: corresponding node} pairs map[shoot--i544024--early-bckf-worker-no-avail-z1-6485c-vg2qn:]
```

CA tries another zone

```
W0927 12:33:10.465864 94730 orchestrator.go:510] Node group shoot--i544024--early-bckf-worker-no-avail-z1 is not ready for scaleup - backoff
I0927 12:33:10.466135 94730 waste.go:55] Expanding Node Group shoot--i544024--early-bckf-worker-avail-z1 would waste 77.50% CPU, 99.69% Memory, 88.59% Blended
I0927 12:33:10.466151 94730 orchestrator.go:194] Best option to resize: shoot--i544024--early-bckf-worker-avail-z1
I0927 12:33:10.466159 94730 orchestrator.go:198] Estimated 1 nodes needed in shoot--i544024--early-bckf-worker-avail-z1
I0927 12:33:10.466185 94730 orchestrator.go:311] Final scale-up plan: [{shoot--i544024--early-bckf-worker-avail-z1 3->4 (max: 5)}]
I0927 12:33:10.466193 94730 orchestrator.go:583] Scale-up: setting group shoot--i544024--early-bckf-worker-avail-z1 size to 4
```

First scale-up = `12:32:38`
Next scale-up after learning = `12:33:10` (about 32 seconds later!)

2) Early backoff from multiple nodegrps in a row working as expected

Manual test case 2

Trying scale-up in `no-avail`

```
I0927 13:38:12.451470 3487 orchestrator.go:311] Final scale-up plan: [{shoot--i544024--early-bckf-worker-no-avail-z1 0->1 (max: 10)}]
```

Backoff on failure

```
W0927 13:38:33.478725 3487 clusterstate.go:287] Disabling scale-up for node group shoot--i544024--early-bckf-worker-no-avail-z1 until 2023-09-27 13:43:33.477067 +0530 IST m=+336.462617495; errorClass=OutOfResource; errorCode=ResourceExhausted
```

Trying scale-up in `no-avail2`

```
I0927 13:38:44.265628 3487 orchestrator.go:311] Final scale-up plan: [{shoot--i544024--early-bckf-worker-noavail2-z1 0->1 (max: 10)}]
```

Backoff on failure

```
W0927 13:38:54.852199 3487 clusterstate.go:287] Disabling scale-up for node group shoot--i544024--early-bckf-worker-noavail2-z1 until 2023-09-27 13:43:54.84848 +0530 IST m=+357.833835085; errorClass=OutOfResource; errorCode=ResourceExhausted
```

Finally scaling up `avail-z1`

```
I0927 13:39:05.700509 3487 orchestrator.go:311] Final scale-up plan: [{shoot--i544024--early-bckf-worker-avail-z1 4->5 (max: 5)}]
```

3) Early backoff doesn't happen for Invalid credentials error as it is an Internal error
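The distinction in this third case can be sketched as a small classification step. This is a hedged illustration only: `errorClass`, `classify`, `backsOffEarly` and the `InvalidCredentials` code name are hypothetical, and the mapping (only `ResourceExhausted` counts as out-of-resource) is an assumption drawn from this PR description rather than from the autoscaler's source.

```go
package main

import "fmt"

// errorClass is a hypothetical, simplified stand-in for the error classes
// that appear in the logs (for example errorClass=OutOfResource).
type errorClass string

const (
	outOfResource errorClass = "OutOfResource"
	internal      errorClass = "Internal"
)

// classify maps a machine creation error code to an error class. Only
// ResourceExhausted is treated as out-of-resource here; that mapping is an
// assumption based on the PR description.
func classify(errorCode string) errorClass {
	if errorCode == "ResourceExhausted" {
		return outOfResource
	}
	return internal
}

// backsOffEarly reports whether an error code would trigger early backoff
// in this simplified model.
func backsOffEarly(errorCode string) bool {
	return classify(errorCode) == outOfResource
}

func main() {
	// "InvalidCredentials" is an illustrative code name, not a confirmed one.
	for _, code := range []string{"ResourceExhausted", "InvalidCredentials"} {
		fmt.Printf("%s: class=%s earlyBackoff=%t\n", code, classify(code), backsOffEarly(code))
	}
}
```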

Release note:

Gardener autoscaler now backs off early from a node-group (i.e. machinedeployment) in case of a `ResourceExhausted` error. Refer to the docs at `https://github.com/gardener/autoscaler/blob/machine-controller-manager-provider/cluster-autoscaler/FAQ.md#when-does-autoscaler-backs-off-early-from-a-node-group` for details.

himanshu-kun commented 1 year ago

/assign @rishabh-11

rishabh-11 commented 1 year ago

Should we mention the corner case in the docs? It would be easier for people to notice it there instead of the PR.

rishabh-11 commented 1 year ago

/lgtm

gardener / autoscaler

Activate Early backoff functionality #253