himanshu-kun opened this issue 11 months ago
Post-grooming of the issue: inputs from @rishabh-11 and @unmarshall, verbatim:
I think in our case, parallel scale-ups and sequential scale-ups don't make much of a difference. The reason is that if you look at the methods that execute these scale-ups, i.e. executeScaleUpsParallel (https://github.com/gardener/autoscaler/blob/0ebbdfb263f573400f470cc76ddcf38d89cc059e/cluster-autoscaler/core/scaleup/orchestrator/executor.go#L91) and executeScaleUpsSync (https://github.com/gardener/autoscaler/blob/0ebbdfb263f573400f470cc76ddcf38d89cc059e/cluster-autoscaler/core/scaleup/orchestrator/executor.go#L72), both call executeScaleUp (https://github.com/gardener/autoscaler/blob/0ebbdfb263f573400f470cc76ddcf38d89cc059e/cluster-autoscaler/core/scaleup/orchestrator/executor.go#L139). executeScaleUpsParallel calls it in a goroutine per scale-up, while executeScaleUpsSync calls it one by one in a for loop. Since executeScaleUp only increases the replica field of the MachineDeployment corresponding to the node group, which is fast when everything is working fine, we don't save any noticeable time.
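To make that point concrete, here is a minimal, self-contained sketch of the two code paths. The types and function bodies are stand-ins for the real executor.go code, not the upstream implementation: since both paths end in the same cheap replica bump, the parallel variant mostly saves goroutine-scheduling overhead, not API latency.

```go
package main

import (
	"fmt"
	"sync"
)

// scaleUpInfo is a stand-in for CA's nodegroupset.ScaleUpInfo: a target node
// group plus the replica delta to apply to its MachineDeployment.
type scaleUpInfo struct {
	nodeGroup string
	delta     int
}

// executeScaleUp is a stand-in for executor.go's executeScaleUp: in the real
// code it just raises the replica field of the MachineDeployment backing the
// node group, a single cheap API call.
func executeScaleUp(info scaleUpInfo) error {
	fmt.Printf("increasing %s by %d replicas\n", info.nodeGroup, info.delta)
	return nil
}

// executeScaleUpsSync mirrors the sequential path: one scale-up after another.
func executeScaleUpsSync(infos []scaleUpInfo) error {
	for _, info := range infos {
		if err := executeScaleUp(info); err != nil {
			return err
		}
	}
	return nil
}

// executeScaleUpsParallel mirrors the parallel path: the same call, issued in
// a goroutine per node group.
func executeScaleUpsParallel(infos []scaleUpInfo) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(infos))
	for _, info := range infos {
		wg.Add(1)
		go func(info scaleUpInfo) {
			defer wg.Done()
			if err := executeScaleUp(info); err != nil {
				errs <- err
			}
		}(info)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		return err // surface the first error, if any
	}
	return nil
}

func main() {
	infos := []scaleUpInfo{{"worker-blue", 2}, {"worker-green", 1}}
	_ = executeScaleUpsSync(infos)
	_ = executeScaleUpsParallel(infos)
}
```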
I thought the parallelism would at least help with scale-up requests across different worker groups.
When looking at the 1.28.0 CA changelog, I noticed that the parallel drain feature is now the default. Would that be something that could help us? I could not find any change of precedence in scale-up vs. scale-down in the architecture description, though.
Another idea that we briefly talked about in the Gardener stakeholder sync is running multiple CA instances in parallel, each instance caring for a different set of worker groups (which we would need to assign to a CA in the shoot). As the "blue" group would only face scale-downs during maintenance and the "green" group would only face scale-ups, and as both would be handled by different CAs, there should not be any blocking of scale-downs due to scale-ups. The question is whether that is feasible to implement with reasonable effort.
BTW: We just faced an immense scale-down delay in our latest maintenance (link in our internal ticket). The Cluster Autoscaler seems to have been overwhelmed by the number of pods for which it needed to calculate potential scale-ups. This increased the CA cycle time from the usual 10s to over a minute, as far as I can see from the control-plane logs linked in the ticket.
@rubenscheja You have mentioned internal references in a public issue. Please check.
What would you like to be added:
Upstream has added a feature where CA can scale up multiple node groups in a single `RunOnce()` loop, which could help reduce the latency of scale-ups. PR: https://github.com/kubernetes/autoscaler/pull/5731. It is available from the v1.28 k/CA release, but as listed in the release notes we have to test it with our MCM, and we should also wait a few upstream releases for the feature to stabilize.
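For orientation, a tiny sketch of how the new behaviour is switched on, reusing the stand-in names from the sketch earlier in this issue. The option and flag naming here (ParallelScaleUp, a `--parallel-scale-up` style flag defaulting to off) is my reading of the linked PR and should be verified against the actual 1.28 sources.

```go
// Sketch only: the option name below is an assumption based on the linked PR,
// not a verified copy of the 1.28 AutoscalingOptions struct.
package main

import "fmt"

type autoscalingOptions struct {
	// Assumed to be wired to a --parallel-scale-up style flag, defaulting to false.
	ParallelScaleUp bool
}

// executeScaleUps dispatches to the parallel or sequential path depending on
// the option, which is roughly how the upstream change is described to behave.
func executeScaleUps(opts autoscalingOptions, nodeGroups []string) {
	if opts.ParallelScaleUp {
		fmt.Println("scaling up concurrently in one RunOnce() loop:", nodeGroups)
		return
	}
	fmt.Println("scaling up one node group at a time:", nodeGroups)
}

func main() {
	groups := []string{"worker-blue", "worker-green"}
	executeScaleUps(autoscalingOptions{ParallelScaleUp: false}, groups)
	executeScaleUps(autoscalingOptions{ParallelScaleUp: true}, groups)
}
```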
Demand was raised as per live ticket #4048.
Why is this needed: There are customers who need quicker scale-ups, as scaling up one node group at a time delays their scale-downs, adding to cost and overrunning maintenance time windows, e.g. if they are doing something like blue/green deployments.