gardener / autoscaler

Customised fork of cluster-autoscaler to support machine-controller-manager
Apache License 2.0

Support parallel node-group scale-ups #268

Open himanshu-kun opened 11 months ago

himanshu-kun commented 11 months ago

What would you like to be added:

Upstream has added a feature where CA can scale up multiple node groups in a single RunOnce() loop. This could help reduce scale-up latency.

PR -> https://github.com/kubernetes/autoscaler/pull/5731. It is available from the v1.28 k/CA release, but as listed in the release notes we have to test it out with our MCM, and we should also wait a few upstream releases for the feature to stabilize.
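For anyone trying this out early: per the linked PR, the behavior is gated behind a boolean flag that defaults to off. The flag name below is taken from the upstream PR and should be double-checked against the v1.28 flag reference before relying on it:

```
# Hypothetical excerpt of the cluster-autoscaler container args;
# the flag is marked experimental upstream and defaults to false.
args:
  - --parallel-scale-up=true
```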

Demand was raised in live ticket #4048.

Why is this needed: There are customers who need quicker scale-ups; scaling up one node group at a time delays their scale-downs, adding to cost and overrunning maintenance windows, for example when they are doing something like a blue/green deployment.

ashwani2k commented 9 months ago

Post-grooming of the issue: inputs from @rishabh-11 and @unmarshall, quoted verbatim:

I think in our case, parallel scale-ups and sequential scale-ups don't make much of a difference. The reason is that if you look at the methods that execute these scale-ups, i.e. executeScaleUpsParallel (https://github.com/gardener/autoscaler/blob/0ebbdfb263f573400f470cc76ddcf38d89cc059e/cluster-autoscaler/core/scaleup/orchestrator/executor.go#L91) and executeScaleUpsSync (https://github.com/gardener/autoscaler/blob/0ebbdfb263f573400f470cc76ddcf38d89cc059e/cluster-autoscaler/core/scaleup/orchestrator/executor.go#L72), both call executeScaleUp (https://github.com/gardener/autoscaler/blob/0ebbdfb263f573400f470cc76ddcf38d89cc059e/cluster-autoscaler/core/scaleup/orchestrator/executor.go#L139). executeScaleUpsParallel calls it in a goroutine per scale-up, while executeScaleUpsSync calls it one by one in a for loop. Now, executeScaleUp just increases the replica field of the machine deployment corresponding to the node group, which won't take long if everything is working fine, so we don't save any noticeable time.
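To make the point concrete, here is a minimal, self-contained Go sketch (not the actual executor code; `scaleUp`, `executeSync`, and `executeParallel` are stand-ins) showing that the only structural difference between the two paths is a goroutine fan-out around the same per-group call:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// scaleUp stands in for executeScaleUp: for MCM it is essentially one
// API call that bumps the replica count of the MachineDeployment backing
// the node group, so it returns quickly when everything is healthy.
func scaleUp(group string, delta int) error {
	time.Sleep(50 * time.Millisecond) // simulated API round-trip
	fmt.Printf("scaled %s by %d\n", group, delta)
	return nil
}

// executeSync mirrors executeScaleUpsSync: one group after another.
func executeSync(groups map[string]int) error {
	for g, d := range groups {
		if err := scaleUp(g, d); err != nil {
			return err
		}
	}
	return nil
}

// executeParallel mirrors executeScaleUpsParallel: one goroutine per group.
func executeParallel(groups map[string]int) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for g, d := range groups {
		wg.Add(1)
		go func(group string, delta int) {
			defer wg.Done()
			if err := scaleUp(group, delta); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(g, d)
	}
	wg.Wait()
	if len(errs) > 0 {
		return errs[0]
	}
	return nil
}

func main() {
	groups := map[string]int{"worker-a": 2, "worker-b": 1, "worker-c": 3}

	start := time.Now()
	_ = executeSync(groups)
	fmt.Println("sync took", time.Since(start))

	start = time.Now()
	_ = executeParallel(groups)
	fmt.Println("parallel took", time.Since(start))
}
```

With the 50ms stand-in call, three groups take roughly 150ms sequentially vs. 50ms in parallel; since the real per-group call is a single replica-count update, the fan-out only pays off with many node groups or slow API calls, which matches the assessment above.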

rubenscheja commented 9 months ago

I thought the parallelism would at least help with scale-up requests on different worker groups.

When looking at the 1.28.0 CA changelog, I noticed that the parallel drain feature is now the default. Would that be something that could help us? I could not find any change in the precedence of scale-up vs. scale-down in the architecture description, though.

Another idea that we briefly talked about in the gardener stakeholder sync is using multiple CA instances in parallel, each instance caring for a different set of worker groups (which we would need to assign to a CA in the shoot). [image: diagram of "blue" and "green" worker groups handled by separate CA instances] As the "blue" group would only face scale-downs during maintenance, and the "green" group would only face scale-ups, and as both would be handled by different CAs, there should not be any blocking of scale-downs by scale-ups. The question is whether that is feasible to implement with a reasonable effort.
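For illustration, a hypothetical flag-level sketch of such a split, assuming the MCM provider's static node-group discovery via --nodes (the min:max:namespace.machinedeployment-name format and all names below are assumptions to be verified against the cloudprovider docs):

```
# CA instance "blue": only scales the blue worker group
args:
  - --nodes=1:10:shoot--foo--bar.blue-worker

# CA instance "green": only scales the green worker group
args:
  - --nodes=1:10:shoot--foo--bar.green-worker
```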

rubenscheja commented 9 months ago

BTW: We just faced an immense scale-down delay in our latest maintenance (link in our internal ticket). The Cluster Autoscaler seems to have been overwhelmed by the number of pods for which it needed to calculate potential scale-ups. This increased the CA cycle time from the usual 10s to over one minute, as far as I can see from the linked control-plane logs in the ticket.

gardener-robot commented 9 months ago

@rubenscheja You have mentioned internal references in public. Please check.