Closed: himanshu-kun closed this issue 9 months ago
The current distribution logic in general only statically distributes the min (defined at the worker pool) amongst the machine deployments. In the specific case where lots of small clusters (with min = 1) are created across 3 zones, the first zone picked depends on the order in which the zones are specified when defining a shoot, thereby increasing the chances of quota exhaustion in the first zone. The customer in this case uses the same stencil to create all of their clusters, which results in the same zone order every time.
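For illustration only (this is not taken from the issue; the field names follow the Shoot worker API, while the pool and zone names are made up), the situation above corresponds to a worker pool along these lines:

```yaml
# Sketch of the scenario described above: a small pool with min = 1 spread over 3 zones.
# With the current static distribution, the single "minimum" machine is always placed
# in the first zone of this list, so every shoot created from the same stencil hits zone-1.
provider:
  workers:
  - name: worker-1
    minimum: 1
    maximum: 3
    zones:
    - zone-1
    - zone-2
    - zone-3
```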
Alternative 1: We also discussed whether the customer could randomise the order in which zones are specified, but we soon found that this is also not optimal, for the following reasons:
The issue is further exacerbated by the fact that the cluster is not marked as ready until all the min machines are up and ready. So if the customer waits for this condition before deploying their workloads, the condition will never be met if one of the zones selected for bringing up VMs is already out of quota. Once the PR is released, customers will no longer have to wait for all machines to be up and ready, as CA will see the unscheduled pods, back off from one zone and scale another zone instead. However, this would additionally require a way for the customer to know whether at least some nodes are available in the cluster.
Alternative 2: To ensure quota availability, customers could create a capacity reservation at the zone or region level. Then, even with the current distribution logic, their VMs would come up. However, capacity reservations do not come for free, and the customers did not consider this a viable and cost-effective option.
As mentioned above by @himanshu-kun, allowing min = 0 lets CA take over the responsibility of choosing the zone in which a VM is launched. In case you have any better suggestions/alternatives, we would like to discuss them.
/assign @elankath
This backlog will be transformed into an epic with 3 stages (among other things, around the CA `--nodes` flag). The Shoot spec will also have some enhancements.
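For context, and only as a sketch (the node-group names below are invented, and the exact args of the gardenlet-managed CA deployment may differ), cluster-autoscaler is configured with one node group per zonal MachineDeployment via its standard `--nodes=<min>:<max>:<node-group-name>` flag, which is presumably what the stage around the `--nodes` flag refers to:

```yaml
# Sketch: per-node-group bounds passed to cluster-autoscaler, one --nodes flag per
# zonal MachineDeployment. With min = 0 everywhere, CA alone decides which zone to scale.
command:
- ./cluster-autoscaler
- --nodes=0:3:shoot--foo--bar-worker-1-z1
- --nodes=0:3:shoot--foo--bar-worker-1-z2
- --nodes=0:3:shoot--foo--bar-worker-1-z3
```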
The Gardener project currently lacks enough active contributors to adequately respond to all issues. This bot triages issues according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied
- After further inactivity once lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Mark this issue as rotten with /lifecycle rotten
- Close this issue with /close

/lifecycle stale
@gardener-ci-robot: Closing this issue.
Addendum: Stage-1 was simply satisfied with: https://github.com/gardener/gardener/pull/8490
How to categorize this issue?
/area auto-scaling /kind enhancement
What would you like to be added:
Allow customers to set `min: 0` for all worker groups. Currently at least one worker group must have `min > 0` for the system components to run.

Why is this needed:
When a customer creates hundreds of shoots, each with a single worker pool (let's say `worker-1`) spanning 3 zones `zone-1`, `zone-2`, `zone-3`: since our current distribution logic spreads `min` over the zones and tries to keep maxSkew <= 1 between them, `zone-1` is always selected to place the node if the `min` specified by the cluster is 1. After a point, `zone-1` runs out of instances of that particular type, and the clusters created from then on never become Ready. This is problematic for the customer.

This problem can be solved by delegating the responsibility of node distribution (after gardenlet has distributed min/max across the node groups) to cluster-autoscaler, as it is smart enough to back off from out-of-quota node groups and try new ones. Earlier this was not possible, as CA was deployed much later in the flow, but after this PR, CA is deployed much sooner: soon enough that it can even handle node scale-up for system pods.
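To make the maxSkew argument concrete, here is a rough sketch of the per-zone bounds that the static split produces for this pool (the MachineDeployment names and the exact max split are illustrative, not taken from the code):

```yaml
# Hypothetical result of statically splitting min=1, max=3 over three zones:
# only the first listed zone ever receives the mandatory replica.
worker-1-zone-1: {min: 1, max: 1}   # the single min machine always lands here
worker-1-zone-2: {min: 0, max: 1}
worker-1-zone-3: {min: 0, max: 1}
```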
So after a combined discussion with @vlerenc, @unmarshall, @kon-angelo and the affected customers, the following combination should solve the above-stated problem: `min: 0` for all worker groups, with cluster-autoscaler choosing where to place the nodes.
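As a sketch of the requested end state (using an illustrative pool like the one above; this is not a final spec proposal), all worker pools would be allowed to start empty and CA would pick a zone with remaining quota once pods are pending:

```yaml
# Sketch: min: 0 on every worker pool; all zonal node groups start at 0 machines,
# and cluster-autoscaler backs off from out-of-quota zones and scales another one instead.
provider:
  workers:
  - name: worker-1
    minimum: 0
    maximum: 3
    zones:
    - zone-1
    - zone-2
    - zone-3
```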
For more context, refer to:
canary issue 3546
Dependency: