[Open] timuthy opened this issue 3 years ago
Other thoughts?
@timuthy maxSurge = ceil(maxSurge / numZones) and maxUnavailable = ceil(maxUnavailable / numZones), i.e. a maxSurge of 1 with 3 zones turns into 1/3, which turns into 1. It also generally works for larger numbers, though that may not be as important, if at all (but it is more reasonable and less magical).
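A minimal sketch of the suggested rounding, assuming plain integer arithmetic (the helper name ceilDiv is made up for illustration; this is not Gardener code):

```go
package main

import "fmt"

// ceilDiv returns ceil(value / numZones), i.e. the per-zone share suggested above.
func ceilDiv(value, numZones int) int {
	return (value + numZones - 1) / numZones
}

func main() {
	// maxSurge = 1 with 3 zones: 1/3 rounds up to 1 per zone.
	fmt.Println(ceilDiv(1, 3)) // 1
	// Larger numbers also work, e.g. maxSurge = 4 with 3 zones -> 2 per zone.
	fmt.Println(ceilDiv(4, 3)) // 2
}
```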
The main point is that we neglected this problem from the start. It's not only about maxSurge/maxUnavailable. It's also about minimum/maximum. Those should also better use ceil() or, even better yet, the configuration should be "the same per zone", just like with GKE, for instance. There, if you add a new pool, you specify how many nodes per zone are added (one number times the number of zones results in the overall number of required nodes, which is therefore always an integer multiple of the number per zone). We, on the other hand, interpret the number as applying to all zones and then divide it by the number of zones, and by that have to deal with the division remainder/leave the realm of integer numbers ;-).
So the next thought is to give up on "hey, it's logical, everything in the configuration per worker pool is meant for the entirety of the machines" (the instance type, the OS, the volume, the container runtime, the taints/labels/annotations, the kubelet configuration). And to be honest, it isn't that "logical" either (which all adds to the end user confusion). For instance, dataVolumes is not divided by the number of nodes/zones, but is per node. So (unless we want to allow people to tweak the configuration even per zone; there is probably no use case for that except in very narrow corner cases where some instance types are "out of stock" in particular zones), we could also have new dedicated properties, e.g. minMachinesPerZone, maxMachinesPerZone, maxSurgeMachinesPerZone, maxUnavailableMachinesPerZone, which would finally end the ambiguity.
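For illustration only, a hypothetical shape for such dedicated per-zone properties; the field names come from the comment above, while the struct, package, and types are assumptions, not the actual Gardener API:

```go
package garden

// WorkerPerZone is a hypothetical shape for the proposed dedicated per-zone
// properties; every value is interpreted per zone, so no division across
// zones (and no remainder handling) is needed.
type WorkerPerZone struct {
	MinMachinesPerZone            int32 // lower bound of machines in each zone
	MaxMachinesPerZone            int32 // upper bound of machines in each zone
	MaxSurgeMachinesPerZone       int32 // extra machines allowed per zone during a rolling update
	MaxUnavailableMachinesPerZone int32 // machines that may be unavailable per zone during a rolling update
}
```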
@dguendisch, @mvladev, @ialidzhikov, @plkokanov, @timuthy, @vpnachev, @schrodit, @danielfoehrkn, @beckermax, @rfranzke, @timebertt, @hendrikkahl, @kris94, @voelzmo, @stoyanr
This issue was referenced by @vlerenc in duplicate issue gardener/autoscaler#104.
The Gardener project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- /remove-lifecycle stale
- /lifecycle rotten
- /close
/lifecycle stale
The Gardener project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- /remove-lifecycle rotten
- /close
/lifecycle rotten
The Gardener project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- /reopen
- /remove-lifecycle rotten
/close
@gardener-ci-robot: Closing this issue.
/remove-lifecycle rotten
/reopen
@unmarshall: Reopened this issue.
The Gardener project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- /remove-lifecycle stale
- /lifecycle rotten
- /close
/lifecycle stale
/remove-lifecycle stale
/assign
The Gardener project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After a period of inactivity, lifecycle/stale is applied
- After a further period of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After a further period of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- /remove-lifecycle stale
- /lifecycle rotten
- /close
/lifecycle stale
GKE upgrade process -> https://cloud.google.com/kubernetes-engine/docs/concepts/node-pool-upgrade-strategies
They support blue-green upgrades, but we don't.
They upgrade one zone at a time, with their parameters applying to only one zone, while our upgrade works in parallel over all zones, but our parameters are split over the zones, and the current issue was opened for one such case in that splitting.
Both approaches have their pros and cons.
Also, they default the configuration to maxSurge=1, maxUnavailable=0 for their worker pools, if not provided. One of the recommended solutions (see issue description) suggested something similar.
Pros of one zone at a time:
progressDeadlineSeconds
The main point is that we neglected this problem from the start. It's not only about maxSurge/maxUnavailable. It's also about minimum/maximum. Also those should better use ceil() or, even better yet, the configuration should be "the same per zone".
How the maximum is currently calculated can cause unexpected node termination when adding zones. For example, if you have 12 workers in zone A and a maximum of 30, then adding two more zones means that the new maximum of zone A will be 10 workers. This will lead to node termination in zone A.
This is something an operator/shoot owner can easily be surprised by, and it can impact workloads. Therefore, I vote to change that. Ideas would be to specify the maximum per zone or let the maximum be the upper limit for the sum of the nodes in all zones.
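To make the arithmetic explicit, here is a small sketch assuming the per-zone maximum is simply the pool maximum divided by the number of zones, as described above (illustrative only, not the actual implementation):

```go
package main

import "fmt"

func main() {
	maximum := 30        // pool-level maximum
	workersInZoneA := 12 // current nodes in zone A

	for _, zones := range []int{1, 3} {
		perZoneMax := maximum / zones // 30 with 1 zone, 10 with 3 zones
		terminated := workersInZoneA - perZoneMax
		if terminated < 0 {
			terminated = 0
		}
		fmt.Printf("zones=%d: per-zone maximum=%d, nodes terminated in zone A=%d\n",
			zones, perZoneMax, terminated)
	}
}
```

With 3 zones the per-zone maximum drops to 10, so zone A loses 2 of its 12 nodes.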
We see some potential to improve the API with regard to the worker configuration. I opened https://github.com/gardener/gardener/issues/8142 for further discussion.
:warning: I feel the comment https://github.com/gardener/machine-controller-manager/issues/798#issuecomment-1599149933 from @BeckerMax needs more urgent attention and should be tracked in a separate issue to expedite its handling, rather than waiting for the perfect design that achieves the ask of allowing per-zone configuration.
I did a simple experiment to check, and found that this will lead to rolling of machines/pods in the current setup; it may go unnoticed if the current maximum can accommodate the additional machines required per zone.
I would propose:
We can enhance the current dashboard warning by either proposing the minimum values or informing the user of the consequences.
wdyt?
FYI: Proposal https://gist.github.tools.sap/D043832/f5d0ac0e0bb138eea07c84ac79e10ce9#step-1-enhancing-zone-distribution-and-surge-management (only step 1 relevant here)
How to categorize this issue?
/area usability /kind enhancement /priority 3
What would you like to be added:
Today, maxSurge and maxUnavailable values are configured at the worker pool level (ref). Provider extensions usually distribute the configured values if multiple zones are configured (ref).
Although distributing these numbers is generally acceptable, it seems unclear to end-users and can thus end in unacceptable and unexpected cluster upgrade behavior. This is especially true when maxSurge < len(zones), and when maxSurge < len(zones) && maxUnavailable < maxSurge.

Example:
This will result in 3 MachineDeployments: While the workers in europe-west1-a are upgraded in a rolling fashion, the ones in europe-west1-b and europe-west1-c are just replaced. During the upgrade procedure, the cluster will have fewer Nodes than configured in workers[*].minimum.

We see the following options to improve this user experience (only when maxSurge < len(zones)):
- Require maxSurge >= len(zones) --> incompatible and will probably break many automation functionalities around Gardener.
- Default to maxSurge: 1 for each zone (suggested by @AxiomSamarth @himanshu-kun) --> solves many "standard" cases in which maxUnavailable is not used.

Why is this needed: Needed for better user experience to avoid unexpected outages.
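To illustrate the behavior described above, here is a simplified sketch of how a pool-level maxSurge/maxUnavailable might get split across zones, with the remainder going to the first zones; the distribute helper and the assumed pool values (maxSurge: 1, maxUnavailable: 0) are illustrative, not the provider extensions' actual code:

```go
package main

import "fmt"

// distribute splits a pool-level value across zones, giving the remainder to
// the first zones (a simplified model of the distribution described above).
func distribute(value, numZones, zoneIndex int) int {
	share := value / numZones
	if zoneIndex < value%numZones {
		share++
	}
	return share
}

func main() {
	zones := []string{"europe-west1-a", "europe-west1-b", "europe-west1-c"}
	maxSurge, maxUnavailable := 1, 0 // assumed pool-level values for illustration

	for i, zone := range zones {
		surge := distribute(maxSurge, len(zones), i)
		unavailable := distribute(maxUnavailable, len(zones), i)
		mode := "rolling update (surge first, then drain)"
		if surge == 0 {
			// With no surge budget left for this zone, machines can only be
			// replaced, temporarily dropping below workers[*].minimum.
			mode = "replace in place"
		}
		fmt.Printf("%s: maxSurge=%d, maxUnavailable=%d -> %s\n", zone, surge, unavailable, mode)
	}
}
```

With these assumed values, only europe-west1-a gets any surge budget, which matches the rolling-vs-replace split described in the issue.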