Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[Feature] Blue/Green Cluster K8s upgrades - Nodepool #3617

Open pavneeta opened 1 year ago

pavneeta commented 1 year ago

Is your feature request related to a problem? Please describe.

Customers often orchestrate their own nodepool upgrades for K8s versions and node images so that they have zero application downtime and can test the new nodepool/K8s version before switching over traffic/requests. They use the new-version deployment for smoke tests or a metric-based roll-forward, with the ability to roll back in case of failures/issues.

Describe the solution you'd like

AKS should orchestrate nodepool and control plane upgrades in a blue-green fashion, giving customers levers to roll back based on metrics, a bake time on the new version for smoke testing, and a graceful drain of the older version.

denniszielke commented 1 year ago

I would like to see more metrics, alerts, and observability on the upgrade process. Before customers can hand control of the upgrade process over to AKS, it is critical to understand what happens during a normal upgrade and how the service behaves during an upgrade failure.

kaarthis commented 5 months ago

We decided to focus on vanilla B/G nodepool upgrades first, and then in subsequent phases bring in metrics/alerts to drive the B/G strategy.

kaarthis commented 3 months ago

This is the proposal we have: AKS customers can upgrade their nodepool Kubernetes versions and node OS in a blue-green fashion without having to manually orchestrate the creation of the new (Green) nodepool, the workload migration, and the draining/deleting of the old (Blue) nodepool. When using this strategy, AKS will automatically create a Green nodepool (under the hood) in the cluster (one for every existing nodepool at the time of upgrade), cordon and drain the Blue nodepool, and surface upgrade events and notices. Customers can also control the initial step size of the Green nodepool and define a soak time during which the Blue nodepool is kept, so that they can run smoke tests on their applications (on their own, not supported by AKS) and roll back to the previous version immediately if needed.
With this launch, customers have two upgrade strategy options: rolling upgrades (the current method) on existing nodepools, and the new blue-green upgrade strategy.
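To make the flow in the proposal easier to follow, here is a minimal Python sketch of the sequence as I read it: create Green, cordon and drain Blue, keep Blue during the soak, then either finish or roll back. It is a pure in-memory simulation and does not call any AKS or Kubernetes API. All names in it (`NodePool`, `Phase`, `blue_green_upgrade`, `initial_green_nodes`, `smoke_test`) and the version strings used with it are hypothetical, chosen only for illustration; the real parameters are whatever AKS ends up defining.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Phase(Enum):
    """Hypothetical end states of a blue/green nodepool upgrade."""
    COMPLETED = "completed"
    ROLLED_BACK = "rolled_back"


@dataclass
class NodePool:
    """Hypothetical in-memory stand-in for a nodepool."""
    name: str
    version: str
    node_count: int
    schedulable: bool = True


@dataclass
class UpgradeResult:
    active: NodePool          # pool serving workloads after the upgrade
    phase: Phase
    events: List[str] = field(default_factory=list)


def blue_green_upgrade(
    blue: NodePool,
    target_version: str,
    initial_green_nodes: int,                 # assumed "initial step size" knob
    smoke_test: Callable[[NodePool], bool],   # customer-owned check run during soak
) -> UpgradeResult:
    events: List[str] = []

    # 1. Create the Green pool on the target version at the initial step size.
    green = NodePool(f"{blue.name}-green", target_version, initial_green_nodes)
    events.append(f"created {green.name} with {green.node_count} node(s)")

    # 2. Cordon and drain Blue so workloads move to Green.
    blue.schedulable = False
    events.append(f"cordoned and drained {blue.name}")

    # 3. Soak: Blue is kept (not deleted) while the customer's smoke test runs.
    events.append("soak started; blue retained")
    if smoke_test(green):
        # 4a. Success: size Green up to Blue's capacity; Blue can be deleted.
        green.node_count = max(green.node_count, blue.node_count)
        events.append(f"soak passed; {blue.name} can be deleted")
        return UpgradeResult(green, Phase.COMPLETED, events)

    # 4b. Failure: uncordon Blue and go back to the previous version.
    blue.schedulable = True
    events.append(f"soak failed; rolled back to {blue.name}")
    return UpgradeResult(blue, Phase.ROLLED_BACK, events)
```

For example, calling `blue_green_upgrade(NodePool("np1", "1.27", 3), "1.28", 1, lambda pool: True)` ends in `Phase.COMPLETED` with the Green pool active, while a smoke test that returns `False` leaves the original Blue pool active and schedulable again. The sketch models soak time as a single check rather than a timed window, which is a simplification.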

Parameters we'll be using for the proposal and WHY

Operations allowed:

Phases of B/G