Node reboots in single node system nodepool

Freakazoid182 commented 2 years ago

What happened:

With the following setup, node reboots get stuck when using a single node system nodepool:

Kubernetes version 1.22.4
Calico network policies
Auto-scaling turned off
1 System node

Node reboots do not complete because CoreDNS has a PodDisruptionBudget set with minAvailable: 1. The only node where this pod can be scheduled (the single system node) is also the node that needs to restart, so the drain can never evict it.
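To illustrate why the drain stalls, here is a minimal Python sketch of the eviction arithmetic behind a minAvailable budget. It is a simplified model, not the Kubernetes implementation: the function names are hypothetical, evicted pods are assumed to become healthy immediately when another node is available, and the replica count of 2 is an assumption for the example (the CoreDNS replica count is not stated in this issue).

```python
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a PodDisruptionBudget with minAvailable permits right now."""
    return max(0, healthy_pods - min_available)


def drain_single_node(replicas: int, min_available: int, other_schedulable_nodes: int) -> int:
    """Simulate draining a node that hosts all replicas.

    Returns the number of pods left stuck on the node being drained.
    """
    on_node = replicas         # pods still running on the cordoned node
    healthy_elsewhere = 0      # pods rescheduled onto another node
    while on_node > 0:
        healthy = on_node + healthy_elsewhere
        if disruptions_allowed(healthy, min_available) == 0:
            break              # eviction refused, so the drain waits here
        on_node -= 1
        if other_schedulable_nodes > 0:
            healthy_elsewhere += 1  # simplification: evicted pod is healthy again at once
    return on_node


# Single system node, assumed 2 replicas: one pod can never be evicted.
print(drain_single_node(replicas=2, min_available=1, other_schedulable_nodes=0))  # -> 1
# With a second schedulable node, the drain can finish.
print(drain_single_node(replicas=2, min_available=1, other_schedulable_nodes=1))  # -> 0
```

In this model, the last pod on the cordoned node is always protected by the budget when there is nowhere else for it to run, which matches the stuck drain described above.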

My first thought was to set the cluster autoscaler to scale to a maximum of 2 nodes. In principle, a new node would then start, CoreDNS would be rescheduled onto it, and the other node could restart. After a while the autoscaler should remove the restarted node again, as it would be below the utilization threshold. This does not happen, though, because the calico-typha deployment must run with 2 replicas, and they cannot run on the same node due to conflicting ports. In other words, setting a maximum of 2 nodes on the autoscaler for the system nodepool causes the nodepool to always run with 2 nodes.
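The scale-down problem can be sketched the same way. This is a rough model under stated assumptions, not the cluster autoscaler's actual logic: the function names are hypothetical, and the only constraint modelled is that replicas binding the same port can be placed at most one per node.

```python
def place_replicas(replicas: int, nodes: int) -> tuple[int, int]:
    """Place pods that conflict on a port: at most one per node.

    Returns (placed, pending).
    """
    placed = min(replicas, max(nodes, 0))
    return placed, replicas - placed


def can_scale_down(replicas: int, nodes: int) -> bool:
    """A node is removable only if every replica still fits on the remaining nodes."""
    if nodes <= 1:
        return False
    _, pending = place_replicas(replicas, nodes - 1)
    return pending == 0


# Two calico-typha replicas on two nodes: neither node can be removed.
print(can_scale_down(replicas=2, nodes=2))  # -> False
# With a single replica, the extra node could be scaled away again.
print(can_scale_down(replicas=1, nodes=2))  # -> True
```

Under these assumptions, a fixed replica count of 2 pins the nodepool at 2 nodes, whereas a configurable replica count of 1 would let it shrink back to a single node.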

The CoreDNS PodDisruptionBudget and the calico-typha replica count don't seem to be configurable. Changing them directly in the cluster only works temporarily, as AKS eventually overwrites these configurations.

Because of the stated reasons, it is currently impossible to run a single node system nodepool with this cluster setup.

In my case, a single node is preferred as it's for a development / test AKS-cluster which doesn't have any HA requirements. Always running an extra node just to support running a single calico-typha pod seems wasteful. The cluster can work just as well with a single calico-typha pod instance.

What you expected to happen:

It should be possible to run a single system node by configuring the CoreDNS PodDisruptionBudget and/or the number of calico-typha replicas.

How to reproduce it (as minimally and precisely as possible):

Create an AKS cluster with calico networking, and a single system nodepool with no autoscaling.
Note that a node reboot does not complete: you end up with a cordoned node from which most Deployments, except CoreDNS, have been drained. The cluster is left in a broken state.

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): 1.22.4
Size of cluster (how many worker nodes are in the cluster?): 1 System Node
General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.): N.A.
Others: ...

ghost commented 2 years ago

Hi Freakazoid182, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:
1) If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
2) Please abide by the AKS repo Guidelines and Code of Conduct.
3) If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
4) Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
5) Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
6) If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

ghost commented 2 years ago

Triage required from @Azure/aks-pm

ghost commented 2 years ago

Action required from @Azure/aks-pm

ghost commented 2 years ago