Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

Updating worker nodes' K8S version is sometimes not done node by node #2408

Closed guy-microsoft closed 3 years ago

guy-microsoft commented 3 years ago

What happened: Updated the K8S version of both the control plane and the worker nodes using an ARM template. The nodes were updated simultaneously, resulting in disruption to running applications.

What you expected to happen: Nodes should be updated one by one to avoid disruption to running applications. This is also mentioned in this doc.

How to reproduce it (as minimally and precisely as possible): Not consistent. In most cases the upgrade proceeds as expected (one node at a time); however, it is sometimes done simultaneously.

Anything else we need to know?:
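The node pools do not set an explicit surge value. For reference, this is how the surge setting (which controls how many nodes are upgraded in parallel) can be inspected and pinned; the names below are placeholders, and --max-surge may require a recent Azure CLI or, at the time, the aks-preview extension:

# Show the node pool's current surge setting (empty usually means the default of one extra node at a time)
az aks nodepool show \
    --resource-group myResourceGroup \
    --cluster-name myAksCluster \
    --name myNodePool \
    --query upgradeSettings

# Pin the pool to surging a single node per upgrade step
az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAksCluster \
    --name myNodePool \
    --max-surge 1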

Environment:

ghost commented 3 years ago

Hi guy-microsoft, AKS bot here :wave: Thank you for posting on the AKS Repo. I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

1) If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
2) Please abide by the AKS repo Guidelines and Code of Conduct.
3) If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
4) Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
5) Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
6) If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

justindavies commented 3 years ago

Hi @guy-microsoft, would you be able to let us see the ARM snippet you are using to do the upgrade?

guy-microsoft commented 3 years ago

Hi @justindavies, sure. This is the part of the template that is relevant to the cluster upgrade:

{
    "type": "Microsoft.ContainerService/managedClusters",
    "apiVersion": "2021-02-01",
    "name": "[parameters('aksName')]",
    "location": "[parameters('location')]",
    "dependsOn": [
        "[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]",
        "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsDefaultName'))]"
    ],
    "sku": {
        "name": "Basic",
        "tier": "Free"
    },
    "properties": {
        "kubernetesVersion": "1.19.11",
        "dnsPrefix": "[parameters('aksName')]",
        "agentPoolProfiles": [
            {
                "name": "hbs",
                "count": 8,
                "vmSize": "Standard_DS2_v2",
                "osDiskSizeGB": 128,
                "osDiskType": "Managed",
                "maxPods": 110,
                "type": "VirtualMachineScaleSets",
                "availabilityZones": [
                    "1",
                    "2",
                    "3"
                ],
                "minCount": 3,
                "maxCount": 100,
                "enableAutoScaling": true,
                "orchestratorVersion": "1.19.11",
                "enableNodePublicIP": false,
                "osType": "Linux",
                "mode": "System",
                "nodeLabels": {
                    "scope": "hbs",
                    "subScope": "default"
                }
            },
            {
                "name": "nodepool4er",
                "count": 8,
                "vmSize": "Standard_DS2_v2",
                "osDiskSizeGB": 128,
                "osDiskType": "Managed",
                "maxPods": 110,
                "type": "VirtualMachineScaleSets",
                "availabilityZones": [
                    "1",
                    "2",
                    "3"
                ],
                "minCount": 4,
                "maxCount": 100,
                "enableAutoScaling": true,
                "orchestratorVersion": "1.19.11",
                "enableNodePublicIP": false,
                "osType": "Linux",
                "mode": "User",
                "nodeLabels": {
                    "scope": "external-resources"
                }
            }
        ],
        "servicePrincipalProfile": {
            "clientId": "[parameters('servicePrincipalAppId')]",
            "secret": "[parameters('servicePrincipalSecret')]"
        },
        "addonProfiles": {
            "kubedashboard": {
                "enabled": true,
                "config": {}
            },
            "omsagent": {
                "enabled": true,
                "config": {
                    "loganalyticsworkspaceresourceid": "[resourceId('Microsoft.OperationalInsights/workspaces', variables('logAnalyticsDefaultName'))]"
                }
            }
        },
        "enableRBAC": true,
        "networkProfile": {
            "networkPlugin": "kubenet",
            "networkPolicy": "calico",
            "loadBalancerSku": "Standard"
        },
        "autoScalerProfile": {
            "balance-similar-node-groups": "false",
            "expander": "random",
            "max-empty-bulk-delete": "10",
            "max-graceful-termination-sec": "600",
            "max-total-unready-percentage": "45",
            "new-pod-scale-up-delay": "0s",
            "ok-total-unready-count": "3",
            "scale-down-delay-after-add": "10m",
            "scale-down-delay-after-delete": "10s",
            "scale-down-delay-after-failure": "3m",
            "scale-down-unneeded-time": "10m",
            "scale-down-unready-time": "20m",
            "scale-down-utilization-threshold": "0.5",
            "scan-interval": "10s",
            "skip-nodes-with-local-storage": "false",
            "skip-nodes-with-system-pods": "true"
        },
        "aadProfile": "[if(equals(parameters('env'), 'prod'), variables('aksAadProfile'), json('null'))]"
    }
}
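
Note that the agent pool profiles above do not set upgradeSettings, so the default surge behaviour applies. As far as I know, the same API version (2021-02-01) also accepts a per-pool upgradeSettings block; a trimmed sketch (only the relevant keys shown, untested here) of pinning the "hbs" pool to one surge node at a time:

"agentPoolProfiles": [
    {
        "name": "hbs",
        "orchestratorVersion": "1.19.11",
        "upgradeSettings": {
            "maxSurge": "1"
        }
    }
]
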
justindavies commented 3 years ago

Would you be able to raise a support request? This should work as expected.

ghost commented 3 years ago

Hi there :wave: AKS bot here. This issue has been tagged as needing a support request so that the AKS support and engineering teams can take a look into this particular cluster/issue.

Follow the steps here to create a support ticket for Azure Kubernetes Service and the cluster discussed in this issue.

Please do mention this issue in the case description so our teams can coordinate to help you.

Thank you!

guy-microsoft commented 3 years ago

Yes @justindavies, I will do it.

airmnichols commented 3 years ago

@guy-microsoft Something you should check is whether you have Pod Disruption Budgets defined for your applications. I ran into a similar issue when upgrading AKS to a newer version of Kubernetes, and it turned out that our DevOps team hadn't defined Pod Disruption Budgets for the apps.

With pod disruption budgets in place, the cluster should ensure that there is always the desired number of pods in a ready state before proceeding with planned maintenance operations.

https://docs.microsoft.com/en-us/azure/aks/operator-best-practices-scheduler#voluntary-disruptions
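
As a minimal sketch (the name and label are placeholders; clusters on 1.21+ would use apiVersion: policy/v1), a budget that keeps at least one ready replica of an app during voluntary disruptions such as node drains looks roughly like this:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 1          # never voluntarily evict below one ready pod of this app
  selector:
    matchLabels:
      app: myapp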

guy-microsoft commented 3 years ago

Thanks @mikenri, but we did define a Pod Disruption Budget for every application with more than 2 replicas to ensure a minimum of 1 pod stays available during the update. It is indeed puzzling that our service saw disruptions despite the Pod Disruption Budgets when all nodes were updated simultaneously.
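
For completeness, a quick way to confirm the budgets are actually matching pods before an upgrade (placeholder names; output columns vary slightly by kubectl version):

# List all budgets; an ALLOWED DISRUPTIONS of 0 means a node drain would have to wait
kubectl get pdb --all-namespaces

# Check which pods a specific budget selects and its current status
kubectl describe pdb myapp-pdb -n my-namespace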

ghost commented 3 years ago

Case is being worked on with Microsoft Support; adding the stale label for automatic closure if no other reports are added.

ghost commented 3 years ago

This issue will now be closed because it hasn't had any activity for 15 days after being marked stale. guy-microsoft, feel free to comment again within the next 7 days to reopen it, or open a new issue after that time if you still have a question, issue, or suggestion.