Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

Will the Stop/Start feature be compatible with the cluster autoscaler? #2120

Closed LoicGombeaud closed 3 years ago

LoicGombeaud commented 3 years ago

Hello, this is a question regarding the latest release, and more specifically the Stop/Start feature, now GA.

The documentation for the preview mentions that the cluster autoscaler must be stopped; is this still the case, now that the feature is GA?

ghost commented 3 years ago

Hi LoicAG, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:
1) If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
2) Please abide by the AKS repo Guidelines and Code of Conduct.
3) If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
4) Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
5) Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
6) If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

ghost commented 3 years ago

Triage required from @Azure/aks-pm

ghost commented 3 years ago

Action required from @Azure/aks-pm

Kraelog commented 3 years ago

Hello!

I would very much like to inquire if/when the issues with CA will be resolved.

Currently in my environment stopping an AKS cluster works without a problem.

However, when I start the AKS cluster through an ARM pipeline, the system starts more nodes than specified by the minimum configuration, at which point the cluster autoscaler blocks the extra node and the provisioning state of the cluster goes to "Failed".

Is this something being worked on?

Thanks!

justin-chizer commented 3 years ago

I am having a similar issue. Running az aks stop -g <rg> -n <name> works fine, but after I run az aks start -g <rg> -n <name>, the cluster no longer respects the min count for the CA.

Using:
AZ CLI version: 2.20.0
AKS version: 1.19.6
Azure Region: westus2

    {
      "availabilityZones": null,
      "count": 0,
      "enableAutoScaling": true,
      "enableNodePublicIp": false,
      "maxCount": 4,
      "maxPods": 110,
      "minCount": 2,
      "mode": "System",
      "name": "system",
      "nodeImageVersion": "AKSUbuntu-1804gen2containerd-2021.02.10",
      "nodeLabels": {},
      "nodeTaints": null,
      "orchestratorVersion": "1.19.6",
      "osDiskSizeGb": 40,
      "osDiskType": "Ephemeral",
      "osType": "Linux",
      "powerState": {
        "code": "Running"
      },
      "provisioningState": "Succeeded",
      "proximityPlacementGroupId": null,
      "scaleSetEvictionPolicy": null,
      "scaleSetPriority": null,
      "spotMaxPrice": null,
      "tags": null,
      "type": "VirtualMachineScaleSets",
      "upgradeSettings": {
        "maxSurge": null
      },
      "vmSize": "Standard_E2as_v4",
      "vnetSubnetId": "/subscriptions/***/resourceGroups/rg-dev-westus2/providers/Microsoft.Network/virtualNetworks/vnet-dev-westus2/subnets/sub-dev-dev3"
    }

Did I miss a step?

cc: @palma21 @Aaron-ML
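To make the observation in the node pool profile above concrete, here is a minimal Python sketch that compares the reported "count" against the autoscaler bounds. The JSON is a trimmed copy of the profile posted above; the function name autoscaler_bounds_report is hypothetical and not part of any Azure tooling.

```python
import json

# Trimmed copy of the node pool profile posted above (only the fields used here).
POOL_JSON = """
{
  "count": 0,
  "enableAutoScaling": true,
  "maxCount": 4,
  "minCount": 2,
  "mode": "System",
  "name": "system",
  "powerState": {"code": "Running"},
  "provisioningState": "Succeeded"
}
"""


def autoscaler_bounds_report(pool: dict) -> str:
    """Describe where the pool's node count sits relative to its autoscaler bounds.

    Hypothetical helper for illustration only.
    """
    name = pool["name"]
    if not pool.get("enableAutoScaling"):
        return f"{name}: autoscaling disabled"
    count, low, high = pool["count"], pool["minCount"], pool["maxCount"]
    if count < low:
        return f"{name}: count {count} is below minCount {low}"
    if count > high:
        return f"{name}: count {count} is above maxCount {high}"
    return f"{name}: count {count} is within [{low}, {high}]"


pool = json.loads(POOL_JSON)
print(autoscaler_bounds_report(pool))
```

For the profile above, this reports that the pool is running with a count of 0 while minCount is 2, which is the inconsistency being asked about.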

ghost commented 3 years ago

Issue needing attention of @Azure/aks-leads

palma21 commented 3 years ago

Thanks. This should be correct now; the feature does support CA. Let us know if you still find any inconsistency.

palma21 commented 3 years ago

@justin-chizer your issue seems to be a different question than the OP's. Can you clarify what you mean by respecting min count?

Per CA upstream behavior, min count is only actionable when CA evaluates whether it should scale down. If you start your pool, it would come up with 0 nodes, and CA would scale it up to meet the needs at that moment, respecting the max.
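The behavior described above can be sketched as a toy model in Python. This is a simplified illustration under stated assumptions, not the actual cluster autoscaler logic: the function next_node_count and its "needed" parameter (the number of nodes the current workload requires) are hypothetical.

```python
def next_node_count(current: int, needed: int, min_count: int, max_count: int) -> int:
    """One step of a simplified autoscaler model (illustrative assumption only).

    - Scale-up grows toward demand and is capped at max_count.
    - Scale-down never removes nodes below min_count.
    - min_count is NOT enforced on its own: a pool that is already below
      min_count is not grown unless the workload needs more nodes.
    """
    if needed > current:
        # Scale-up path: only the maximum is consulted.
        return min(needed, max_count)
    if needed < current:
        # Scale-down path: the minimum acts as a floor, but only for removals.
        floor = min(current, min_count)
        return max(needed, floor)
    return current


# A pool with minCount=2 and maxCount=4 that has just been started at 0 nodes.
print(next_node_count(current=0, needed=0, min_count=2, max_count=4))  # idle pool
print(next_node_count(current=0, needed=3, min_count=2, max_count=4))  # demand for 3
print(next_node_count(current=0, needed=6, min_count=2, max_count=4))  # demand over max
print(next_node_count(current=4, needed=1, min_count=2, max_count=4))  # scale-down
```

In this model, a freshly started pool with no demand stays at 0 nodes even though min_count is 2, because the minimum is only used as a floor when nodes are being removed.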

ghost commented 3 years ago

Thanks for reaching out. I'm closing this issue as it was marked with "Answer Provided" and it hasn't had activity for 2 days.