Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[Feature] AKS Automatic #4301

Open sabbour opened 4 months ago

sabbour commented 4 months ago

AKS Automatic mode is a new capability within Azure Kubernetes Service (AKS) that lets you create optimized, ready-for-production clusters with reduced operational overhead. This means you can focus on building and running your applications, while AKS takes care of the rest.

AKS Automatic mode is currently in preview. You can get started today by following the quickstart guide on Microsoft Learn to create an AKS Automatic cluster and deploy an application.

Learn more about AKS Automatic: https://aka.ms/aks/automatic

We would love to hear your feedback and suggestions on how to make AKS Automatic mode even better and continue to boost your focus on your workloads.

Below are some of the current limitations, which will be broken into their own roadmap items.

### Known preview limitations and roadmap
- [ ] You may need to explicitly select a different VM SKU for the system node pool if you don't have quota for the default one in the region you're deploying to
- [ ] Monitoring is wired up only when using the CLI and portal
- [ ] Wiring up of auto instrumentation
- [ ] Bring your own virtual network support

PixelRobots commented 4 months ago

Hey @sabbour

It seems that there is an issue when using the CLI. Automatic mode should be using the standard tier and not free, but if you try and deploy in West Europe you get an error message saying free tier is unavailable. I would say this is a bug as we are not using the free tier.

Please see the error message in the image below.

image

sabbour commented 4 months ago

@PixelRobots can you confirm which aks-preview extension version you're using in the CLI?

PixelRobots commented 4 months ago

I am using 4.0.0b4.

I have also noticed it is not available via the portal yet, even the preview one. I assume that will come at some point this week.

PixelRobots commented 4 months ago

It also looks like the Prometheus recording rules are not enabled by default either.

image

And it also seems to have an issue with the monitor settings.

image

sabbour commented 4 months ago

> I am using 4.0.0b4.
>
> I have also noticed it is not available via the portal yet, even the preview one. I assume that will come at some point this week.

Yes, the new portal experience (create/manage) is lighting up shortly. It is behind a flag now at https://aka.ms/aks/flight/automatic.

@PixelRobots we're looking into the other issue now.

JoeyC-Dev commented 4 months ago

It would be better to mention the following command in the documentation, to list the VM sizes available in all availability zones. I only tried it after 10 minutes of failed attempts to create the AKS cluster:

az vm list-skus --location southeastasia --output table

You can feel my pain:

az aks create -n testAKS -g eastusrG --sku automatic --node-vm-size Standard_D8ds_v5 --location eastus2 --no-ssh-key
Argument '--sku' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
The behavior of this command has been altered by the following extension: aks-preview
(BadRequest) The VM size of Standard_D8ds_v5 is only allowed  in zones [2 1] in your subscription in location 'eastus2'. 
Code: BadRequest
Message: The VM size of Standard_D8ds_v5 is only allowed  in zones [2 1] in your subscription in location 'eastus2'. 
az aks create -n testAKS -g eastusrG --sku automatic --no-ssh-key
Argument '--sku' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
The behavior of this command has been altered by the following extension: aks-preview
(BadRequest) The VM size of Standard_DS4_v2 is only allowed  in zones [3 1] in your subscription in location 'eastus'. 
Code: BadRequest
Message: The VM size of Standard_DS4_v2 is only allowed  in zones [3 1] in your subscription in location 'eastus'. 
az aks create -n testAKS -g eastusrG --sku automatic --node-vm-size Standard_B16als_v2 --no-ssh-key
Argument '--sku' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
The behavior of this command has been altered by the following extension: aks-preview
(VMSizeDoesNotSupportEphemeralOS) The Virtual Machine size Standard_B16als_v2 does not support Ephemeral OS disk.
Code: VMSizeDoesNotSupportEphemeralOS
Message: The Virtual Machine size Standard_B16als_v2 does not support Ephemeral OS disk.
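
One way to avoid this trial and error is to query a single VM size in the target region before running az aks create. The sketch below is a minimal example, not part of the official quickstart: `sku_zones` is a hypothetical helper name, and it assumes the `--size` and `--resource-type` filters of `az vm list-skus` are enough to narrow the table to the size in question. The table output is expected to show the zones and any subscription restrictions for each match.

```shell
# Sketch: show zone availability for one VM size in one region.
# sku_zones is a hypothetical helper name, not an az command.
sku_zones() {
  # $1 = region (for example eastus2)
  # $2 = VM size or name prefix (for example Standard_D8ds_v5)
  az vm list-skus \
    --location "$1" \
    --size "$2" \
    --resource-type virtualMachines \
    --output table
}

# Usage (requires a logged-in Azure CLI):
# sku_zones eastus2 Standard_D8ds_v5
```

If the size is listed for fewer zones than the region offers, picking another size or region up front is cheaper than waiting for the BadRequest error.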

Another suggestion: the CLI could set a default value for --node-osdisk-size when --sku automatic is used. Otherwise you get:

joey [ ~ ]$ az aks create -n testAKS -g eastusrG --sku automatic --node-vm-size Standard_DC2s_v2 --location eastus --no-ssh-key 
Argument '--sku' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
The behavior of this command has been altered by the following extension: aks-preview
(VMCannotFitEphemeralOSDisk) The virtual machine size Standard_DC2s_v2 has a cache size of 46170898432 bytes and temporary disk size of 107374182400 bytes, but the OS disk requires 137438953472 bytes. Use a VM size with larger cache, larger temp disk, or disable ephemeral OS.
Code: VMCannotFitEphemeralOSDisk
Message: The virtual machine size Standard_DC2s_v2 has a cache size of 46170898432 bytes and temporary disk size of 107374182400 bytes, but the OS disk requires 137438953472 bytes. Use a VM size with larger cache, larger temp disk, or disable ephemeral OS.

It can be fixed by specifying --node-osdisk-size:

joey [ ~ ]$ az aks create -n testAKS -g eastusrG --sku automatic --node-vm-size Standard_DC2s_v2 --node-osdisk-type Ephemeral --node-osdisk-size 48 --location eastus --no-ssh-key 
Argument '--sku' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
The behavior of this command has been altered by the following extension: aks-preview
 / Running ..
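
The byte counts in the VMCannotFitEphemeralOSDisk message are easier to compare once converted to GiB, the unit that --node-osdisk-size takes. The sketch below only does that arithmetic; `to_gib` and `fits` are hypothetical helper names, and the check mirrors the wording of the error message (the OS disk has to fit in the cache or on the temporary disk) rather than any documented placement logic.

```shell
# Sketch: convert the sizes from the error message to GiB and check the fit.
GIB=1073741824   # bytes per GiB (1024^3)

to_gib() {
  # $1 = size in bytes (integer division; exact for the values below)
  echo $(( $1 / GIB ))
}

fits() {
  # $1 = requested OS disk size in GiB, $2 = available space in bytes
  if [ $(( $1 * GIB <= $2 )) -eq 1 ]; then echo yes; else echo no; fi
}

to_gib 46170898432     # cache size                  -> 43
to_gib 107374182400    # temporary disk size         -> 100
to_gib 137438953472    # OS disk size in the error   -> 128
fits 48 46170898432    # 48 GiB OS disk vs. cache    -> no
fits 48 107374182400   # 48 GiB OS disk vs. temp disk -> yes
```

So the 128 GiB OS disk from the error exceeds both the 43 GiB cache and the 100 GiB temporary disk, while the 48 GiB value used above fits on the temporary disk.
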

sabbour commented 4 months ago

@JoeyC-Dev thanks for the feedback, we'll look into this and get the docs updated. This is mostly a limitation during preview.

thepaulmacca commented 4 months ago

> Hey @sabbour
>
> It seems that there is an issue when using the CLI. Automatic mode should be using the standard tier and not free, but if you try and deploy in West Europe you get an error message saying free tier is unavailable. I would say this is a bug as we are not using the free tier.
>
> Please see the error message in the image below.
>
> image

@PixelRobots my guess is that it's setting the tier to free by default like the API does if not specified - but I've had those capacity errors in West Europe for days now, even on the standard tier

I've had to create my clusters in UK South for the moment

PixelRobots commented 4 months ago

> > Hey @sabbour
> >
> > It seems that there is an issue when using the CLI. Automatic mode should be using the standard tier and not free, but if you try and deploy in West Europe you get an error message saying free tier is unavailable. I would say this is a bug as we are not using the free tier.
> >
> > Please see the error message in the image below.
> >
> > image
>
> @PixelRobots my guess is that it's setting the tier to free by default - but I've had those capacity errors in West Europe for days now, even on the standard tier
>
> I've had to create my clusters in UK South for the moment

So Automatic should default to the standard tier, which it does. But the CLI should not return that error, in my opinion.

sabbour commented 4 months ago

It's a bug, we're getting it fixed.

JoeyC-Dev commented 4 months ago

Today, when I tried to create an Automatic AKS cluster via the Azure portal, I found this:

image

This subscription does not have the flags: AKS-PrometheusAddonPreview registered. Preview features must be registered in order to create a cluster. To continue, register the required flags to the subscription.

But this flag is not mentioned in the "Register the feature flags" section, and I can create the Automatic AKS cluster via the Azure CLI:

image

I'm not sure whether the feature really needs to be registered or whether this is a bug in the Azure portal.
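
To check whether the flag from the portal message is registered on a subscription, something like the following sketch could be used. `feature_state` and `register_feature` are hypothetical helper names, and the `Microsoft.ContainerService` namespace is an assumption: the portal message names only the flag, not its resource provider.

```shell
# Sketch: inspect and register a preview feature flag.
# Assumption: the flag lives under Microsoft.ContainerService.
feature_state() {
  # $1 = feature flag name; prints the registration state
  az feature show \
    --namespace Microsoft.ContainerService \
    --name "$1" \
    --query properties.state \
    --output tsv
}

register_feature() {
  # $1 = feature flag name
  az feature register \
    --namespace Microsoft.ContainerService \
    --name "$1"
}

# Usage (requires a logged-in Azure CLI):
# feature_state AKS-PrometheusAddonPreview
# register_feature AKS-PrometheusAddonPreview
```

If the state is not Registered while the CLI still creates the cluster, that would point toward the portal check being stricter than the service itself.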

ramanjk commented 4 months ago

Unable to stop the AKS automatic cluster after my use.

image

PixelRobots commented 4 months ago

> Unable to stop the AKS automatic cluster after my use.
>
> image

This is due to the node auto provisioning part of automatic. https://learn.microsoft.com/en-us/azure/aks/node-autoprovision?tabs=azure-cli#unsupported-features

@sabbour will this become a feature of NAP and, in turn, of Automatic clusters?

ramanjk commented 4 months ago

Agreed, but it should have the feature so that customers can stop and start the cluster.

sabbour commented 4 months ago

Automatic clusters don't support start/stop.

thepaulmacca commented 3 months ago

Do you know when we can expect this to be resolved when using Bicep/Terraform?

> Monitoring is wired up only when using the CLI and portal

And the earlier issues that were mentioned?

For something that's in public preview, this is proving quite hard to do PoCs on at this point

PixelRobots commented 3 months ago

Just did a portal adventure to test some bits.

When creating a cluster via the portal, the Log Analytics workspace is placed in a default resource group rather than the one with all the other resources. Can we fix that, please, so it is deployed in the same resource group as the cluster and other items?

I also noticed that when I delete the cluster, some resources are not cleaned up. This is not very automatic... Can they be cleaned up whenever the cluster is in Automatic mode?

image

Jostepop commented 2 months ago

I have some questions/suggestions regarding the bring your own virtual network support:

  1. Will this feature allow us to control egress traffic, e.g. by setting the egress type to user-defined routing?
  2. Can we deploy AKS Automatic with only a load balancer (no ingress controller) and use a third-party reverse proxy like BigIP as the ingress controller?
  3. Does "Bring your own virtual network" mean that we can provide a pre-deployed VNET with VNET peerings and route tables for the AKS Automatic deployment?

Thank you!

sabbour commented 3 weeks ago

@Jostepop

> Will this feature allow us to control egress traffic, e.g. by setting the egress type to user-defined routing?

Yes.

> Can we deploy AKS Automatic with only a load balancer (no ingress controller) and use a third-party reverse proxy like BigIP as the ingress controller?

We will look into allowing disabling the ingress controller.

> Does "Bring your own virtual network" mean that we can provide a pre-deployed VNET with VNET peerings and route tables for the AKS Automatic deployment?

Yes.