Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

[Feature] Support for immutable node images in nodepools #3419

Open the-technat opened 1 year ago

the-technat commented 1 year ago

Is your feature request related to a problem? Please describe.

We at @swisspost miss the possibility to pin AKS worker nodes to a specific immutable node image when creating a nodepool. AKS nodepools don't support specifying the image ID of a pre-built AKS worker node image.

Describe the solution you'd like

The ability to specify an image ID (of an image pre-built by Azure) when creating a nodepool, so that all nodes in this nodepool have the same patch level, even if new nodes are added after the nodepool was created. Nodes are upgraded by specifying a new image, so that they are replaced by new nodes that boot the new image, not by running unattended-upgrade on the nodes.

Image IDs should be generated every time a new image is baked, so that they identify a specific patch level on a specific OS version.

Describe alternatives you've considered

We're currently solving this using the automatic upgrades that are preconfigured on AKS, plus kured to reboot the nodes when needed (which is proven not to work, see #1773). But since the image ID isn't specified and nodes can scale up and down as needed, we could end up with nodes that have different patch levels depending on when they were started. So it's not deterministic, and that makes things hard to debug.

Another side effect of this is that nodes need internet connectivity to load patches, which in our environment means we need to configure a proxy for APT to reach the internet through ExpressRoute...
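The drift described above can be made visible by grouping nodes by the image they report. A minimal sketch, assuming node records are plain dictionaries with hypothetical `name` and `image` fields; the image identifiers are made up for illustration and are not real AKS image versions:

```python
from collections import defaultdict


def group_nodes_by_image(nodes):
    """Map each image identifier to the sorted names of the nodes running it."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["image"]].append(node["name"])
    return {image: sorted(names) for image, names in groups.items()}


def has_mixed_patch_levels(nodes):
    """True when the nodes of one pool do not all run the same image."""
    return len(group_nodes_by_image(nodes)) > 1


# Hypothetical pool: the third node was scaled up after a newer image appeared.
pool = [
    {"name": "node-0", "image": "img-2024.01.09"},
    {"name": "node-1", "image": "img-2024.01.09"},
    {"name": "node-2", "image": "img-2024.02.13"},
]

print(group_nodes_by_image(pool))
print(has_mixed_patch_levels(pool))
```

With a pinned image, every node in the pool would fall into a single group regardless of when it was started.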

Additional context

Somewhat related to #3151, but this isn't just about the upgrade command; we'd like the feature in general, maybe as an alternative to unattended-upgrades updating the nodes.

/cc @zioproto

zioproto commented 1 year ago

Open questions:

the-technat commented 1 year ago

Node images for both Ubuntu and Mariner V2 would be super fancy, so that the user can choose which OS they want. But other than that it doesn't really matter: currently the user also has no choice in selecting the OS (and if we're honest, you rarely log into a K8s worker node), so the OS really doesn't matter.

zioproto commented 1 year ago

Related to https://github.com/Azure/AKS/issues/2181

I am checking whether the NodeImage option of the new nodeOSUpgradeChannel would solve this issue.

The preview feature is already published and the docs are here: https://learn.microsoft.com/en-us/azure/aks/auto-upgrade-node-image

the-technat commented 1 year ago

I've read through the docs twice and it seems like NodeImage could indeed solve this issue, at least partially. If I understand it correctly, by setting nodeOSUpgradeChannel to NodeImage the nodes are replaced with newer nodes during a specified maintenance window? E.g. if we set the maintenance window to be every Thursday between 10:00 and 15:00, AKS will replace nodes with newer nodes during this time if there is a new image available? And in addition, new nodes brought up by the cluster-autoscaler would also use the newest image available?

If so, this would be a great feature. Still, what I feel is missing is an option to pin the image to a specific one, to leave the decision of when to upgrade to the user.
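To make the window semantics I'm asking about concrete, here is a minimal sketch of the "every Thursday between 10:00 and 15:00" check. The function name and defaults are hypothetical, and this only illustrates the question; it is not how AKS schedules upgrades:

```python
from datetime import datetime, time

THURSDAY = 3  # datetime.weekday() counts Monday as 0


def in_maintenance_window(now, weekday=THURSDAY, start=time(10, 0), end=time(15, 0)):
    """True if `now` falls on the given weekday within [start, end)."""
    return now.weekday() == weekday and start <= now.time() < end


# 2024-01-04 was a Thursday.
print(in_maintenance_window(datetime(2024, 1, 4, 11, 30)))
print(in_maintenance_window(datetime(2024, 1, 5, 11, 30)))
```

The end of the window is treated as exclusive here, which is an assumption of the sketch.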

palma21 commented 1 year ago

*Setting nodeOSUpgradeChannel to SecurityPatch is what's being introduced.

NodeImage is the same behavior that has already been available for several years now. And yes, it does behave like that. https://learn.microsoft.com/en-us/azure/aks/auto-upgrade-cluster#using-cluster-auto-upgrade

On your last ask, I'm not sure I follow. The node image doesn't change unless you upgrade it or have auto-upgrade enabled, so wouldn't just doing this achieve what you want?

the-technat commented 1 year ago

@palma21 not necessarily. I can control when the nodes get updated with a new image, but I can't control which image this is, as there are only channels to choose from. So depending on when you upgrade you get a different image, and you also can't upgrade to something that's not the latest (according to each channel's specification), nor keep the nodes on a specific image for a long period of time.

What I really miss is just a parameter where I can specify the VM image that should be used for the nodes in a nodepool, just as you can with a normal virtual machine in Azure, except that there you can only choose the OS + OS version, not the exact image version (e.g. an immutable image ID that will always be the same VM image).
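The difference between a channel and a pin can be sketched as follows. This is only an illustration of the request, assuming a hypothetical `resolve_node_image` helper and made-up image IDs whose zero-padded date suffix sorts lexicographically; no such AKS parameter exists today:

```python
def resolve_node_image(available_images, pinned_image=None):
    """Return the image that new nodes in a pool would boot.

    Without a pin, mimic a channel and pick the newest available image.
    With a pin, return exactly that image or fail loudly.
    """
    if pinned_image is None:
        # The hypothetical IDs sort by date, so max() yields the newest one.
        return max(available_images)
    if pinned_image not in available_images:
        raise LookupError(f"pinned image {pinned_image!r} is not available")
    return pinned_image


images = ["img-2024.01.09", "img-2024.02.13"]

print(resolve_node_image(images))                    # channel-like behavior
print(resolve_node_image(images, "img-2024.01.09"))  # pinned behavior
```

With a pin, the result no longer depends on when the upgrade runs or when a node is scaled up.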

Does this clarify the feature request?

the-technat commented 1 year ago

Any updates on this?

palma21 commented 1 year ago

We're not planning a parameter for you to specify an arbitrary image per se, since only the service knows which images will work with each specific version of itself (these change weekly), so that could regress quality of service across operations.

What's your need/use case to specify the specific image?

palma21 commented 1 year ago

Also to the questions:

E.g. if we set the maintenance window to be every Thursday between 10:00 and 15:00, AKS will replace nodes with newer nodes during this time if there is a new image available?

Yes

And in addition, new nodes brought up by the cluster-autoscaler would also use the newest image available?

Yes

If so, this would be a great feature. Still, what I feel is missing is an option to pin the image to a specific one, to leave the decision of when to upgrade to the user.

Per the above, I would love to understand why you'd need to pin the image in that case. If it's the above, it might be that you just want upgrade groups to take care of image consistency for you automatically.

kaarthis commented 7 months ago

Could you please explain why you need to pin the image?

the-technat commented 7 months ago

Sorry for not having answered for a long time.

It's mostly about immutability and declarative configuration. We update everything via Terraform and want to see in the HCL code when the ID of an image changes, so that one can use that ID to check what software was in that image.

I understand that some images won't work with the AKS service and that pinning the nodes to a specific image for a longer period of time can cause problems, but aren't the nodes fully the responsibility of the customer? From what I've seen so far in AKS I doubt it, and that might be my problem. I'm used to Kubernetes services where the nodes are the responsibility of the customer and choosing an old or wrong image will result in nodes not joining the cluster, but which allow the customer to use basically any OS for which an image exists. I guess I've just applied these principles to AKS as well, and that's why I'm confused that it doesn't exist here.

miwithro commented 7 months ago

@the-technat AKS nodes are a shared responsibility between AKS and the customer. The version used is the responsibility of the customer. AKS is responsible for delivering versions that customers can consume.

miwithro commented 7 months ago

@palma21 not necessarily. I can control when the nodes get updated with a new image, but I can't control which image this is, as there are only channels to choose from. So depending on when you upgrade you get a different image, and you also can't upgrade to something that's not the latest (according to each channel's specification), nor keep the nodes on a specific image for a long period of time.

What I really miss is just a parameter where I can specify the VM image that should be used for the nodes in a nodepool, just as you can with a normal virtual machine in Azure, except that there you can only choose the OS + OS version, not the exact image version (e.g. an immutable image ID that will always be the same VM image).

Does this clarify the feature request?

@kaarthis the above is the feature request from the customer.