Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/
1.96k stars 306 forks source link

AKS PUT operations expect empty configuration #2920

Open jackfrancis opened 2 years ago

jackfrancis commented 2 years ago

What happened: If I create a cluster with no opinionated configuration for certain immutable properties (e.g., nodepool priority) then subsequent PUT requests against that cluster (or nodepool if we use the priority example) enforce that "non-opinionated configuration" forever after (in the case of priority the validation is "", or empty string).

What you expected to happen: I would expect the validation of immutable configuration to compare to the actual configuration value that the cluster or node pool is running with, to guarantee that no changes will occur. For example, if the actual nodepool priority configuration is "Regular" (the default value applied by AKS when the node pool was created with no user-opinionated priority configuration), then I should be able to formulate node pool API operations using that includes "priority": "Regular" as part of the node pool object representation.

How to reproduce it (as minimally and precisely as possible):

  1. Create a user node pool and specify no priority configuration for the pool.2.
  2. Issue a follow-up API operation (e.g., to update the replica count) but include "priority": "Regular" (or the actual declaration that the API spec defines) in the body data that represents the node pool.3.
  3. Observe that the AKS API rejects the request because it expects priority to have the value "" (empty string).

Anything else we need to know?:

Environment:

ghost commented 2 years ago

Hi jackfrancis, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such: 1) If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster. 2) Please abide by the AKS repo Guidelines and Code of Conduct. 3) If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics? 4) Make sure your subscribed to the AKS Release Notes to keep up to date with all that's new on AKS. 5) Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue. 6) If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

jackfrancis commented 2 years ago

cc @zmalik @CecileRobertMichon

jackfrancis commented 2 years ago

@zmalik can you critique my repro steps in the description above? Since you've seen this happen in your environment, can you share exactly what you're observing?

Thanks!

paulgmiller commented 2 years ago

Did some research or api will actually take any value but will only return ""/null or "spot". Since our api // public docs declare regular as option this is just bug we should fix. https://docs.microsoft.com/en-us/rest/api/aks/agent-pools/create-or-update#scalesetpriority

An open question is wether we backfill all non spot clusters to Regular.

ghost commented 2 years ago

Triage required from @Azure/aks-pm

zmalik commented 2 years ago

@zmalik can you critique my repro steps in the description above? Since you've seen this happen in your environment, can you share exactly what you're observing?

@jackfrancis the steps described above are pretty accurate. The go-client used for the operation was github.com/Azure/azure-sdk-for-go/services/containerservice/mgmt/2021-05-01/containerservice.

The exact error by API is:

Code=PropertyChangeNotAllowed Message=Changing property 'properties.ScaleSetPriority' is not allowed. Target=properties.ScaleSetPriority

Since our api // public docs declare regular as option this is just bug we should fix.

My question is as this is one of the immutable variables, will the fix will allow existing workflows to continue?

Would Changing property 'properties.ScaleSetPriority' is not allowed hit us when AKS API returns the correct default value? As now we are changing from Regular to empty?

An open question is wether we backfill all non spot clusters to Regular.

As long as we can use empty and Regular values as the same values when doing updates. It should be fine for any client of AKS API, including CAPZ or Terraform. This is mostly for any retro compatibility.

jackfrancis commented 2 years ago

thanks @zmalik!

Have we observed similar behaviors on any other properties, or just properties.ScaleSetPriority?

zmalik commented 2 years ago

as a blocker only in properties.ScaleSetPriority As the list of all immutable properties isn't visible it's not clear to come up with all properties to test.

Another property properties.enableNodePublicIP which is supposed to be by default false also returns null by default. But if we update it to false it doesn't return any error. But it continues to be null. Which has to be handled on client-side if we don't want to see it as a drift.

If we create an agentpool with properties.enableNodePublicIP false, it does populate correctly. The same applies for properties like properties.scaleDownMode which is null but behaves as Delete. But these cases aren't blocking any user flow as in ScaleSetPriority

I think this boils down to all possible fields where the default is populated as null but its interpretation is different. properties.enableNodePublicIP, null behaves as false properties.ScaleSetPriority, null behaves as Regular properties.scaleDownMode, null behaves as Delete