Azure Kubernetes Service
https://azure.github.io/AKS/

[Question] Upgrading AKS #3087

Closed shyam-ks closed 9 months ago

shyam-ks commented 2 years ago

When I upgrade AKS using ARM or Bicep, only the control plane gets upgraded; the node pools do not. Is this expected behavior? The AKS cluster was created with 1.23.5. I reran this template with 1.23.8: the control plane was upgraded, but the node pools are still on 1.23.5.

resource aksClusterName_resource 'Microsoft.ContainerService/managedClusters@2021-02-01' = {
  location: location
  name: aksClusterName
  properties: {
    kubernetesVersion: kubernetesVersion
    enableRBAC: enableRBAC
    dnsPrefix: '${aksClusterName}-dns'
    nodeResourceGroup: aksRgName
    agentPoolProfiles: [
      {
        name: 'systempool01'
        osDiskSizeGB: osDiskSizeGB
        count: 3
        enableAutoScaling: true
        minCount: 3
        maxCount: 5
        vmSize: systemPoolVmsize
        osType: 'Linux'
        type: 'VirtualMachineScaleSets'
        enableNodePublicIP: false
        enableEncryptionAtHost: hostBasedEncryption
        mode: 'System'
        maxPods: systemPoolMaxPods
        nodeTaints: [
          'CriticalAddonsOnly=true:NoSchedule'
        ]
        scaleSetPriority: 'Regular'
        scaleSetEvictionPolicy: 'Delete'
        availabilityZones: [
          '1'
          '2'
          '3'
        ]
        vnetSubnetID: vnetSubnetID
      }
      {
        name: 'userpool01'
        osDiskSizeGB: osDiskSizeGB
        count: 5
        enableAutoScaling: true
        minCount: 5
        maxCount: 12
        vmSize: userPoolVmsize
        osType: 'Linux'
        type: 'VirtualMachineScaleSets'
        mode: 'User'
        maxPods: userPoolMaxPods
        scaleSetPriority: 'Regular'
        scaleSetEvictionPolicy: 'Delete'
        enableNodePublicIP: false
        enableEncryptionAtHost: hostBasedEncryption
        availabilityZones: [
          '1'
          '2'
          '3'
        ]
        vnetSubnetID: vnetSubnetID
      }
      {
        name: 'userpool02'
        osDiskSizeGB: osDiskSizeGB
        count: 5
        enableAutoScaling: true
        minCount: 5
        maxCount: 12
        vmSize: userPoolVmsize
        osType: 'Linux'
        type: 'VirtualMachineScaleSets'
        mode: 'User'
        maxPods: userPoolMaxPods
        scaleSetPriority: 'Regular'
        scaleSetEvictionPolicy: 'Delete'
        enableNodePublicIP: false
        enableEncryptionAtHost: hostBasedEncryption
        availabilityZones: [
          '1'
          '2'
          '3'
        ]
        vnetSubnetID: vnetSubnetID
      }
    ]
    networkProfile: {
      loadBalancerSku: 'standard'
      networkPlugin: networkPlugin
      networkPolicy: networkPolicy
      serviceCidr: serviceCidr
      loadBalancerProfile: json('null')
      dnsServiceIP: dnsServiceIP
      outboundType: 'userDefinedRouting'
      dockerBridgeCidr: dockerBridgeCidr
    }
    aadProfile: {
      managed: true
      adminGroupObjectIDs: adminGroupObjectIDs
    }
    apiServerAccessProfile: {
      enablePrivateCluster: enablePrivateCluster
      privateDNSZone: privateDnsZone
    }
    addonProfiles: {
      httpApplicationRouting: {
        enabled: enableHttpApplicationRouting
      }
      azurepolicy: {
        enabled: enableAzurePolicy
      }
      omsagent: {
        enabled: true
        config: {
          logAnalyticsWorkspaceResourceID: workspaceResourceId
        }
      }
    }
  }
  tags: {
  }
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${userAManagedIdentity}': {
      }
    }
  }
  dependsOn: []
}
ghost commented 2 years ago

Action required from @Azure/aks-pm

ghost commented 2 years ago

Issue needing attention of @Azure/aks-leads

carvido1 commented 2 years ago

Hello.

As part of an AKS upgrade, you need to upgrade the control plane as well as the agent pools. For the control plane you have specified the kubernetesVersion argument. To get the agent pools upgraded as well, you need to add orchestratorVersion, set to the desired Kubernetes version, to each entry in agentPoolProfiles. A best practice is to keep the control plane and the agent pools on the same Kubernetes version.
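As an illustration, the first pool profile from the template above with `orchestratorVersion` added might look like the following (a sketch; the remaining properties are unchanged and omitted here for brevity):

```bicep
agentPoolProfiles: [
  {
    name: 'systempool01'
    mode: 'System'
    vmSize: systemPoolVmsize
    count: 3
    // Same version as the control plane, so both upgrade in one deployment.
    orchestratorVersion: kubernetesVersion
    // ...remaining properties from the original profile go here.
  }
]
```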

BR

paulgmiller commented 2 years ago

Yep, @carvido1 is correct. You can bring them both up in one go, or, if you're being cautious, it's often preferred to bring up the control plane first and then each agent pool. That's what auto-upgrade does.
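One way to express that staged approach in Bicep (a sketch only; the separate agentPools child resource and the two version parameters are assumptions for illustration, not the exact template from this thread):

```bicep
// Step 1: bump controlPlaneVersion and deploy; the control plane
// upgrades while each pool keeps its pinned version.
// Step 2: bump nodePoolVersion and deploy again to roll the pools.
param controlPlaneVersion string
param nodePoolVersion string

resource aks 'Microsoft.ContainerService/managedClusters@2021-02-01' = {
  name: aksClusterName
  location: location
  properties: {
    kubernetesVersion: controlPlaneVersion
    // ...dnsPrefix, agentPoolProfiles, etc. as in the original template
  }
}

resource userPool 'Microsoft.ContainerService/managedClusters/agentPools@2022-09-02-preview' = {
  parent: aks
  name: 'userpool01'
  properties: {
    mode: 'User'
    vmSize: userPoolVmsize
    count: 5
    // Pool version pinned independently of the control plane.
    orchestratorVersion: nodePoolVersion
  }
}
```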

rehan2908 commented 2 years ago

Yes, I think we have the same issue here. Is there any solution? We opened a case with MS but got no response.

rehan2908 commented 2 years ago

@carvido1, adding orchestratorVersion to the node pool within the ARM template is not working; it gives a Bad Request on the Azure platform.

carvido1 commented 2 years ago

Hello @rehan2908. Can you provide the ARM template? That way I can take a look. Remove any confidential information if required.

Here you have the ARM template schema for an AKS cluster: microsoft.containerservice/managedclusters. BR

shashankbarsin commented 1 year ago

@rehan2908, ping on the above: can you provide the ARM template with confidential information removed?

wsmelton commented 1 year ago

The observation we have found from testing is that creating all the node pools inline via Microsoft.ContainerService/managedClusters has a bug: this API does not (or is not) allowing changes to the node pool settings. We found this when trying to trigger an upgrade of the system node pool; it kept returning the error below no matter what kubernetesVersion string we passed:

[screenshot: the error returned by the failed node pool upgrade]

We updated our Bicep to deploy the initial system node pool inline (since that is required), and then to deploy the user node pools and the system node pool again using the resource Microsoft.ContainerService/managedClusters/agentPools. This worked as expected.

Using @shyam-ks's code, adjust it to something like the following and it should start working for you.

var nodePoolProfiles = [
  {
    name: 'systempool01'
    osDiskSizeGB: osDiskSizeGB
    count: 3
    enableAutoScaling: true
    minCount: 3
    maxCount: 5
    vmSize: systemPoolVmsize
    osType: 'Linux'
    type: 'VirtualMachineScaleSets'
    enableNodePublicIP: false
    enableEncryptionAtHost: hostBasedEncryption
    mode: 'System'
    maxPods: systemPoolMaxPods
    nodeTaints: [
      'CriticalAddonsOnly=true:NoSchedule'
    ]
    scaleSetPriority: 'Regular'
    scaleSetEvictionPolicy: 'Delete'
    availabilityZones: [
      '1'
      '2'
      '3'
    ]
    vnetSubnetID: vnetSubnetID
  }
  {
    name: 'userpool01'
    osDiskSizeGB: osDiskSizeGB
    count: 5
    enableAutoScaling: true
    minCount: 5
    maxCount: 12
    vmSize: userPoolVmsize
    osType: 'Linux'
    type: 'VirtualMachineScaleSets'
    mode: 'User'
    maxPods: userPoolMaxPods
    scaleSetPriority: 'Regular'
    scaleSetEvictionPolicy: 'Delete'
    enableNodePublicIP: false
    enableEncryptionAtHost: hostBasedEncryption
    availabilityZones: [
      '1'
      '2'
      '3'
    ]
    vnetSubnetID: vnetSubnetID
  }
  {
    name: 'userpool02'
    osDiskSizeGB: osDiskSizeGB
    count: 5
    enableAutoScaling: true
    minCount: 5
    maxCount: 12
    vmSize: userPoolVmsize
    osType: 'Linux'
    type: 'VirtualMachineScaleSets'
    mode: 'User'
    maxPods: userPoolMaxPods
    scaleSetPriority: 'Regular'
    scaleSetEvictionPolicy: 'Delete'
    enableNodePublicIP: false
    enableEncryptionAtHost: hostBasedEncryption
    availabilityZones: [
      '1'
      '2'
      '3'
    ]
    vnetSubnetID: vnetSubnetID
  }
]
resource aksClusterName_resource 'Microsoft.ContainerService/managedClusters@2021-02-01' = {
  location: location
  name: aksClusterName
  properties: {
    kubernetesVersion: kubernetesVersion
    enableRBAC: enableRBAC
    dnsPrefix: '${aksClusterName}-dns'
    nodeResourceGroup: aksRgName
    agentPoolProfiles: nodePoolProfiles
    networkProfile: {
      loadBalancerSku: 'standard'
      networkPlugin: networkPlugin
      networkPolicy: networkPolicy
      serviceCidr: serviceCidr
      loadBalancerProfile: json('null')
      dnsServiceIP: dnsServiceIP
      outboundType: 'userDefinedRouting'
      dockerBridgeCidr: dockerBridgeCidr
    }
    aadProfile: {
      managed: true
      adminGroupObjectIDs: adminGroupObjectIDs
    }
    apiServerAccessProfile: {
      enablePrivateCluster: enablePrivateCluster
      privateDNSZone: privateDnsZone
    }
    addonProfiles: {
      httpApplicationRouting: {
        enabled: enableHttpApplicationRouting
      }
      azurepolicy: {
        enabled: enableAzurePolicy
      }
      omsagent: {
        enabled: true
        config: {
          logAnalyticsWorkspaceResourceID: workspaceResourceId
        }
      }
    }
  }
  tags: {
  }
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${userAManagedIdentity}': {
      }
    }
  }
  dependsOn: []
}
resource agentPools 'Microsoft.ContainerService/managedClusters/agentPools@2022-09-02-preview' = [for (pool, index) in nodePoolProfiles: {
  name: pool.name
  parent: aksClusterName_resource
  properties: {
    osDiskSizeGB: pool.osDiskSizeGB
    count: pool.count
    enableAutoScaling: pool.enableAutoScaling
    minCount: pool.minCount
    maxCount: pool.maxCount
    vmSize: pool.vmSize
    osType: pool.osType
    type: pool.type
    mode: pool.mode
    maxPods: pool.maxPods
    scaleSetPriority: pool.scaleSetPriority
    scaleSetEvictionPolicy: pool.scaleSetEvictionPolicy
    // Property access is case-sensitive: these must match the casing
    // used in nodePoolProfiles above.
    enableNodePublicIP: pool.enableNodePublicIP
    enableEncryptionAtHost: pool.enableEncryptionAtHost
    availabilityZones: pool.availabilityZones
    vnetSubnetID: pool.vnetSubnetID
    // orchestratorVersion can be added here to control the pool's
    // Kubernetes version during upgrades.
  }
}]
paulgmiller commented 1 year ago

I think I found the correlation ID for your failed request. The PUT on the managed cluster has this for the cluster version: "properties":{"kubernetesVersion":"1.24.9"}

but there was also a single node pool, npsystem01, with "orchestratorVersion": "1.24".

Mind sharing the ARM template generated by this Bicep file? I am not that familiar with how Bicep expands.

wsmelton commented 1 year ago

Correlation ID: 3628b1c5-c8c5-4f39-89af-af84c59c4441

deployment_operations.txt deployment.txt template.txt

PixelRobots commented 1 year ago

I have this working in bicep. If people are still interested I can share it.

chrisgibbons4 commented 1 year ago

I have this working in bicep. If people are still interested I can share it.

Yes please, can you share this??

ramizraza504 commented 1 year ago

I have this working in bicep. If people are still interested I can share it.

can you share please ?

PixelRobots commented 1 year ago

I have this working in bicep. If people are still interested I can share it.

can you share please ?

Here you go. https://github.com/PixelRobots/AKS-Bicep/blob/main/Examples/update-cluster.md

microsoft-github-policy-service[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs within 15 days of this comment.

microsoft-github-policy-service[bot] commented 9 months ago

This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. shyam-ks, feel free to comment again in the next 7 days to reopen, or open a new issue after that time if you still have a question/issue or suggestion.