Azure / terraform-azurerm-aks

Terraform Module for deploying an AKS cluster
MIT License

"azapi_update_resource" "aks_cluster_post_create" Failed to retrieve resource #580

Closed (willie-yao closed this issue 3 months ago)

willie-yao commented 3 months ago

Is there an existing issue for this?

Greenfield/Brownfield provisioning

greenfield

Terraform Version

1.9.3

Module Version

9.1.0

AzureRM Provider Version

3.113.0

Affected Resource(s)/Data Source(s)

azapi_update_resource, aks_cluster_post_create

Terraform Configuration Files

################################################################################
# Virtual Network: Module
################################################################################

module "network" {
  source              = "Azure/subnets/azurerm"
  version             = "1.0.0"
  resource_group_name = azurerm_resource_group.this.name
  subnets = {
    aks = {
      address_prefixes  = ["10.52.0.0/16"]
      service_endpoints = ["Microsoft.Storage"]
    }
  }
  virtual_network_address_space = ["10.52.0.0/16"]
  virtual_network_location      = azurerm_resource_group.this.location
  virtual_network_name          = "vnet1"
  virtual_network_tags          = var.tags
}

################################################################################
# AKS: Module
################################################################################

module "aks" {
  source                            = "Azure/aks/azurerm"
  version                           = "9.1.0"
  resource_group_name               = azurerm_resource_group.this.name
  location                          = var.location
  kubernetes_version                = var.kubernetes_version
  orchestrator_version              = var.kubernetes_version
  role_based_access_control_enabled = var.role_based_access_control_enabled
  rbac_aad                          = var.rbac_aad
  prefix                            = var.prefix
  network_plugin                    = var.network_plugin
  vnet_subnet_id                    = lookup(module.network.vnet_subnets_name_id, "aks")
  os_disk_size_gb                   = var.os_disk_size_gb
  sku_tier                          = var.sku_tier
  private_cluster_enabled           = var.private_cluster_enabled
  enable_auto_scaling               = var.enable_auto_scaling
  enable_host_encryption            = var.enable_host_encryption
  log_analytics_workspace_enabled   = var.log_analytics_workspace_enabled
  agents_min_count                  = var.agents_min_count
  agents_max_count                  = var.agents_max_count
  agents_count                      = null # Please set `agents_count` `null` while `enable_auto_scaling` is `true` to avoid possible `agents_count` changes.
  agents_max_pods                   = var.agents_max_pods
  agents_pool_name                  = "system"
  agents_availability_zones         = ["1", "2", "3"]
  agents_type                       = "VirtualMachineScaleSets"
  agents_size                       = var.agents_size
  monitor_metrics                   = {}
  azure_policy_enabled              = var.azure_policy_enabled
  microsoft_defender_enabled        = var.microsoft_defender_enabled
  tags                              = var.tags

  workload_identity_enabled = true
  oidc_issuer_enabled       = true

  agents_labels = {
    "nodepool" : "defaultnodepool"
  }

  agents_tags = {
    "Agent" : "defaultnodepoolagent"
  }

  network_policy             = var.network_policy
  net_profile_dns_service_ip = var.net_profile_dns_service_ip
  net_profile_service_cidr   = var.net_profile_service_cidr

  network_contributor_role_assigned_subnet_ids = { "aks" = lookup(module.network.vnet_subnets_name_id, "aks") }

  depends_on = [module.network]
}

tfvars variables values

# Azure region
location = "westus3"

# Kubernetes version
kubernetes_version = null # Defaults to latest

# GitOps Addons configuration
gitops_addons_org      = "git@github.com:myGitHubUserName"
gitops_addons_repo     = "aks-platform-engineering"
gitops_addons_basepath = "gitops/"
gitops_addons_path     = "bootstrap/control-plane/addons"
gitops_addons_revision = "capzmanual"

# Agents size
agents_size = "Standard_D2s_v3"

# Addons configuration
addons = {
  enable_kyverno                         = false
}

# Resource group name
resource_group_name = "aks-gitops"

Debug Output/Panic Output

module.aks.azurerm_role_assignment.network_contributor_on_subnet["aks"]: Still creating... [10s elapsed]
module.aks.azurerm_role_assignment.network_contributor_on_subnet["aks"]: Still creating... [20s elapsed]
module.aks.azurerm_role_assignment.network_contributor_on_subnet["aks"]: Creation complete after 25s [id=/subscriptions/<sub-id>/resourceGroups/aks-gitops/providers/Microsoft.Network/virtualNetworks/vnet1/subnets/aks/providers/Microsoft.Authorization/roleAssignments/d0c8fc6e-c5ae-4549-bda3-3ff923b95f40]
╷
│ Warning: Deprecated attribute
│ 
│   on .terraform/modules/aks/main.tf line 552, in resource "azurerm_kubernetes_cluster" "main":
│  552:       public_network_access_enabled,
│ 
│ The attribute "public_network_access_enabled" is deprecated. Refer to the provider documentation for details.
╵
╷
│ Warning: Argument is deprecated
│ 
│   with module.network.azurerm_subnet.subnet["aks"],
│   on .terraform/modules/network/main.tf line 35, in resource "azurerm_subnet" "subnet":
│   35:   private_endpoint_network_policies_enabled     = var.subnets[each.value].private_endpoint_network_policies_enabled
│ 
│ `private_endpoint_network_policies_enabled` will be removed in favour of the property `private_endpoint_network_policies` in version 4.0 of the AzureRM Provider
│ 
│ (and one more similar warning elsewhere)
╵
╷
│ Error: Failed to retrieve resource
│ 
│   with module.aks.azapi_update_resource.aks_cluster_post_create,
│   on .terraform/modules/aks/main.tf line 641, in resource "azapi_update_resource" "aks_cluster_post_create":
│  641: resource "azapi_update_resource" "aks_cluster_post_create" {
│ 
│ checking for presence of existing Resource: (ResourceId "/subscriptions/<sub-id>/resourceGroups/aks-gitops/providers/Microsoft.ContainerService/managedClusters/gitops-aks" / Api Version "2024-02-01"):
│ ChainedTokenCredential authentication failed
│ GET http://169.254.169.254/metadata/identity/oauth2/token
│ --------------------------------------------------------------------------------
│ RESPONSE 400 Bad Request
│ --------------------------------------------------------------------------------
│ {
│   "error": "invalid_request",
│   "error_description": "Identity not found"
│ }
│ --------------------------------------------------------------------------------
│

Expected Behaviour

The cluster is deployed successfully.

Actual Behaviour

Terraform fails in aks_cluster_post_create. The specific error is ChainedTokenCredential authentication failed. We are using user-assigned identity for authentication and it successfully creates resource groups and the AKS Cluster, so I'm not sure why there's an auth error.

Steps to Reproduce

No response

Important Factoids

No response

References

#506 had a similar error message, and @zioproto left a related comment there. This error only happens for me, as I'm running Terraform from my dev VM on Azure. @dtzar has the same setup running locally without a problem.

zioproto commented 3 months ago

Hello @willie-yao,

How is your Terraform authenticating to Azure? Are you using the Azure identity of your VM?

Looking at the provider documentation: https://registry.terraform.io/providers/Azure/azapi/latest/docs

In #506 the solution was setting use_oidc to true (the default is false).

willie-yao commented 3 months ago

@zioproto helped me with this offline. Basically, the azurerm provider and the azapi provider need to use the same authentication method, which is what I was missing here. In my case, I was using Azure CLI authentication with Terraform/azurerm, so I had to set use_oidc=true as well as use_msi=false, since azapi will use MSI by default.
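
For anyone who lands here with the same error, this is a minimal sketch of the provider configuration described above. It assumes the providers are configured in the root module that calls the AKS module; the two azapi values are the ones reported in this thread, and everything else (the empty `features {}` block, placing both providers in one file) is illustrative. The request to 169.254.169.254 in the error output is the VM's instance metadata endpoint, which suggests azapi was attempting managed identity authentication on the dev VM while azurerm was not.

```hcl
# Root module provider configuration (illustrative sketch, not the module's own code).

provider "azurerm" {
  # Authenticates as it already did in this setup (Azure CLI in the reporter's case).
  features {}
}

provider "azapi" {
  # Values reported in this thread as resolving the
  # "ChainedTokenCredential authentication failed" error.
  use_oidc = true

  # Stop azapi from trying the VM's managed identity, which is what
  # returned "Identity not found" from the metadata endpoint.
  use_msi = false
}
```

The right values depend on how azurerm authenticates in your environment; the point from the discussion is that both providers should end up using the same authentication method.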