Azure / terraform-azurerm-aks

Terraform Module for deploying an AKS cluster
MIT License

Risk of Terraform state drift when storage_profile_blob_driver_enabled is true #424

Open zioproto opened 1 year ago

zioproto commented 1 year ago

Introduction

When storage_profile_blob_driver_enabled is true, the CSI driver running in the AKS cluster creates a "Microsoft.Storage" service endpoint on the subnet as soon as the first PersistentVolumeClaim is created. This change, made by the CSI driver, is not tracked in the Terraform state and causes state drift. Because of the dependencies between the modules, this state drift triggers the destruction of the cluster and the creation of a new one.

Is there an existing issue for this?

Greenfield/Brownfield provisioning

greenfield

Terraform Version

1.5.5

Module Version

7.3.0

AzureRM Provider Version

v3.68.0

Affected Resource(s)/Data Source(s)

azurerm_kubernetes_cluster

Terraform Configuration Files

# Minimal code to explain the issue
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.56"
    }
  }
  required_version = ">= 1.1.0"
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "testResourceGroup"
  location = "eastus"
}

module "network" {
  source              = "Azure/network/azurerm"
  vnet_name           = azurerm_resource_group.example.name
  resource_group_name = azurerm_resource_group.example.name
  address_space       = "10.52.0.0/16"
  subnet_prefixes     = ["10.52.0.0/16"]
  subnet_names        = ["system"]
  use_for_each        = true
  depends_on          = [azurerm_resource_group.example]
}

resource "azurerm_role_assignment" "aks" {
  scope                = module.network.vnet_id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_kubernetes_cluster.example.identity[0].principal_id
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks1"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "exampleaks1"

  default_node_pool {
    name           = "default"
    node_count     = 1
    vm_size        = "Standard_DS3_v2"
    vnet_subnet_id = module.network.vnet_subnets[0]
  }

  identity {
    type = "SystemAssigned"
  }

  storage_profile {
    blob_driver_enabled = true
  }

}

tfvars variables values

N/A

Debug Output/Panic Output

% terraform apply                  
azurerm_resource_group.example: Refreshing state... [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup]
module.network.data.azurerm_resource_group.network[0]: Reading...
module.network.data.azurerm_resource_group.network[0]: Read complete after 1s [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup]
module.network.azurerm_virtual_network.vnet: Refreshing state... [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.Network/virtualNetworks/testResourceGroup]
module.network.azurerm_subnet.subnet_for_each["system"]: Refreshing state... [id=/subscriptionsREDACTED/resourceGroups/testResourceGroup/providers/Microsoft.Network/virtualNetworks/testResourceGroup/subnets/system]
azurerm_kubernetes_cluster.example: Refreshing state... [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.ContainerService/managedClusters/example-aks1]
azurerm_role_assignment.aks: Refreshing state... [id=/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.Network/virtualNetworks/testResourceGroup/providers/Microsoft.Authorization/roleAssignments/918a9acd-be5c-8f85-6b01-8866e927e2e2]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.network.azurerm_subnet.subnet_for_each["system"] will be updated in-place
  ~ resource "azurerm_subnet" "subnet_for_each" {
        id                                             = "/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.Network/virtualNetworks/testResourceGroup/subnets/system"
        name                                           = "system"
      ~ service_endpoints                              = [
          - "Microsoft.Storage",
        ]
        # (8 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Expected Behaviour

Terraform plan should show no changes

Actual Behaviour

Terraform will perform the following actions:

  # module.network.azurerm_subnet.subnet_for_each["system"] will be updated in-place
  ~ resource "azurerm_subnet" "subnet_for_each" {
        id                                             = "/subscriptions/REDACTED/resourceGroups/testResourceGroup/providers/Microsoft.Network/virtualNetworks/testResourceGroup/subnets/system"
        name                                           = "system"
      ~ service_endpoints                              = [
          - "Microsoft.Storage",
        ]
        # (8 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Steps to Reproduce

terraform apply
az aks get-credentials --resource-group testResourceGroup --name example-aks1

apply the following with kubectl:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: echoserver-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: azureblob-nfs-premium
  resources:
    requests:
      storage: 10Gi

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
spec:
  replicas: 1
  selector:
    matchLabels:
      run: echoserver
  template:
    metadata:
      labels:
        run: echoserver
    spec:
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: echoserver-pvc
      containers:
      - name: echoserver
        image: gcr.io/google_containers/echoserver:1.10
        imagePullPolicy: Always
        volumeMounts:
        - mountPath: "/data"
          name: volume
        ports:
        - containerPort: 8080
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 6
          periodSeconds: 10

Once the Pod is running, the Terraform state has drifted; run terraform plan again to confirm.

Important Factoids

No response

References

No response

zioproto commented 1 year ago

To reproduce using this module, this is the minimal code:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.56"
    }
  }
  required_version = ">= 1.1.0"
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "testResourceGroup2"
  location = "eastus"
}

module "network" {
  source              = "Azure/network/azurerm"
  vnet_name           = azurerm_resource_group.example.name
  resource_group_name = azurerm_resource_group.example.name
  address_space       = "10.52.0.0/16"
  subnet_prefixes     = ["10.52.0.0/16"]
  subnet_names        = ["system"]
  use_for_each        = true
  depends_on          = [azurerm_resource_group.example]
}

module "aks" {
  source                            = "Azure/aks/azurerm"
  version                           = "7.3.0"
  resource_group_name               = azurerm_resource_group.example.name
  role_based_access_control_enabled = true
  rbac_aad                          = false
  prefix                            = "aks"
  network_plugin                    = "azure"
  vnet_subnet_id                    = module.network.vnet_subnets[0]
  os_disk_size_gb                   = 50
  sku_tier                          = "Standard"
  private_cluster_enabled           = false
  enable_auto_scaling               = true
  agents_min_count                  = 1
  agents_max_count                  = 5
  agents_count                      = null # Please set `agents_count` `null` while `enable_auto_scaling` is `true` to avoid possible `agents_count` changes.
  agents_max_pods                   = 100
  agents_pool_name                  = "system"
  agents_availability_zones         = ["1", "2"]
  agents_type                       = "VirtualMachineScaleSets"
  agents_size                       = "Standard_DS3_v2"

  ingress_application_gateway_enabled = false

  network_policy                 = "azure"
  net_profile_dns_service_ip     = "10.0.0.10"
  net_profile_service_cidr       = "10.0.0.0/16"

  storage_profile_enabled = true
  storage_profile_blob_driver_enabled = true

  network_contributor_role_assigned_subnet_ids = { "system" = module.network.vnet_subnets[0] }

  depends_on = [module.network]
}

and this is the Terraform state drift after creating the first PersistentVolumeClaim:

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:

  # module.aks.azurerm_kubernetes_cluster.main has changed
  ~ resource "azurerm_kubernetes_cluster" "main" {
        id                                  = "/subscriptions/REDACTED/resourceGroups/testResourceGroup2/providers/Microsoft.ContainerService/managedClusters/aks-aks"
        name                                = "aks-aks"
        # (27 unchanged attributes hidden)

      ~ identity {
          + identity_ids = []
            # (3 unchanged attributes hidden)
        }

        # (7 unchanged blocks hidden)
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to
undo or respond to these changes.

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place
-/+ destroy and then create replacement
+/- create replacement and then destroy
 <= read (data resources)

Terraform will perform the following actions:

  # module.aks.data.azurerm_resource_group.main will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "azurerm_resource_group" "main" {
      + id         = (known after apply)
      + location   = (known after apply)
      + managed_by = (known after apply)
      + name       = "testResourceGroup2"
      + tags       = (known after apply)
    }

  # module.aks.azurerm_kubernetes_cluster.main must be replaced
+/- resource "azurerm_kubernetes_cluster" "main" {
      ~ api_server_authorized_ip_ranges     = [] -> (known after apply)
      - custom_ca_trust_certificates_base64 = [] -> null
      - enable_pod_security_policy          = false -> null
      ~ fqdn                                = "aks-r8glnvzu.hcp.eastus.azmk8s.io" -> (known after apply)
      + http_application_routing_zone_name  = (known after apply)
      ~ id                                  = "/subscriptions/REDACTED/resourceGroups/testResourceGroup2/providers/Microsoft.ContainerService/managedClusters/aks-aks" -> (known after apply)
      ~ kube_admin_config                   = (sensitive value)
      + kube_admin_config_raw               = (sensitive value)
      ~ kube_config                         = (sensitive value)
      ~ kube_config_raw                     = (sensitive value)
      ~ kubernetes_version                  = "1.26.6" -> (known after apply)
      - local_account_disabled              = false -> null
      ~ location                            = "eastus" # forces replacement -> (known after apply) # forces replacement
        name                                = "aks-aks"
      ~ node_resource_group                 = "MC_testResourceGroup2_aks-aks_eastus" -> (known after apply)
      ~ node_resource_group_id              = "/subscriptions/REDACTED/resourceGroups/MC_testResourceGroup2_aks-aks_eastus" -> (known after apply)
      + oidc_issuer_url                     = (known after apply)
      - open_service_mesh_enabled           = false -> null
      ~ portal_fqdn                         = "aks-r8glnvzu.portal.hcp.eastus.azmk8s.io" -> (known after apply)
      + private_dns_zone_id                 = (known after apply)
      + private_fqdn                        = (known after apply)
      - tags                                = {} -> null
        # (14 unchanged attributes hidden)

      - auto_scaler_profile {
          - balance_similar_node_groups      = false -> null
          - empty_bulk_delete_max            = "10" -> null
          - expander                         = "random" -> null
          - max_graceful_termination_sec     = "600" -> null
          - max_node_provisioning_time       = "15m" -> null
          - max_unready_nodes                = 3 -> null
          - max_unready_percentage           = 45 -> null
          - new_pod_scale_up_delay           = "0s" -> null
          - scale_down_delay_after_add       = "10m" -> null
          - scale_down_delay_after_delete    = "10s" -> null
          - scale_down_delay_after_failure   = "3m" -> null
          - scale_down_unneeded              = "10m" -> null
          - scale_down_unready               = "20m" -> null
          - scale_down_utilization_threshold = "0.5" -> null
          - scan_interval                    = "10s" -> null
          - skip_nodes_with_local_storage    = false -> null
          - skip_nodes_with_system_pods      = true -> null
        }

      ~ default_node_pool {
          - custom_ca_trust_enabled      = false -> null
          - fips_enabled                 = false -> null
          ~ kubelet_disk_type            = "OS" -> (known after apply)
            name                         = "system"
          ~ node_count                   = 1 -> (known after apply)
          ~ node_labels                  = {} -> (known after apply)
          - node_taints                  = [] -> null
          - only_critical_addons_enabled = false -> null
          ~ orchestrator_version         = "1.26.6" -> (known after apply)
          ~ os_sku                       = "Ubuntu" -> (known after apply)
          - tags                         = {} -> null
          + workload_runtime             = (known after apply)
            # (14 unchanged attributes hidden)
        }

      ~ identity {
          - identity_ids = [] -> null
          ~ principal_id = "53cd215f-ef23-4b11-98ab-2ff5dbd27ff7" -> (known after apply)
          ~ tenant_id    = "72f988bf-86f1-41af-91ab-2d7cd011db47" -> (known after apply)
            # (1 unchanged attribute hidden)
        }

      - kubelet_identity {
          - client_id                 = "16265252-309a-461b-904b-5fcf3edebc89" -> null
          - object_id                 = "850bb50d-e395-40c5-97be-4a6ea3913604" -> null
          - user_assigned_identity_id = "/subscriptions/REDACTED/resourceGroups/MC_testResourceGroup2_aks-aks_eastus/providers/Microsoft.ManagedIdentity/userAssignedIdentities/aks-aks-agentpool" -> null
        }

      ~ network_profile {
          + docker_bridge_cidr = (known after apply)
          ~ ip_versions        = [
              - "IPv4",
            ] -> (known after apply)
          + network_mode       = (known after apply)
          + pod_cidr           = (known after apply)
          ~ pod_cidrs          = [] -> (known after apply)
          ~ service_cidrs      = [
              - "10.0.0.0/16",
            ] -> (known after apply)
            # (6 unchanged attributes hidden)

          - load_balancer_profile {
              - effective_outbound_ips      = [
                  - "/subscriptions/REDACTED/resourceGroups/MC_testResourceGroup2_aks-aks_eastus/providers/Microsoft.Network/publicIPAddresses/cd9fc956-b3a5-4dc9-9776-5006cab89cd6",
                ] -> null
              - idle_timeout_in_minutes     = 0 -> null
              - managed_outbound_ip_count   = 1 -> null
              - managed_outbound_ipv6_count = 0 -> null
              - outbound_ip_address_ids     = [] -> null
              - outbound_ip_prefix_ids      = [] -> null
              - outbound_ports_allocated    = 0 -> null
            }
        }

      ~ oms_agent {
          ~ log_analytics_workspace_id      = "/subscriptions/REDACTED/resourceGroups/testResourceGroup2/providers/Microsoft.OperationalInsights/workspaces/aks-workspace" -> (known after apply)
          - msi_auth_for_monitoring_enabled = false -> null
          ~ oms_agent_identity              = [
              - {
                  - client_id                 = "1406d013-c601-47d9-ac40-7e30983f6349"
                  - object_id                 = "bd2dd411-b18a-4e1f-91bc-224c0880eb38"
                  - user_assigned_identity_id = "/subscriptions/REDACTED/resourcegroups/MC_testResourceGroup2_aks-aks_eastus/providers/Microsoft.ManagedIdentity/userAssignedIdentities/omsagent-aks-aks"
                },
            ] -> (known after apply)
        }

      - windows_profile {
          - admin_username = "azureuser" -> null
        }

        # (1 unchanged block hidden)
    }

  # module.aks.azurerm_log_analytics_solution.main[0] must be replaced
-/+ resource "azurerm_log_analytics_solution" "main" {
      ~ id                    = "/subscriptions/REDACTED/resourceGroups/testResourceGroup2/providers/Microsoft.OperationsManagement/solutions/ContainerInsights(aks-workspace)" -> (known after apply)
      ~ location              = "eastus" # forces replacement -> (known after apply) # forces replacement
      - tags                  = {} -> null
      ~ workspace_resource_id = "/subscriptions/REDACTED/resourceGroups/testResourceGroup2/providers/Microsoft.OperationalInsights/workspaces/aks-workspace" # forces replacement -> (known after apply) # forces replacement
        # (3 unchanged attributes hidden)

      ~ plan {
          ~ name      = "ContainerInsights(aks-workspace)" -> (known after apply)
            # (2 unchanged attributes hidden)
        }
    }

  # module.aks.azurerm_log_analytics_workspace.main[0] must be replaced
+/- resource "azurerm_log_analytics_workspace" "main" {
      - cmk_for_query_forced               = false -> null
      ~ id                                 = "/subscriptions/REDACTED/resourceGroups/testResourceGroup2/providers/Microsoft.OperationalInsights/workspaces/aks-workspace" -> (known after apply)
      ~ location                           = "eastus" # forces replacement -> (known after apply) # forces replacement
        name                               = "aks-workspace"
      ~ primary_shared_key                 = (sensitive value)
      + reservation_capacity_in_gb_per_day = (known after apply)
      ~ secondary_shared_key               = (sensitive value)
      - tags                               = {} -> null
      ~ workspace_id                       = "674fb72f-ff10-4eb2-8844-7b7190983d77" -> (known after apply)
        # (8 unchanged attributes hidden)
    }

  # module.aks.azurerm_role_assignment.network_contributor_on_subnet["system"] must be replaced
-/+ resource "azurerm_role_assignment" "network_contributor_on_subnet" {
      ~ id                               = "/subscriptions/REDACTED/resourceGroups/testResourceGroup2/providers/Microsoft.Network/virtualNetworks/testResourceGroup2/subnets/system/providers/Microsoft.Authorization/roleAssignments/b9dc6b17-2a56-43d5-715f-ef4fa78d63c4" -> (known after apply)
      ~ name                             = "b9dc6b17-2a56-43d5-715f-ef4fa78d63c4" -> (known after apply)
      ~ principal_id                     = "53cd215f-ef23-4b11-98ab-2ff5dbd27ff7" # forces replacement -> (known after apply) # forces replacement
      ~ principal_type                   = "ServicePrincipal" -> (known after apply)
      ~ role_definition_id               = "/subscriptions/REDACTED/providers/Microsoft.Authorization/roleDefinitions/4d97b98b-1d4f-4787-a291-c67834d212e7" -> (known after apply)
      + skip_service_principal_aad_check = (known after apply)
        # (2 unchanged attributes hidden)
    }

  # module.network.azurerm_subnet.subnet_for_each["system"] will be updated in-place
  ~ resource "azurerm_subnet" "subnet_for_each" {
        id                                             = "/subscriptions/REDACTED/resourceGroups/testResourceGroup2/providers/Microsoft.Network/virtualNetworks/testResourceGroup2/subnets/system"
        name                                           = "system"
      ~ service_endpoints                              = [
          - "Microsoft.Storage",
        ]
        # (8 unchanged attributes hidden)
    }

Plan: 4 to add, 1 to change, 4 to destroy.

zioproto commented 1 year ago

@lonegunmanb I am not sure how to move forward with this.

The state drift is in a resource outside of the terraform-azurerm-aks module; we only pass the subnet ID into the module.

Adding the service endpoint in Terraform fixes the drift:

module "network" {
  source              = "Azure/network/azurerm"
  vnet_name           = azurerm_resource_group.example.name
  resource_group_name = azurerm_resource_group.example.name
  address_space       = "10.52.0.0/16"
  subnet_prefixes     = ["10.52.0.0/16"]
  subnet_names        = ["system"]
  use_for_each        = true
  depends_on          = [azurerm_resource_group.example]
  subnet_service_endpoints = {
    "system" = ["Microsoft.Storage"]
  }
}
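As an alternative mitigation sketch (an editorial suggestion, not from this thread): if the subnet is managed directly as an azurerm_subnet resource rather than through the Azure/network module, the out-of-band change can be ignored instead of declared. The resource name below is hypothetical, and note that this hides the drift rather than modeling the endpoint:

```hcl
# Hypothetical sketch: a directly managed subnet that ignores endpoint drift.
# Only applicable when the subnet is NOT created by the Azure/network module,
# since ignore_changes cannot be applied to a resource inside another module.
resource "azurerm_subnet" "system" {
  name                 = "system"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.52.0.0/16"]

  lifecycle {
    # The blob CSI driver adds "Microsoft.Storage" out of band; do not
    # treat that as drift to be reverted.
    ignore_changes = [service_endpoints]
  }
}
```

Declaring the endpoint explicitly, as in the module block above, is still the cleaner fix because the configuration then matches what the CSI driver expects.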

However, how can we enforce this when using storage_profile_enabled = true?

zioproto commented 1 year ago

Next steps:

zioproto commented 1 year ago

I confirm that after adding location explicitly, like this:

module "aks" {
  source              = "Azure/aks/azurerm"
  version             = "7.3.0"
  resource_group_name = azurerm_resource_group.example.name
  location            = "eastus"
[..CUT..]

there is still state drift, but the impact is much smaller and does not involve destroying and recreating the cluster:

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:

  # module.aks.azurerm_kubernetes_cluster.main has changed
  ~ resource "azurerm_kubernetes_cluster" "main" {
        id                                  = "/subscriptions/REDACTED/resourceGroups/testResourceGroup2/providers/Microsoft.ContainerService/managedClusters/aks-aks"
        name                                = "aks-aks"
        # (27 unchanged attributes hidden)

      ~ identity {
          + identity_ids = []
            # (3 unchanged attributes hidden)
        }

        # (7 unchanged blocks hidden)
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to
undo or respond to these changes.

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  ~ update in-place
 <= read (data resources)

Terraform will perform the following actions:

  # module.aks.data.azurerm_resource_group.main will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "azurerm_resource_group" "main" {
      + id         = (known after apply)
      + location   = (known after apply)
      + managed_by = (known after apply)
      + name       = "testResourceGroup2"
      + tags       = (known after apply)
    }

  # module.network.azurerm_subnet.subnet_for_each["system"] will be updated in-place
  ~ resource "azurerm_subnet" "subnet_for_each" {
        id                                             = "/subscriptions/REDACTED/resourceGroups/testResourceGroup2/providers/Microsoft.Network/virtualNetworks/testResourceGroup2/subnets/system"
        name                                           = "system"
      ~ service_endpoints                              = [
          - "Microsoft.Storage",
        ]
        # (8 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

lonegunmanb commented 1 year ago

Next steps:

  • Test with location explicit in the aks module
  • Declare a datasource to retrieve the subnet information in case the user is bringing their own subnet. Create a precondition to check if the service endpoint is defined.
  • Consider using check blocks: https://developer.hashicorp.com/terraform/language/checks

Thanks @zioproto for tracing this issue. A check block would force the module's callers to upgrade their Terraform core version to >= 1.5.0.

If we use a check block we can generate a warning; I've checked the data.azurerm_subnet documentation, and we can inspect its service_endpoints attribute.

Another approach is using a datasource block that contains a postcondition block, which would block a subnet without the proper service endpoints from being used by this AKS module, but would that be overkill?
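A minimal sketch of the postcondition approach, validating the caller-supplied subnet from inside the module. The variable names here are hypothetical stand-ins (in practice the module receives a full subnet ID via vnet_subnet_id, which would need to be parsed):

```hcl
# Hypothetical sketch: fail the plan if the blob CSI driver is enabled but
# the subnet lacks the Microsoft.Storage service endpoint.
data "azurerm_subnet" "validated" {
  name                 = var.subnet_name         # hypothetical variable
  virtual_network_name = var.vnet_name           # hypothetical variable
  resource_group_name  = var.vnet_resource_group # hypothetical variable

  lifecycle {
    postcondition {
      condition = (
        !var.storage_profile_blob_driver_enabled ||
        contains(coalesce(self.service_endpoints, []), "Microsoft.Storage")
      )
      error_message = "When storage_profile_blob_driver_enabled is true, the subnet must declare the Microsoft.Storage service endpoint, otherwise the blob CSI driver will add it out of band and cause state drift."
    }
  }
}
```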

zioproto commented 4 months ago

I think using a check block and generating a warning is okay, because we should not force the service endpoint on customers that don't use persistent volumes.
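Such a check block could look like the following sketch in the caller's configuration (requires Terraform >= 1.5; the subnet, virtual network, and resource group names are assumptions taken from the repro above):

```hcl
# Sketch: emit a warning (without failing the plan) when the AKS subnet
# does not declare the Microsoft.Storage service endpoint.
check "aks_subnet_service_endpoint" {
  data "azurerm_subnet" "aks" {
    name                 = "system"
    virtual_network_name = azurerm_resource_group.example.name
    resource_group_name  = azurerm_resource_group.example.name
  }

  assert {
    condition = contains(
      coalesce(data.azurerm_subnet.aks.service_endpoints, []),
      "Microsoft.Storage"
    )
    error_message = "The AKS subnet lacks the Microsoft.Storage service endpoint; the blob CSI driver will add it out of band and cause state drift."
  }
}
```

Unlike a postcondition, a failed check assertion only produces a warning, so customers who never create blob-backed persistent volumes are not blocked.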