databricks / terraform-provider-databricks

Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[ISSUE] `databricks_metastore_data_access` resource cannot be created even though the Service Principal is an Account Admin #2222

Closed: adisaljusi closed this issue 1 year ago

adisaljusi commented 1 year ago

While creating a Unity Catalog metastore, the databricks_metastore_data_access resource fails when applying the plan. The service principal is assigned as an Account Admin in the Azure Databricks account console, and the databricks_metastore resource was created successfully.

Configuration

terraform {
  backend "azurerm" {
    snapshot = true
  }
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.52.0"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "1.14.3"
    }
  }
}

provider "azurerm" {
  features {}
  environment = "public"
}

provider "databricks" {
  host                        = azurerm_databricks_workspace.lakehouse.workspace_url
  azure_workspace_resource_id = azurerm_databricks_workspace.lakehouse.id
}
data "azurerm_resource_group" "infrastructure" {
  name = "rg-infra"
}

data "azurerm_client_config" "service_connection" {}

resource "azurerm_resource_group" "lakehouse" {
  name     = "rg-lakehouse"
  location = data.azurerm_resource_group.infrastructure.location
}

resource "azurerm_databricks_workspace" "lakehouse" {
  name                        = "dbw-lakehouse"
  resource_group_name         = azurerm_resource_group.lakehouse.name
  location                    = azurerm_resource_group.lakehouse.location
  sku                         = "premium"
  managed_resource_group_name = "rg-dbw-lakehouse"
}

data "databricks_spark_version" "latest" {
  depends_on = [
    azurerm_databricks_workspace.lakehouse
  ]
}
data "databricks_node_type" "smallest" {
  local_disk = true

  depends_on = [
    azurerm_databricks_workspace.lakehouse
  ]
}

resource "databricks_cluster" "unity_sql" {
  cluster_name            = "Cluster"
  spark_version           = data.databricks_spark_version.latest.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 10
  enable_elastic_disk     = false
  num_workers             = 2

  data_security_mode = "USER_ISOLATION"

  azure_attributes {
    availability = "SPOT"
  }
}

resource "azurerm_storage_account" "adls" {
  name                     = "salakehouse"
  resource_group_name      = azurerm_resource_group.lakehouse.name
  location                 = azurerm_resource_group.lakehouse.location
  account_tier             = "Standard"
  account_replication_type = "GRS"
  is_hns_enabled           = true

  enable_https_traffic_only       = true
  min_tls_version                 = "TLS1_2"
  allow_nested_items_to_be_public = false

  identity {
    type = "SystemAssigned"
  }

  lifecycle {
    ignore_changes = [
      tags
    ]
  }
}

resource "azurerm_storage_container" "unity_catalog" {
  name                  = "unitycatalog"
  storage_account_name  = azurerm_storage_account.adls.name
  container_access_type = "private"

  depends_on = [
    azurerm_storage_account.adls
  ]
}

resource "azurerm_role_assignment" "sp_sa_adls" {
  scope                = azurerm_storage_account.adls.id
  role_definition_name = "Storage Blob Data Owner"
  principal_id         = data.azurerm_client_config.service_connection.object_id

  depends_on = [
    azurerm_storage_account.adls
  ]
}

resource "azurerm_role_assignment" "mi_unity_catalog" {
  scope                = azurerm_storage_account.adls.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.unity.identity[0].principal_id
}

resource "azurerm_databricks_access_connector" "unity" {
  name                = "db-mi-${local.prefix}"
  resource_group_name = azurerm_resource_group.lakehouse.name
  location            = azurerm_resource_group.lakehouse.location

  identity {
    type = "SystemAssigned"
  }
}

resource "databricks_metastore" "primary" {
  name = "primary"
  storage_root = format("abfss://%s@%s.dfs.core.windows.net/",
    azurerm_storage_container.unity_catalog.name,
  azurerm_storage_account.adls.name)
  force_destroy = true
}

resource "databricks_metastore_data_access" "primary" {
  metastore_id = databricks_metastore.primary.id
  name         = "mi_dac"

  azure_managed_identity {
    access_connector_id = azurerm_databricks_access_connector.unity.id
  }

  is_default = true
}

resource "databricks_metastore_assignment" "primary" {
  metastore_id         = databricks_metastore.primary.id
  workspace_id         = azurerm_databricks_workspace.lakehouse.workspace_id
  default_catalog_name = "hive_metastore"
}

variable "environment" {
  type        = string
  description = "Short name for deployment environemnt (e.g., dev, uat, prd)"
}

variable "resource_group_name" {
  type        = string
  description = "Name of the existing resource group where the Terraform state is stored"
}

variable "workload" {
  type        = string
  description = "Name for the workload specificed in the resource group (e.g., ingestion, ml, network)"
}

variable "region" {
  type        = string
  description = "Name of the region where the resources are targeted for deployment"
}
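
For reference, a variables.tfvars supplying these variables might look like the following (values are illustrative, not taken from the original report):

# Illustrative values only; the actual tfvars file was not shared in the issue.
environment         = "dev"
resource_group_name = "rg-infra"
workload            = "ingestion"
region              = "westeurope"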

Expected Behavior

The databricks_metastore_data_access resource should be created without errors, since the databricks_metastore resource, which requires the same permissions, was created successfully.

Actual Behavior

Terraform apply fails with the error: "cannot create metastore data access: Only account admins can create Azure Managed Identity Storage Credentials."

Steps to Reproduce

  1. terraform apply -var-file=variables.tfvars

Terraform and provider versions

Terraform: 1.4.5
hashicorp/azurerm: 3.52.0
databricks/databricks: 1.14.3

Debug Output

╷
│ Error: cannot create metastore data access: Only account admins can create Azure Managed Identity Storage Credentials.
│
│   with databricks_metastore_data_access.primary,
│   on unity_catalog.tf line 19, in resource "databricks_metastore_data_access" "primary":
│   19: resource "databricks_metastore_data_access" "primary" {
│
╵

Important Factoids

TakeshiMatsukura commented 1 year ago

Can you share the "databricks_metastore_data_access"-related part of your config?

adisaljusi commented 1 year ago

Hi @TakeshiMatsukura, thanks for the heads-up; I forgot to include this in the issue. I've updated the issue with the required resources, and I'll additionally add them in this comment as well.

resource "azurerm_databricks_access_connector" "unity" {
  name                = "db-mi-${local.prefix}"
  resource_group_name = azurerm_resource_group.lakehouse.name
  location            = azurerm_resource_group.lakehouse.location

  identity {
    type = "SystemAssigned"
  }
}

resource "databricks_metastore" "primary" {
  name = "primary"
  storage_root = format("abfss://%s@%s.dfs.core.windows.net/",
    azurerm_storage_container.unity_catalog.name,
  azurerm_storage_account.adls.name)
  force_destroy = true
}

resource "databricks_metastore_data_access" "primary" {
  metastore_id = databricks_metastore.primary.id
  name         = "mi_dac"

  azure_managed_identity {
    access_connector_id = azurerm_databricks_access_connector.unity.id
  }

  is_default = true
}

resource "databricks_metastore_assignment" "primary" {
  metastore_id         = databricks_metastore.primary.id
  workspace_id         = azurerm_databricks_workspace.lakehouse.workspace_id
  default_catalog_name = "hive_metastore"
}

nfx commented 1 year ago

Most likely, you're missing adding the SPN to the account. You can do that as follows:

provider "databricks" {
  alias = "account"
  account_id = "..."
  host = "https://accounts.azuredatabricks.net"
}

resource "databricks_service_principal" "spn_running_apply" {
  provider       = databricks.account
  application_id = "00000000-0000-0000-0000-000000000000" // spn
}

resource "databricks_mws_permission_assignment" "add_admin_group" {
  workspace_id = azurerm_databricks_workspace.lakehouse.workspace_id
  principal_id = databricks_service_principal.spn_running_apply.id
  permissions  = ["ADMIN"]
}
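
As a side note, instead of hardcoding the application ID, it could be read from the azurerm_client_config data source already declared in the config; a sketch, assuming the pipeline's service connection is the same SPN that runs the apply:

resource "databricks_service_principal" "spn_running_apply" {
  provider = databricks.account
  // Assumption: the apply authenticates as the service connection declared
  // earlier, so its application ID is available as client_id.
  application_id = data.azurerm_client_config.service_connection.client_id
}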

@tanmay-db please verify if this works and confirm

tanmay-db commented 1 year ago

Hi @nfx, @adisaljusi, the databricks_service_principal resource cannot be used here because it requires the user to already have account admin status; otherwise it leads to:

│ Error: cannot create service principal: This API is disabled for users without account admin status. Contact your administrator for more information

Account admin rights can be granted in the account console. Please see Assign account admin rights to a user, and also Manage users, service principals, and groups for information on which actions can be performed by each user.
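
For completeness, once an identity with account admin rights runs the apply, the grant itself can also be expressed in Terraform. A minimal sketch, assuming the account-level provider alias from the earlier comment; treating databricks_service_principal_role with the account_admin role as the mechanism is an assumption here, not something confirmed in this thread:

resource "databricks_service_principal_role" "spn_account_admin" {
  provider             = databricks.account
  // Assumption: grants the account admin role to the SPN registered above.
  // This must be applied by an identity that is already an account admin.
  service_principal_id = databricks_service_principal.spn_running_apply.id
  role                 = "account_admin"
}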

After that, the databricks_metastore_data_access resource is created successfully. I verified this with an SPN without account admin status (which fails with "cannot create metastore data access: Only account admins can create Azure Managed Identity Storage Credentials") and with an SPN that has account admin status (where the resource is created successfully).

Closing the ticket. Please feel free to reopen.

adisaljusi commented 1 year ago

Hi @tanmay-db ,

Thanks for the explanation! That's correct, we forgot to grant the SPN account admin permissions.

Thanks for closing the issue!

Best, Adis