databricks / terraform-provider-databricks

Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[ISSUE] Issue after updating TF Provider from 1.49.1 to 1.50: "ERROR: Tenant shouldn't be specified for managed identity account" #3918

Open HansjoergW opened 2 months ago

HansjoergW commented 2 months ago

Hi

After updating the Databricks TF Provider from 1.49.1 to 1.50, we receive the error "ERROR: Tenant shouldn't be specified for managed identity account".

The configuration of the provider didn't change.

We have a TF module that creates a Workspace. After that, the provider is initialized with the URL of the created workspace.

module "dbx" {
    ...
}

provider "databricks" {
  alias = "main"
  host  = module.dbx.ws_url
}

After that, we pass that provider to another module which then takes care of the "detail configuration" of the workspace.

#################################################################
# finally test this module
#################################################################
module "dbxconfig" {
  providers = {
    databricks = databricks.main
    azurerm    = azurerm
  }
  source                       = "../module"
  depends_on                   = [module.dbx.ws_url]
  whitelist_azure_service_tags = ["PowerBI"]
  ip_restrictions              = ["1.1.1.1"]
}

After updating to 1.50, "apply" fails with the following error (plan did work):

│ Error: cannot create workspace conf: failed during request visitor: default auth: azure-cli: cannot get access token: ERROR: Tenant shouldn't be specified for managed identity account
│ . Config: host=https://adb-2665674642156245.5.azuredatabricks.net/, azure_use_msi=true, azure_tenant_id=af7227b1-ac3a-4487-9e9f-ba462bb409d4. Env: ARM_USE_MSI, ARM_TENANT_ID
│ 
│   with module.dbxconfig.databricks_workspace_conf.workspace_conf,
│   on ../module/main.tf line 4, in resource "databricks_workspace_conf" "workspace_conf":
│    4: resource "databricks_workspace_conf" "workspace_conf" {

There were NO changes other than updating the Terraform Provider. (We have an automated renovate process that ensures this was the only change in the merge request.)

NOTE: This happens on our automated build system, which has several Azure-related environment variables set, such as ARM_USE_MSI and ARM_TENANT_ID.

Expected Behavior

It should work as it did with 1.49.1

Actual Behavior

Provider initialization fails.

│ Error: cannot create workspace conf: failed during request visitor: default auth: azure-cli: cannot get access token: ERROR: Tenant shouldn't be specified for managed identity account
│ . Config: host=https://adb-2665674642156245.5.azuredatabricks.net/, azure_use_msi=true, azure_tenant_id=af7227b1-ac3a-4487-9e9f-ba462bb409d4. Env: ARM_USE_MSI, ARM_TENANT_ID
│ 
│   with module.dbxconfig.databricks_workspace_conf.workspace_conf,
│   on ../module/main.tf line 4, in resource "databricks_workspace_conf" "workspace_conf":
│    4: resource "databricks_workspace_conf" "workspace_conf" {

Steps to Reproduce

Change the provider version from 1.49.1 to 1.50 and run "apply" (plan still succeeds).

Terraform and provider versions

Databricks Terraform Provider version 1.50

Is it a regression?

Yes. Other merge requests/branches that still use 1.49.1 continue to work.
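
Since 1.49.1 still works, one possible stopgap is to pin the provider to that version until the regression is resolved. A minimal sketch, assuming the provider is consumed from the public registry under the source address databricks/databricks and that the renovate process is configured separately not to bump it:

```hcl
terraform {
  required_providers {
    databricks = {
      # Registry source address; adjust if a mirror is used.
      source = "databricks/databricks"
      # Exact pin to the last version that worked in this setup.
      version = "1.49.1"
    }
  }
}
```

The pin has to be present in every module that declares the provider, otherwise a looser constraint elsewhere can still select 1.50.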

alexott commented 2 months ago

Most probably it's because of the Go SDK upgrade that included this: https://github.com/databricks/databricks-sdk-go/pull/910

mgyucht commented 2 months ago

Interestingly enough, that change doesn't touch the Azure MSI authentication in the Go SDK, only the Azure CLI authentication. What I suspect is happening is that the provider is actually authenticating via the Azure CLI, which is itself authenticated via MSI. The SDK now specifies the tenant ID when invoking the CLI, but that is not allowed when the CLI is authenticated with a managed identity.

Additionally, I think it is a bug that the Go SDK is trying to use CLI auth in this case in the first place.

I think we need to make two changes:

  1. We don't need to assert that AZURE_WORKSPACE_RESOURCE_ID is set when doing Azure MSI auth as long as the hostname is present. That is only used to resolve the workspace host, which will use the token retrieved during the auth process.
  2. We can change the CLI auth not to send the tenant ID to the CLI when ARM_USE_MSI is true. If we do the first fix, this would likely be a dead codepath, but it would ultimately be defensive (e.g. in case a user didn't specify ARM_USE_MSI but the CLI is authenticated with MSI anyways).
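
Until either change lands, the first point suggests a possible workaround on the configuration side: if MSI auth is currently skipped because no workspace resource ID is set, supplying one might let the provider pick MSI auth directly instead of falling through to the CLI. This is an untested sketch. The `ws_id` output is hypothetical (the module in this issue only shows `ws_url`), and `azure_use_msi` is spelled out here although ARM_USE_MSI already sets it in the reported environment:

```hcl
provider "databricks" {
  alias = "main"
  host  = module.dbx.ws_url

  # Hypothetical module output holding the Azure resource ID of the workspace.
  azure_workspace_resource_id = module.dbx.ws_id

  # Request managed identity auth explicitly rather than relying on the environment.
  azure_use_msi = true
}
```

Whether this actually avoids the CLI codepath depends on how the SDK orders its credential strategies, so it should be verified against a non-production workspace first.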