
Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[ISSUE] Issue with `databricks_metastore_data_access` resource when using a provider whose workspace URL is not assigned to the metastore (did work until 27.08.2023) #2624

Open HJatMobi opened 1 year ago

HJatMobi commented 1 year ago

Configuration

We have a staged pipeline that creates a metastore together with a data_access_storage_credential, first in a preprod environment and then in prod.

Therefore, the same TF code is used as shown below (var.stage defines whether this is preprod or prod). We use a "plain" Databricks workspace in order to be able to create a provider with which the metastore can be created. However, this "plain" workspace will not be assigned to the metastore.

Moreover, the "preprod" metastore will not have assigned any workspaces.


# we create a simple databricks ws in order to initialize the provider with it
resource "azurerm_databricks_workspace" "dswvanilla" {
  name                = "${module.yaml.app_id}-${module.yaml.fk_tk_name}-dswvanilla-${var.stage}-dsw"
  resource_group_name = azurerm_resource_group.msrg.name
  location            = azurerm_resource_group.msrg.location
  sku                 = "premium"
}

# this provider will be used by the module metastore
provider "databricks" {
  host = azurerm_databricks_workspace.dswvanilla.workspace_url
}

# create the metastore
resource "databricks_metastore" "metastore" {
  name          = format("%s_ms_%s", local.ms_name, var.stage)
  storage_root  = format("abfss://%s@%s.dfs.core.windows.net/", azurerm_storage_container.msdefaultcontainer.name, azurerm_storage_account.sa.name)
  owner         = format("Metastore-%s-%s-Owner", local.ms_name, title(var.stage))
  force_destroy = var.stage == "dev" || var.stage == "test" ? true : false
}

resource "databricks_metastore_data_access" "data_access_storage_credential" {
  depends_on   = [databricks_metastore.metastore]
  metastore_id = databricks_metastore.metastore.id
  name         = format("%s_default_%s_sc", local.ms_name, var.stage)
  azure_managed_identity {
    access_connector_id = azapi_resource.access_connector[var.stage].id
  }
  is_default = true
}

Expected Behavior

For months now, this code has run every time there is a version update within the pipeline (e.g. of the TF provider), and normally nothing has to be done, since the resources have already existed for months.

It ran without any problems until last Sunday, when the Databricks Terraform provider was updated from version 1.23 to version 1.24 (switching back to the previous version did not solve our problem).

Actual Behavior

Terraform plan works without any problems. But since last Sunday, or early Monday morning, our pipeline produces the following error message when terraform apply is executed:

│ Error: cannot create metastore data access: No metastore assigned for the current workspace.
│ 
│   with databricks_metastore_data_access.data_access_storage_credential,
│   on main.tf line 248, in resource "databricks_metastore_data_access" "data_access_storage_credential":
│  248: resource "databricks_metastore_data_access" "data_access_storage_credential" {
│ 

As stated above, the workspace whose URL was used to initialize the provider is not attached to the metastore. And again, that was not a problem until two days ago.

Steps to Reproduce

  1. terraform apply

Terraform and provider versions

1.24

Debug Output

TF_LOG=DEBUG produces too much output, which gets cut off by our CI/CD pipeline.

nkvuong commented 1 year ago

@HJatMobi with 1.24, you can now create a databricks_metastore & databricks_metastore_data_access directly using an account URL instead of a workspace URL.

The behaviour of creating databricks_metastore via a workspace without assigning it has some weird edge cases, so we would recommend switching over to the account-level provider
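
For illustration, a minimal sketch of an account-level provider configuration for Azure is shown below; `var.databricks_account_id` is a hypothetical variable holding the account ID from the account console, and authentication (e.g. via the Azure CLI) is assumed to be configured separately.

# account-level provider: talks to the accounts endpoint instead of a workspace URL
provider "databricks" {
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.databricks_account_id
}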

HJatMobi commented 1 year ago

Thanks @nkvuong

> @HJatMobi with 1.24, you can now create a databricks_metastore & databricks_metastore_data_access directly using an account URL instead of a workspace URL.
>
> The behaviour of creating databricks_metastore via a workspace without assigning it has some weird edge cases, so we would recommend switching over to the account-level provider

Ok, I changed the provider to one with an account-level configuration.

Now it is complaining that the resource already exists:

╷
│ Error: cannot create metastore data access: Storage Credential 'mobi_default_preprod_sc' already exists
│ 
│   with databricks_metastore_data_access.data_access_storage_credential,
│   on main.tf line 256, in resource "databricks_metastore_data_access" "data_access_storage_credential":
│  256: resource "databricks_metastore_data_access" "data_access_storage_credential" {
│ 
╵

Then I wanted to do a "terraform import" of the resource into the state, but the documentation states:

[Import](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/metastore_data_access#import)
Note
Importing this resource is not currently supported.

What shall I do now?

nkvuong commented 1 year ago

I'll need to update the doc regarding "Importing this resource is not currently supported."

Can you try importing using `<metastore_id>|<dac_name>`?
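
For illustration, the full command would look roughly like this (metastore ID and credential name taken from later in this thread); quoting the ID keeps the shell from treating `|` as a pipe:

terraform import databricks_metastore_data_access.data_access_storage_credential "6d8fc2cc-96a7-4007-ae6b-48fe26da3d0a|mobi_default_preprod_sc"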

HJatMobi commented 1 year ago

Hi @nkvuong, I had no time yesterday to try it. Today I tried the following:

terraform import databricks_metastore_data_access.data_access_storage_credential "6d8fc2cc-96a7-4007-ae6b-48fe26da3d0a/mobi_default_preprod_sc"

which produced the following error message:

│ Error: The provider returned a resource missing an identifier during ImportResourceState. This is generally a bug in the resource implementation for import. Resource import code should not call d.SetId("") or create an empty ResourceData. If the resource is missing, instead return an error. Please report this to the provider developers.
│

For completeness, the error message that is shown during the TF deploy is:

Error: cannot create metastore data access: Storage Credential 'mobi_default_preprod_sc' already exists
│ 
│   with databricks_metastore_data_access.data_access_storage_credential,
│   on main.tf line 256, in resource "databricks_metastore_data_access" "data_access_storage_credential":
│  256: resource "databricks_metastore_data_access" "data_access_storage_credential" {

HJatMobi commented 1 year ago

BTW, I tried the ID with a slash ("6d8fc2cc-96a7-4007-ae6b-48fe26da3d0a/mobi_default_preprod_sc") and with a pipe.

With a pipe "6d8fc2cc-96a7-4007-ae6b-48fe26da3d0a|mobi_default_preprod_sc" i got the output:

sh: mobi_default_preprod_sc: not found

Moreover, this caused the tfstate file to be locked.
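
As a side note, a lock left behind by an aborted import can usually be released with Terraform's own unlock command, where LOCK_ID is the ID printed in the lock error (not shown in this thread):

terraform force-unlock LOCK_ID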

HJatMobi commented 1 year ago

The solution that finally worked for me was simply to change the name of the Storage Credential. Since its configuration uses the same access connector as the old one, it really doesn't matter.

I can then remove the old one with the Databricks CLI, and after that, if necessary, I could rename the new one back to the old name. That at least is a working workaround.
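
A minimal sketch of that workaround, assuming a hypothetical "_v2" suffix for the new credential name (everything else stays as in the original configuration):

resource "databricks_metastore_data_access" "data_access_storage_credential" {
  metastore_id = databricks_metastore.metastore.id
  # new, unused credential name; the "_v2" suffix is only an example
  name         = format("%s_default_%s_sc_v2", local.ms_name, var.stage)
  azure_managed_identity {
    access_connector_id = azapi_resource.access_connector[var.stage].id
  }
  is_default = true
}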

Therefore, you can close the issue.

mfergus1 commented 1 year ago

I am also seeing this issue with the storage credential that seems to get created by the databricks_metastore_data_access resource, and it's preventing automatic workspace generation.

The issue extends further, into metastore assignment:

Having switched to the account-level provider and created an entirely new metastore with my existing metastore Terraform module (again, not configured to use the workspace API), when assigning the metastore to the workspace I get a permadiff that does not accurately represent the state of the assignment:

  # databricks_metastore_assignment.main["REDACTED"] will be updated in-place
  ~ resource "databricks_metastore_assignment" "main" {
        id                   = "REDACTED"
      ~ metastore_id         = "ANOTHER_ID" -> "THE_APPROPRIATE_METASTORE_ID_THAT_IS_ALREADY_ASSIGNED"
        # (2 unchanged attributes hidden)
    }

For this change, I then get the following error, where REDACTED_WORKSPACE_ID is the workspace ID of the first workspace ever created in the account. That workspace is not using this metastore, nor am I using API tokens associated with that workspace to run this Terraform.

│ Error: cannot update metastore assignment: Can only update metastore assignment for current workspace 'REDACTED_WORKSPACE_ID'
│ 
│   with databricks_metastore_assignment.main["7952916957758936"],
│   on main.tf line 36, in resource "databricks_metastore_assignment" "main":
│   36: resource "databricks_metastore_assignment" "main" {

Even when doing a fresh apply of a whole new metastore, I first got:

│ Error: cannot create metastore data access: Storage Credential '####fa4-##-##' does not exist.

On the following run, with the same configuration, I got:

│ Error: cannot create metastore data access: Storage Credential 'eu-west-1-####' already exists
│ 
│   with databricks_metastore_data_access.main,
│   on main.tf line 27, in resource "databricks_metastore_data_access" "main":
│   27: resource "databricks_metastore_data_access" "main" {
│ 
╵

Note that the storage credential name is now a human-friendly string, not the ID as it was first output.

In the plan, this looks like it should work fine:

  + resource "databricks_metastore_data_access" "main" {
      + configuration_type = (known after apply)
      + id                 = (known after apply)
      + is_default         = true
      + metastore_id       = "REDACTED_NEW_METASTORE_ID_AS_DESIRED"
      + name               = "REDACTED_NEW_METASTORE_NAME_AS_DESIRED"

      + aws_iam_role {
          + role_arn = "arn:aws:iam::135624721198:role/eu-west-1-databricks-metastore-access"
        }
    }

When I land in the GUI, however, I do find a storage credential with the name I want, but it is in the wrong metastore, and I have no idea how that could happen.