
[ISSUE] Issue with `databricks_mount` resource when changing admin clusters #1864

Open · PeterDowdy opened this issue 1 year ago

PeterDowdy commented 1 year ago

Configuration

resource "databricks_cluster" "admin" {
  cluster_name            = "staging - Admin"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 30
  num_workers             = 0
  spark_conf = {
    # Single-node
    "spark.databricks.cluster.profile" : "singleNode"
    "spark.master" : "local[*]"
  }
  custom_tags = {
    "ResourceClass" = "SingleNode"
  }
  spark_env_vars = {
    GITLAB_REGISTRY_TOKEN = "{{secrets/gitlab-package-registry/gitlab-registry-token}}"
  }
  aws_attributes {
    instance_profile_arn = module.databricks_instance_admin_profile.id
  }

  depends_on = [
    module.databricks_instance_admin_profile
  ]
}
### S3 buckets

module "output_bucket" {
  source                  = "./modules/bio_team_bucket"
  bucket_name             = "a_bucket_name"
  deployment_stage        = var.deployment_stage
  data_encryption_key_arn = "arn:aws:kms:a_key_goes_here"
}

### DBFS mounts
resource "databricks_mount" "output_bucket_mount" {
  name            = module.output_bucket.bucket_id
  encryption_type = "sse-kms"

  cluster_id = databricks_cluster.admin.cluster_id
  s3 {
    bucket_name = module.output_bucket.bucket_id
  }
}

Expected Behavior

The mount should be re-created against the new cluster.

Actual Behavior

The mount fails to create with the error `instance profile is required to re-create mounting cluster`. When I inspect the Terraform state for the mount, it still holds the ID of the old admin cluster. Since that cluster is gone and its ID can no longer be resolved, the provider gets a 400 when it requests the cluster by ID; it then ignores the `cluster_id` specified in the Terraform configuration and raises the error above.
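To confirm the stale reference, you can inspect the mount's entry in state directly. A minimal sketch with the terraform CLI, using the resource address from the configuration above (adjust it to your own):

# List the mount resources tracked in state
terraform state list | grep databricks_mount

# Show the recorded attributes; the cluster_id of the deleted admin cluster
# still appears here even though the configuration points at the new one
terraform state show databricks_mount.output_bucket_mount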

Steps to Reproduce

  1. Create a cluster in terraform
  2. Create an S3 mount using that cluster
  3. Delete that cluster
  4. Create a new admin cluster and set the s3 mount to use that cluster

Terraform and provider versions

Terraform v1.3.6 on darwin_arm64

Debug Output

https://gist.github.com/PeterDowdy/aa7d860a64c593c1d1f3304c23e0109c

Important Factoids

The old cluster was created by a PAT, and the new cluster was created by a service user.

ouranos commented 1 year ago

Having the same issue here: the cluster in our testing environment got deleted (probably a user mistake), and `terraform plan` now fails with the same error mentioned above.

I had to delete the mounts from the state so that the cluster (and the mounts) could be re-created.
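For anyone hitting this, roughly what that looked like with the terraform CLI (the resource address below is the one from the original report; adjust it to your own):

# Drop the mount from state so Terraform stops trying to resolve the deleted cluster
terraform state rm databricks_mount.output_bucket_mount

# The next apply re-creates the mount against the current cluster
terraform apply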

PeterDowdy commented 1 year ago

You can also pull the state, change the cluster ID in the mount resource, and then push it back.
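Roughly like this (the file name is just an example, and the edit is done by hand in the pulled file):

# Download the current state
terraform state pull > state.json

# Edit state.json: in the databricks_mount resource, replace the old
# cluster_id with the ID of the new admin cluster, and increment the
# top-level "serial" so the push is accepted

# Upload the edited state
terraform state push state.json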

redcape commented 1 year ago

The same issue happens when the mounting cluster is auto-deleted after 30 days of inactivity, as described in the docs.

ouranos commented 1 year ago

> The same issue happens when the mounting cluster is auto-deleted after 30 days of inactivity, as described in the docs.

Yes, it happened again and I realised that's what actually caused the issue for us in the first place.

I fixed the state and pinned that cluster to avoid further issues.
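For reference, the pin can also live in the Terraform configuration itself; a minimal sketch, assuming the `is_pinned` argument of the `databricks_cluster` resource (pinning requires workspace admin rights) and the admin cluster from the original report:

resource "databricks_cluster" "admin" {
  # ... existing arguments from the configuration in the original report ...

  # Pin the mounting cluster so the workspace does not auto-delete it after
  # 30 days of inactivity, which is what leaves a stale cluster_id in state
  is_pinned = true
}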