databricks / terraform-provider-databricks

Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[ISSUE] Issue with `databricks_secret_acl` resource #2423

Open liahagan opened 1 year ago

liahagan commented 1 year ago

Configuration

terraform {
  backend "local" {}
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = "1.19.0"
    }
  }
}

provider "databricks" {
  host                        = "adb-0000000000000000.0.azuredatabricks.net"
  azure_workspace_resource_id = "<azure_workspace_resource_id>"
}

provider "databricks" {
  alias      = "account"
  host       = "https://accounts.azuredatabricks.net/"
  account_id = "00000000-0000-0000-0000-000000000000"
}

resource "databricks_group" "team" {
  provider = databricks.account
  for_each = toset([
    "[UC] Team A - Data Engineers",
    "[UC] Team A - Data Scientists",
    "[UC] Team A - Data Analysts",
    "[UC] Team A - Data Owners"
  ])

  display_name = each.key
}

resource "databricks_mws_permission_assignment" "team" {
  provider = databricks.account

  for_each = databricks_group.team

  workspace_id = "0000000000000000"
  principal_id = each.value.id
  permissions  = ["USER"]
}

resource "databricks_entitlements" "team" {
  for_each = databricks_group.team

  group_id                   = each.value.id
  allow_cluster_create       = false
  allow_instance_pool_create = false
  databricks_sql_access      = true
  workspace_access           = true

  depends_on = [databricks_mws_permission_assignment.team]
}

resource "databricks_secret_scope" "example" {
  name = "example_scope"
}

resource "databricks_secret_acl" "example_read" {
  for_each = databricks_group.team

  principal  = each.value.display_name
  permission = "READ"
  scope      = databricks_secret_scope.example.name

  depends_on = [databricks_entitlements.team]
}

Expected Behavior

terraform apply succeeds. The groups, workspace assignments, entitlements, secret scope, and secret scope ACLs are all created successfully.

Actual Behavior

Secret scope ACL creation fails very often, but not always, with the error below. We have not found any pattern in the failures.

╷
│ Error: cannot read secret acl: Failed to get secret acl for principal [UC] Team A - Data Scientists for scope example_scope.
│ 
│   with databricks_secret_acl.example_read["[UC] Team A - Data Scientists"],
│   on main.tf line 60, in resource "databricks_secret_acl" "example_read":
│   60: resource "databricks_secret_acl" "example_read" {
│ 
╵

Steps to Reproduce

Terraform v1.4.6 on darwin_amd64

Debug Output

This is the debug output for databricks_secret_acl resources that are created correctly:

2023-06-23T10:34:09.518+0200 [DEBUG] provider.terraform-provider-databricks_v1.19.0: POST /api/2.0/secrets/acls/put
> {
>   "permission": "READ",
>   "principal": "[UC] Team A - Data Engineers",
>   "scope": "example_scope"
> }
< HTTP/2.0 200 OK
< {}: timestamp=2023-06-23T10:34:09.518+0200
2023-06-23T10:34:09.696+0200 [DEBUG] provider.terraform-provider-databricks_v1.19.0: GET /api/2.0/secrets/acls/get?principal=[UC] Team A - Data Engineers&scope=example_scope
< HTTP/2.0 200 OK
< {
<   "permission": "READ",
<   "principal": "[UC] Team A - Data Engineers"
< }: timestamp=2023-06-23T10:34:09.696+0200

This is the debug output when creation fails (note that the POST still returns 200 OK, but the GET that follows returns 404):

2023-06-23T10:34:09.328+0200 [DEBUG] provider.terraform-provider-databricks_v1.19.0: POST /api/2.0/secrets/acls/put
> {
>   "permission": "READ",
>   "principal": "[UC] Team A - Data Owners",
>   "scope": "example_scope"
> }
< HTTP/2.0 200 OK
< {}: timestamp=2023-06-23T10:34:09.328+0200
2023-06-23T10:34:09.572+0200 [DEBUG] provider.terraform-provider-databricks_v1.19.0: GET /api/2.0/secrets/acls/get?principal=[UC] Team A - Data Owners&scope=example_scope
< HTTP/2.0 404 Not Found
< {
<   "error_code": "RESOURCE_DOES_NOT_EXIST",
<   "message": "Failed to get secret acl for principal [UC] Team A - Data Owners for scope example_scope."
< }: timestamp=2023-06-23T10:34:09.572+0200
2023-06-23T10:34:09.572+0200 [ERROR] provider.terraform-provider-databricks_v1.19.0: Response contains error diagnostic: tf_req_id=e3905af9-19d0-89ca-8e21-9fd3da82e0aa tf_resource_type=databricks_secret_acl tf_rpc=ApplyResourceChange @module=sdk.proto diagnostic_summary="cannot read secret acl: Failed to get secret acl for principal [UC] Team A - Data Owners for scope example_scope." tf_proto_version=5.3 tf_provider_addr=registry.terraform.io/databricks/databricks diagnostic_detail= diagnostic_severity=ERROR @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/vendor/github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/diag/diagnostics.go:55 timestamp=2023-06-23T10:34:09.572+0200
2023-06-23T10:34:09.593+0200 [ERROR] vertex "databricks_secret_acl.example_read[\"[UC] Team A - Data Owners\"]" error: cannot read secret acl: Failed to get secret acl for principal [UC] Team A - Data Owners for scope example_scope.
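
One way to tell whether the 404 is a short read-after-write lag or a lost write is to repeat the GET outside Terraform for a few seconds after a failed apply. The following is only a minimal sketch, not something the provider does: `poll_until_visible` and `make_acl_fetcher` are hypothetical helper names, the bearer-token header assumes a personal access token, and the only endpoint used is the `/api/2.0/secrets/acls/get` call visible in the logs above.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional, Tuple


def poll_until_visible(
    fetch: Callable[[], Optional[dict]],
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[dict], int]:
    """Call fetch() until it returns a result or the attempts run out.

    Returns (result, tries). result is None if the ACL never became visible.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for tries in range(1, attempts + 1):
        result = fetch()
        if result is not None:
            return result, tries
        if tries < attempts:
            sleep(delay_seconds)  # wait before the next read
    return None, attempts


def make_acl_fetcher(host: str, token: str, scope: str, principal: str):
    """Build a fetch() for GET /api/2.0/secrets/acls/get (hypothetical helper).

    Assumes `host` includes the scheme (https://...) and `token` is a
    personal access token sent as a bearer token.
    """
    query = urllib.parse.urlencode({"scope": scope, "principal": principal})
    url = f"{host.rstrip('/')}/api/2.0/secrets/acls/get?{query}"

    def fetch() -> Optional[dict]:
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {token}"}
        )
        try:
            with urllib.request.urlopen(request) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            if err.code == 404:  # RESOURCE_DOES_NOT_EXIST, as in the log above
                return None
            raise

    return fetch
```

Calling `poll_until_visible(make_acl_fetcher(host, token, "example_scope", "[UC] Team A - Data Owners"))` right after a failed apply would show how many reads it takes for the ACL to appear. If it appears on a later attempt, that would point to propagation lag between the PUT and the GET; if it never appears, the write itself was probably lost or undone.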

Important Factoids

We have another databricks_secret_acl resource in our code that assigns READ access on the same secret scope to a service principal. It has never failed. The only differences are the principal type (service principal vs. account-level group) and that the resource in this issue uses for_each while the other one does not.

mansenfranzen commented 1 month ago

I'm currently seeing the same failure pattern with v1.49.1. Sometimes it seems as if the groups/users backend service is not responding correctly. Strangely enough, the same group display names work with other resources such as permissions or assignments, but fail with secret ACLs. It doesn't make any sense and is fairly annoying.

alexott commented 1 month ago

@mansenfranzen can you share relevant debug logs?

mansenfranzen commented 1 month ago

Sorry, I didn't record the debug log. The error disappeared again after some time without any intervention. Next time it happens, I will capture the log and share it here.

seblatre commented 3 weeks ago

Hello! I just ran into the same issue with Terraform 1.9.5 and the Databricks provider 1.45.0.

I have the following simple configuration:

variable "project_name" {
  type = string
}

variable "keyvault_team" {
  type = object({
    id        = string
    name      = string
    vault_uri = string
  })
}

resource "databricks_group" "project" {
  display_name               = upper(var.project_name)
  allow_cluster_create       = false
  allow_instance_pool_create = false
  databricks_sql_access      = false
  workspace_access           = false
}

resource "databricks_secret_scope" "team" {
  name = "Scope_${upper(var.project_name)}_team"

  keyvault_metadata {
    resource_id = var.keyvault_team.id
    dns_name    = var.keyvault_team.vault_uri
  }
}

resource "databricks_secret_acl" "team_acl" {
  principal  = databricks_group.project.display_name
  permission = "READ"
  scope      = databricks_secret_scope.team.name
}

This happens randomly, both with a normal terraform apply and through terraform test, as here: [image]

I didn't capture logs, but I did record the HTTP calls through a proxy (not sure if it helps): [image] [image]

seblatre commented 1 week ago

Hello @alexott, I've been able to capture logs for this issue, but they show nothing more than the HTTP capture did...

2024-09-06T11:29:02.940+0200 [DEBUG] provider.terraform-provider-databricks_v1.45.0.exe: POST /api/2.0/secrets/acls/put
> {
>   "permission": "READ",
>   "principal": "A97",
>   "scope": "Scope_A97_team"
> }
< HTTP/1.1 200 OK
< {}: tf_provider_addr=registry.terraform.io/databricks/databricks tf_rpc=ApplyResourceChange @module=databricks tf_req_id=f9fb5b01-82cf-82d9-31dd-6e86029902d2 tf_resource_type=databricks_secret_acl @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/logger/logger.go:33 timestamp="2024-09-06T11:29:02.940+0200"
2024-09-06T11:29:03.013+0200 [DEBUG] provider.terraform-provider-databricks_v1.45.0.exe: PATCH /api/2.0/preview/scim/v2/Groups/523061308792933
[...]
2024-09-06T11:29:03.027+0200 [DEBUG] provider.terraform-provider-databricks_v1.45.0.exe: PATCH /api/2.0/preview/scim/v2/Groups/489949216506626
[...]
2024-09-06T11:29:03.177+0200 [DEBUG] provider.terraform-provider-databricks_v1.45.0.exe: GET /api/2.0/secrets/acls/get?principal=A97&scope=Scope_A97_team
< HTTP/1.1 404 Not Found
< {
<   "error_code": "RESOURCE_DOES_NOT_EXIST",
<   "message": "Failed to get secret acl for principal A97 for scope Scope_A97_team."
< }: tf_provider_addr=registry.terraform.io/databricks/databricks tf_req_id=f9fb5b01-82cf-82d9-31dd-6e86029902d2 tf_rpc=ApplyResourceChange @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/logger/logger.go:33 @module=databricks tf_resource_type=databricks_secret_acl timestamp="2024-09-06T11:29:03.177+0200"
2024-09-06T11:29:03.178+0200 [DEBUG] provider.terraform-provider-databricks_v1.45.0.exe: non-retriable error: Failed to get secret acl for principal A97 for scope Scope_A97_team.: @module=databricks tf_provider_addr=registry.terraform.io/databricks/databricks tf_req_id=f9fb5b01-82cf-82d9-31dd-6e86029902d2 @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/logger/logger.go:33 tf_resource_type=databricks_secret_acl tf_rpc=ApplyResourceChange timestamp="2024-09-06T11:29:03.177+0200"
2024-09-06T11:29:03.178+0200 [ERROR] provider.terraform-provider-databricks_v1.45.0.exe: Response contains error diagnostic: diagnostic_detail="" tf_resource_type=databricks_secret_acl tf_provider_addr=registry.terraform.io/databricks/databricks @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/vendor/github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/diag/diagnostics.go:58 @module=sdk.proto diagnostic_severity=ERROR diagnostic_summary="cannot read secret acl: Failed to get secret acl for principal A97 for scope Scope_A97_team." tf_proto_version=5.6 tf_req_id=f9fb5b01-82cf-82d9-31dd-6e86029902d2 tf_rpc=ApplyResourceChange timestamp="2024-09-06T11:29:03.178+0200"
2024-09-06T11:29:03.178+0200 [ERROR] vertex "databricks_secret_acl.team_acl" error: cannot read secret acl: Failed to get secret acl for principal A97 for scope Scope_A97_team.