databricks / terraform-provider-databricks

Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[ISSUE] Issue with `databricks_mws_workspaces` resource when using oauth m2m and fetching workspace token #3757

Closed by ebarault 2 months ago

ebarault commented 2 months ago

Configuration

# .databrickscfg
[accounts]
auth_type = oauth-m2m
host = https://accounts.cloud.databricks.com
account_id = *****
client_id = *****
client_secret = *****

terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
      version = "1.48.2"
    }
  }
}

provider "databricks" {
  profile = "accounts"
}

resource "databricks_mws_workspaces" "this" {
  # ... as per the doc
}

Expected Behavior

The provider should fall back to workspace-level auth to retrieve the token generated by the resource for the workspace.

Actual Behavior

data.terraform_remote_state.credentials: Reading...
data.terraform_remote_state.storage: Reading...
data.terraform_remote_state.network[0]: Reading...
data.terraform_remote_state.credentials: Read complete after 1s
data.terraform_remote_state.network[0]: Read complete after 1s
data.terraform_remote_state.storage: Read complete after 1s
databricks_mws_workspaces.this: Refreshing state... [id=969f5729-42fc-45ee-b3a9-540716ff884f/2897870447515870]
module.id.data.aws_caller_identity.current: Reading...
module.id.data.aws_caller_identity.current: Read complete after 1s [id=774385249740]

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: cannot read mws workspaces: cannot read token: failed during request visitor: inner token: Post "https://myworkspace.cloud.databricks.com/oidc/v1/token": {"error":"invalid_client","error_id":"f024cd9b-58b7-4c47-8eb4-78cf2e05e442","error_description":"Client authentication failed"}

Steps to Reproduce

Terraform and provider versions

Terraform v1.8.5 on darwin_arm64

Is it a regression?

It used to work when using username/password auth.

Debug Output

Important Factoids

The workspace was created with a previous version of the resource, using username/password auth at the account level. See the excerpt below.

```terraform
terraform {
  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.4.4"
    }
  }
}

provider "databricks" {
  alias    = "mws"
  host     = "https://accounts.cloud.databricks.com"
  username = var.databricks_account_username
  password = var.databricks_account_password
}
```

Would you like to implement a fix?

FreyGeospatial commented 2 months ago

Hey, I'm having a similar issue with Terraform. Maybe it's related.

When using this provider configuration:

terraform {
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = "~>1.6.5"
    }

    aws = {
      source  = "hashicorp/aws"
      version = "~>5.0"
    }
  }
}

provider "databricks" {
  alias         = "mws"
  host          = "https://accounts.cloud.databricks.com"
  account_id    = var.databricks_account_id
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}

I receive this error in my CI/CD pipeline:

│ Error: cannot create mws credentials: failed visitor: invalid character '<' looking for beginning of value
│ 
│   with module.gts_dataplatform_dev.databricks_mws_credentials.this,
│   on ../modules/workspace/workspace.tf line 25, in resource "databricks_mws_credentials" "this":
│   25: resource "databricks_mws_credentials" "this" {
│ 

None of the credentials I'm passing contain a < character. The same credentials work when using the workspace-level provider and passing in the workspace host.

When I switch to using username/password authentication for the account mws provider, authentication works and my resources are created.

patrickwilliamconway commented 2 months ago

hey @ebarault, thanks for sharing - I'm also working on migrating from basic-auth to oauth-m2m using Service Principals. I have encountered the same error after switching provider credentials. I believe this is due to resource ownership issues.

All my previous resources were owned by the user from the basic-auth creds; in my case, admin@actioniq.com. After creating the new Service Principal and swapping credentials, Databricks throws errors like "Error: cannot read mws workspaces: cannot read token: failed during request visitor: inner token...", which seems to indicate that the Service Principal doesn't have permission to do any CRUD on that resource, since it is still owned by admin@actioniq.com.

If I log in as admin in the UI and assign CAN_MANAGE perms to the Service Principal, these errors seem to go away. I think this can also be done via TF in a three-step process:

  1. pull in a reference to your Service Principal using:
    // https://registry.terraform.io/providers/databricks/databricks/latest/docs/data-sources/service_principal
    data "databricks_service_principal" "tf_sp" {
      application_id = data.aws_ssm_parameter.dbr_query_service_oauth_client_id.value
    }
  2. add the Service Principal as owner or management group of each resource that hits this error
  3. migrate provider credentials to use Service Principal creds
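
For a SQL warehouse (the object type in the `databricks_permissions` error shown further down the thread), step 2 might look roughly like the sketch below. It reuses the `tf_sp` data source from step 1; the resource name `databricks_sql_endpoint.this` is hypothetical, and the sketch assumes it is applied while still authenticated as a principal that can already manage the warehouse.

```terraform
# Sketch only: grant the Service Principal CAN_MANAGE on one SQL warehouse.
# `databricks_sql_endpoint.this` is a hypothetical resource name.
resource "databricks_permissions" "warehouse_manage" {
  sql_endpoint_id = databricks_sql_endpoint.this.id

  access_control {
    # service_principal_name takes the Service Principal's application ID
    service_principal_name = data.databricks_service_principal.tf_sp.application_id
    permission_level       = "CAN_MANAGE"
  }
}
```

As far as I know, `databricks_permissions` manages the whole access-control list of the object, so any existing grants that should survive would need their own `access_control` blocks in the same resource.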

This is what I've been trying to do, but there are so many layers of issues atop it: account vs workspace permissions, grants vs permissions vs acl ruleset, etc. that I haven't been able to complete it successfully.

Question for Databricks team: do you have any guides on how to migrate from basic-auth to oauth? I'm running into a lot of issues like this, and it's quite a PITA to sort it all out.

patrickwilliamconway commented 2 months ago

hey @FreyGeospatial, I've run into that type of error before (https://github.com/databricks/terraform-provider-databricks/issues/2513) and I think that usually means that you're using the incorrect level of API configuration (account vs workspace level). IIRC, the < comes from an underlying HTTP library that the go-sdk uses - it isn't actually related to your payloads at all.
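
To make the account-versus-workspace distinction concrete, here is a minimal sketch of the two provider configurations side by side. The aliases, the variable names, and the `databricks_mws_workspaces.this` reference are assumptions for illustration. The general idea is that `databricks_mws_*` resources go through the account-level provider, while resources that live inside a workspace go through a provider whose host is the workspace URL.

```terraform
# Account-level provider: accounts host plus account_id.
provider "databricks" {
  alias         = "mws"
  host          = "https://accounts.cloud.databricks.com"
  account_id    = var.databricks_account_id
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}

# Workspace-level provider: the workspace's own URL as host.
provider "databricks" {
  alias         = "workspace"
  host          = databricks_mws_workspaces.this.workspace_url
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}

# Each resource then selects the matching provider explicitly.
resource "databricks_mws_credentials" "this" {
  provider = databricks.mws
  # ... arguments elided
}
```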

patrickwilliamconway commented 2 months ago

To add more here, I don't get consistent behavior. This errors:

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.main.module.query_service_workspace_level[0].databricks_permissions.endpoint_usage: Modifying... [id=/sql/warehouses/54e59454c9664997]
╷
│ Error: cannot update permissions: failed during request visitor: inner token: Post "https://<myworkspacename>.cloud.databricks.com/oidc/v1/token": {"error":"invalid_client","error_id":"faeb24cc-6fdf-4582-9446-cc5ca979d308","error_description":"Client authentication failed"}
│
│   with module.main.module.query_service_workspace_level[0].databricks_permissions.endpoint_usage,
│   on ../../../../modules/databricks/query_service_workspace/workspace_level/main.tf line 143, in resource "databricks_permissions" "endpoint_usage":
│  143: resource "databricks_permissions" "endpoint_usage" {
│
╵

and then I immediately rerun and succeed:

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

module.main.module.query_service_workspace_level[0].databricks_permissions.endpoint_usage: Modifying... [id=/sql/warehouses/54e59454c9664997]
module.main.module.query_service_workspace_level[0].databricks_permissions.endpoint_usage: Modifications complete after 2s [id=/sql/warehouses/54e59454c9664997]

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

and I'm using this OAuth configuration:

provider "databricks" {
  host          = local.qs_url
  account_id    = local.qs_databricks_account_id
  client_id     = local.qs_databricks_oauth_client_id
  client_secret = local.qs_databricks_oauth_client_secret
}

ebarault commented 2 months ago

hmmm, I see @patrickwilliamconway, thanks for sharing this. I was actually applying the databricks_mws_workspaces module with a service-principal that has permissions only at account level, none at workspace level.

Using a user with the right permissions at the workspace level did it.

I'm going to close this issue
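
One possible way to give an account-level service principal the missing workspace-level access is the `databricks_mws_permission_assignment` resource. This is a sketch under assumptions, not something confirmed in this thread: it presumes the workspace has identity federation enabled, that an account-level provider is aliased as `mws`, and that `var.databricks_client_id` (a hypothetical variable) holds the service principal's application ID.

```terraform
# Look up the service principal at account level (alias `mws` is assumed).
data "databricks_service_principal" "tf_sp" {
  provider       = databricks.mws
  application_id = var.databricks_client_id
}

# Assign it to the workspace as admin so workspace-level calls can authenticate.
resource "databricks_mws_permission_assignment" "tf_sp_admin" {
  provider     = databricks.mws
  workspace_id = databricks_mws_workspaces.this.workspace_id
  principal_id = data.databricks_service_principal.tf_sp.id
  permissions  = ["ADMIN"]
}
```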