databricks / terraform-provider-databricks

Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[FEATURE] Manage workspace-level resources using an account-level provider #2924

Open nitoxys opened 9 months ago

nitoxys commented 9 months ago

Use-cases

All Databricks workspace configuration still relies on workspace URLs (which are randomly generated), so each workspace depends on a provider being configured manually. This feature would allow our modules to configure workspaces without the need to manually specify an aliased provider for each one.

Attempted Solutions

Terraform does not support declaring a provider configuration inside a module or sub-module that is used with for_each or count, and it does not support dynamic providers.

For example, fetching data about each workspace and passing it to a provider configuration inside a loop does not work.
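For illustration, this is roughly the shape of the configuration Terraform rejects: provider blocks cannot use for_each, and their arguments cannot vary per iteration (a sketch; local.workspaces and its attribute names are assumed):

# Not valid Terraform: provider blocks do not accept for_each,
# so a provider cannot be instantiated once per workspace.
provider "databricks" {
  for_each = local.workspaces           # rejected by terraform init/validate
  host     = each.value.workspace_url   # cannot vary per iteration
}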

Proposal

Allow the Databricks provider to be configured at the account level (per cloud, as is already done for Unity Catalog), and have it route API requests to the workspace URLs specified in the resource configuration.
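For reference, the databricks.management alias used in the example below could be an ordinary account-level provider configuration, something like this (a sketch for Azure; var.databricks_account_id is assumed and authentication arguments are omitted):

# Single account-level provider; no per-workspace aliases needed.
provider "databricks" {
  alias      = "management"
  host       = "https://accounts.azuredatabricks.net"  # Azure account endpoint
  account_id = var.databricks_account_id
}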

resource "databricks_grants" "unity_catalog_ms_location" {
  for_each  = { for unity_catalog in local.all_unityCatalogs : unity_catalog.name => unity_catalog }
  metastore = databricks_metastore.unity_catalog_ms[each.value.name].id
  grant {
    principal  = var.arm_client_id
    privileges = [
      "CREATE_CATALOG",
      "CREATE_EXTERNAL_LOCATION",
      "CREATE_CONNECTION",
      "CREATE_PROVIDER",
      "CREATE_RECIPIENT",
      "CREATE_SHARE",
      "MANAGE_ALLOWLIST",
      "SET_SHARE_PERMISSION"
    ]
  }
  workspace_id = data.azurerm_databricks_workspace.unity_catalog_dbrws[each.value.workspace].workspace_id
  provider     = databricks.management
  depends_on   = [databricks_metastore_data_access.unity_catalog_ms]
}
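Today the equivalent effect requires one statically declared, aliased provider per workspace, which cannot be generated in a loop (a sketch; the data source names are assumed):

# Current workaround: one hand-written alias per workspace.
provider "databricks" {
  alias = "ws1"
  host  = data.azurerm_databricks_workspace.ws1.workspace_url
}

provider "databricks" {
  alias = "ws2"
  host  = data.azurerm_databricks_workspace.ws2.workspace_url
}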

References

https://stackoverflow.com/questions/74706267/how-to-configure-terraform-databricks-provider-when-deploying-multiple-databrick

nitoxys commented 9 months ago

Alternatively, allow a provider_override on resources, along the lines of https://registry.terraform.io/providers/TelkomIndonesia/linux/latest/docs#provider-override.
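A hypothetical shape for such an override on a Databricks resource (provider_override is not an existing feature of this provider; the block and its arguments are purely illustrative):

resource "databricks_grants" "example" {
  # Hypothetical: per-resource connection override, evaluated per iteration,
  # loosely following the pattern linked above.
  provider_override {
    host = data.azurerm_databricks_workspace.this["ws1"].workspace_url
  }
  # ...
}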

mgyucht commented 9 months ago

@nitoxys thank you for this suggestion. We'll discuss with the team. This is somewhat tricky in the current implementation as the underlying client caches the host & authentication used, so actually doing this properly might require relatively major refactors to the architecture of the provider and the underlying client.

nitoxys commented 9 months ago

> @nitoxys thank you for this suggestion. We'll discuss with the team. This is somewhat tricky in the current implementation as the underlying client caches the host & authentication used, so actually doing this properly might require relatively major refactors to the architecture of the provider and the underlying client.

I work for a large enterprise that needs this feature so we can manage our workspaces programmatically, as code. Terraform does not allow dynamic providers; that capability has been on its roadmap for three years because of how providers are loaded at init. We deploy our workspaces as code through lists and loops, but we cannot apply the same logic to the configuration of those workspaces. It does work for Unity Catalog, but only because that goes through the accounts API. Otherwise I would have to generate the list of workspaces via the CLI and do some janky hacks to get things where they should be. The overall management of workspaces should go through accounts, regardless of the cloud platform.
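To illustrate the asymmetry: deploying the workspaces themselves loops cleanly, because the azurerm provider is a single static configuration (a sketch; var.workspaces and its fields are assumed):

# Creating N workspaces from a list works today...
resource "azurerm_databricks_workspace" "this" {
  for_each            = { for ws in var.workspaces : ws.name => ws }
  name                = each.value.name
  resource_group_name = each.value.resource_group
  location            = each.value.location
  sku                 = "premium"
}

# ...but configuring those N workspaces cannot be looped the same way,
# because each one would need its own databricks provider configuration.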

mgyucht commented 9 months ago

@nitoxys thanks for the context. Is databricks_grants the only/main resource you're facing this with, or is it an issue with other resources as well? For context, we are looking at exposing some more (but not all) UC APIs at the account level, which would allow you to use your account-level provider for databricks_grants.

zshao9 commented 8 months ago

👍 👍 👍

This is going to be a night-and-day change for our Terraform setup. Today we cannot put workspaces into a for_each loop, and we have to create a complex dependency between the workspace resource and its workspace-level provider; with this feature, everything becomes much simpler in a flat, single-provider space.

In short, this change would encourage Databricks enterprise customers to scale up their number of workspaces and their usage of Databricks in general.