Closed: aihw-jimsolomos closed this issue 1 year ago
Please also state:
MSI type: user-assigned. This is the DevOps agent service connection user that has the permissions within the subscription. The SP is also assigned "Account administrator" within the Databricks account.
Environment variables
DATABRICKS_AZ_ACCOUNT_URL = "https://accounts.azuredatabricks.net"
DATABRICKS_AZ_ACCOUNT_ID = "I can email this if required"
METASTORE_NAME = "primary"
METASTORE_STORAGE_ROOT = "abfss://unitycatalog@<can email if required>.dfs.core.windows.net/"
METASTORE_OWNER = "metastoreadmins"
WORKSPACENAME = "dev-disability-databricksResearchEnv"
WORKSPACERESOURCEGROUPNAME = "dev-disability-databricks-rg"
STORAGE_ACCESS_CONNECTOR_ID = "/subscriptions/<email me if required>/resourceGroups/dev-storage-rg/providers/Microsoft.Databricks/accessConnectors/dev-unitycatalogAccessConnector"
Type of compute is Ubuntu 20.04 -- current image version here: https://github.com/actions/runner-images/blob/ubuntu22/20230219.1/images/linux/Ubuntu2204-Readme.md
@aihw-jimsolomos , good. Thanks for the detail! I'll take a look
btw, you can hard-code "https://accounts.azuredatabricks.net" as the host for the account-level provider and use the DATABRICKS_ACCOUNT_ID environment variable, which is picked up automatically.
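As an illustration of that hint, a minimal account-level provider block along those lines (a sketch; the alias name is arbitrary and chosen here for illustration):

```hcl
# Sketch: account-level provider relying on environment variables.
# With DATABRICKS_ACCOUNT_ID (and the ARM_* credentials) exported in the
# pipeline environment, the account_id argument can be omitted entirely.
provider "databricks" {
  alias = "account" # hypothetical alias
  host  = "https://accounts.azuredatabricks.net"
}
```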
I was able to locally reproduce this issue on Windows (as my employer doesn't provide Linux machines)
When specifying the databricks provider you can pass in the user-provisioned client details like so; as far as I can tell this will run the provider using the managed identity (user-assigned):
provider "databricks" {
  host                        = data.azurerm_databricks_workspace.default.workspace_url
  azure_workspace_resource_id = data.azurerm_databricks_workspace.default.id
  # ARM_USE_MSI environment variable is recommended
  azure_use_msi       = true
  azure_client_id     = "<SUPER_SECRET_SECRET>"
  azure_client_secret = "<SUPER_SECRET_SECRET>"
  azure_tenant_id     = "<SUPER_SECRET_SECRET>"
}

provider "databricks" {
  alias      = "mws"
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.DATABRICKS_AZ_ACCOUNT_ID
  # ARM_USE_MSI environment variable is recommended
  azure_use_msi       = true
  azure_client_id     = "<SUPER_SECRET_SECRET>"
  azure_client_secret = "<SUPER_SECRET_SECRET>"
  azure_tenant_id     = "<SUPER_SECRET_SECRET>"
}
Should be able to debug without deploying a whole pipeline system. My troubleshooting skills are fairly weak as I am very new to Terraform; if I get some time next week I will try to teach myself more.
Hi @nfx I think I have found the issue with the move to Go SDK
I used Fiddler to examine the difference between 1.9.2 and 1.10+ (1.15.1 in this case).
The "Identity not found" error is coming from the "/metadata/identity/oauth2/token" service that is hosted on the virtual machine.
What appears to have happened is that for 1.9.2 the authentication provider was using the old ADAL endpoint of
https://login.microsoftonline.com/<tenant>/oauth2/token HTTP/1.1
That call was sending something like
POST https://login.microsoftonline.com/c2d40835-0130-4bcf-8be3-7ba19466d3b3/oauth2/token HTTP/1.1
Host: login.microsoftonline.com
User-Agent: Go/go1.18.10 (amd64-windows) go-autorest/adal/v1.0.0
Content-Length: 177
Content-Type: application/x-www-form-urlencoded
Cookie: fpc=<deleted>; x-ms-gateway-slice=estsfd; stsservicecookie=estsfd
Accept-Encoding: gzip
client_id=<SUPERSECRET>&client_secret=<SUPERSECRET>&grant_type=client_credentials&resource=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d <-------- Databricks resource ID
From this we can see the process has been passed the client secret
However
For the upgraded Go SDK we can see that the process is using a different set of APIs that query the local "/metadata/identity/oauth2/token" service rather than login.microsoftonline.com.
This API talks to the VM's local instance metadata service directly, rather than using credentials passed in from the process or command line.
Now this isn't actually a problem IF the VM actually has the managed identity assigned to it, but what I am doing is running my pipelines on Microsoft-hosted Azure DevOps agents. These agents may have a service connection, but they don't get the managed service identity.
It actually turns out that you are required to use a self-hosted agent on an Azure VM in order to use managed service identity. https://learn.microsoft.com/en-us/azure/devops/pipelines/library/connect-to-azure?view=azure-devops#create-an-azure-resource-manager-service-connection-to-a-vm-with-a-managed-service-identity
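To illustrate what that self-hosted setup implies in Terraform terms (a sketch with hypothetical resource names, not my actual configuration): the agent VM itself needs an identity block, because that assignment is what the local metadata endpoint uses to issue tokens.

```hcl
# Sketch (hypothetical names): a self-hosted agent VM with an identity
# assigned, which is what /metadata/identity/oauth2/token needs in order
# to hand out tokens. Without this, IMDS answers "Identity not found".
resource "azurerm_linux_virtual_machine" "agent" {
  # ...core VM arguments (name, size, image, network) omitted...

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.agent.id]
  }
}
```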
So long story short it was just luck that it worked previously because of how the old-style login was being used.
To test this theory, I will setup a managed VM. Will keep you posted.
@camilo-s you might be in the same boat?
Hi,
We're also facing the same issue with Databricks provider version >1.9.2
We run Terraform pipelines also from ADO Agents hosted on our AKS cluster (self-hosted agents). The cluster is assigned a User Assigned Identity with Subscription contributor and Databricks Account Admin role (through aad-pod-identity).
terraform {
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = "1.14.3"
    }
  }
  required_version = ">= 1.3.0"
}

provider "databricks" {
  host          = "https://accounts.azuredatabricks.net"
  account_id    = "<DATABRICKS_ACCOUNT_ID>"
  azure_use_msi = true
  # auth_type = "azure-msi"
}

data "databricks_user" "example" {
  user_name = "example_user"
}

output "test" {
  value = data.databricks_user.example.id
}
Error:
│ Error: default auth: azure-cli: cannot get access token: ERROR: Please run 'az login' to setup account.
│ . Config: host=https://accounts.azuredatabricks.net, account_id=<DATABRICKS_ACCOUNT_ID>, azure_use_msi=true
│
│   with data.databricks_user.example,
│   on user.tf line 1, in data "databricks_user" "example":
│    1: data "databricks_user" "example" {
and if I uncomment auth_type = "azure-msi", the error is:
│ Error: default auth: cannot configure default credentials. Config: host=https://accounts.azuredatabricks.net, account_id=<DATABRICKS_ACCOUNT_ID>, azure_use_msi=true
│
│   with data.databricks_user.hung,
│   on user.tf line 1, in data "databricks_user" "example":
│    1: data "databricks_user" "example" {
╵
@hungnguyen10897 These look to be two fairly different problems.
I would recommend opening a separate bug report. In my example I am confident that MSI should never have worked, whereas in your case it should work, and our error messages are fairly different.
@nfx I am going to close this, as I have realised that my Microsoft-hosted agent was always using a client secret rather than MSI:
auth_type = "azure-client-secret"
But it was good to discover why the SDK upgrade "broke" things.
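For anyone landing here later, a sketch of that resolution (assuming the ARM_* service-principal variables are set by the service connection; values here are placeholders):

```hcl
# Sketch: force service-principal auth so the provider does not attempt
# the IMDS/MSI flow on a Microsoft-hosted agent that has no identity.
provider "databricks" {
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.DATABRICKS_AZ_ACCOUNT_ID
  auth_type  = "azure-client-secret"
  # azure_client_id / azure_client_secret / azure_tenant_id are picked up
  # from ARM_CLIENT_ID / ARM_CLIENT_SECRET / ARM_TENANT_ID if not set here.
}
```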
Configuration
Expected Behavior
Prior to 1.10.0 (1.9.2, for example) it appears that you are able to reference groups at the account level. In my current example I have created a group, metastoreadmins, that I want to be the metastore owner. This var is passed to the resource, but the plan fails with the error below.
Actual Behavior
Since 1.10.0 this has failed. Note that this example was basically a slight modification of the examples from the documentation.
Steps to Reproduce
Terraform and provider versions
Azure and Terraform are on the latest versions: Terraform 1.3.9, databricks 1.11.0, azurerm 3.45.0.
Debug Output
-Ms-Routing-Request-Id: AUSTRALIAEAST:20230301T213219Z:c8fe5feb-d0d4-4d54-ab3e-c5ed02691976
{"properties":{"privateEndpointConnections":[{"id":"/subscriptions/deleted/resourceGroups/dev-disability-databricks-rg/providers/Microsoft.Databricks/workspaces/dev-disability-databricksResearchEnv/privateEndpointConnections/dev-disability-databricksResearchEnv-private-endpoint","name":"dev-disability-databricksResearchEnv-private-endpoint","type":"Microsoft.Databricks/workspaces/privateEndpointConnections","properties":{"privateEndpoint":{"id":"/subscriptions/deleted/resourceGroups/dev-disability-databricks-rg/providers/Microsoft.Network/privateEndpoints/dev-disability-databricksResearchEnv-private-endpoint"},"groupIds":["databricks_ui_api"],"privateLinkServiceConnectionState":{"status":"Approved","description":"Auto-approved","actionsRequired":"None"},"provisioningState":"Succeeded"}}],"publicNetworkAccess":"Enabled","requiredNsgRules":"NoAzureDatabricksRules","managedResourceGroupId":"/subscriptions/deleted/resourceGroups/dev-disability-databricksResearchEnv-rg","parameters":{"customPrivateSubnetName":{"type":"String","value":"DatabricksProductSubnetPrivate"},"customPublicSubnetName":{"type":"String","value":"DatabricksProductSubnetPublic"},"customVirtualNetworkId":{"type":"String","value":"/subscriptions/deleted/resourceGroups/dev-network-rg/providers/Microsoft.Network/virtualNetworks/dev-vnet"},"enableFedRampCertification":{"type":"Bool","value":false},"enableNoPublicIp":{"type":"Bool","value":true},"natGatewayName":{"type":"String","value":"nat-gateway"},"prepareEncryption":{"type":"Bool","value":false},"publicIpName":{"type":"String","value":"nat-gw-public-ip"},"requireInfrastructureEncryption":{"type":"Bool","value":false},"resourceTags":{"type":"Object","value":{"application":"databricks","databricks-environment":"true","Owner":"Data Management and Analytics ","Project":"Data Management and 
Analytics","Environment":"dev","Name":"dev"}},"storageAccountName":{"type":"String","value":"dbstoragef5n4m3fvilzcu"},"storageAccountSkuName":{"type":"String","value":"Standard_GRS"},"vnetAddressPrefix":{"type":"String","value":"10.139"}},"provisioningState":"Succeeded","authorizations":[{"principalId":"9a74af6f-d153-4348-988a-e2672920bee9","roleDefinitionId":"8e3af657-a8ff-443c-a75c-2fe8c4bcb635"}],"createdBy":{"oid":"5ac85ca7-2fde-4827-a661-f9a93ae6b516","applicationId":"a5e17c8e-f882-4e04-bd42-64c16af26df8"},"updatedBy":{"oid":"5ac85ca7-2fde-4827-a661-f9a93ae6b516","applicationId":"a5e17c8e-f882-4e04-bd42-64c16af26df8"},"workspaceId":"1353120338516096","workspaceUrl":"adb-1353120338516096.16.azuredatabricks.net","createdDateTime":"2023-01-23T07:43:34.4083755Z"},"id":"/subscriptions/deleted/resourceGroups/dev-disability-databricks-rg/providers/Microsoft.Databricks/workspaces/dev-disability-databricksResearchEnv","name":"dev-disability-databricksResearchEnv","type":"Microsoft.Databricks/workspaces","sku":{"name":"premium"},"location":"australiaeast","tags":{"Owner":"Data Management and Analytics ","Project":"Data Management and Analytics","Environment":"dev","Name":"dev"}}: timestamp=2023-03-01T21:32:19.690Z [0m[1mdata.azurerm_databricks_workspace.default: Read complete after 1s [id=/subscriptions/deleted/resourceGroups/dev-disability-databricks-rg/providers/Microsoft.Databricks/workspaces/dev-disability-databricksResearchEnv][0m 2023-03-01T21:32:19.693Z [DEBUG] created provider logger: level=debug 2023-03-01T21:32:19.693Z [INFO] provider: configuring client automatic mTLS 2023-03-01T21:32:19.705Z [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/databricks/databricks/1.11.0/linux_amd64/terraform-provider-databricks_v1.11.0 args=[.terraform/providers/registry.terraform.io/databricks/databricks/1.11.0/linux_amd64/terraform-provider-databricks_v1.11.0] 2023-03-01T21:32:19.705Z [DEBUG] provider: plugin started: 
path=.terraform/providers/registry.terraform.io/databricks/databricks/1.11.0/linux_amd64/terraform-provider-databricks_v1.11.0 pid=1784 2023-03-01T21:32:19.705Z [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/databricks/databricks/1.11.0/linux_amd64/terraform-provider-databricks_v1.11.0 2023-03-01T21:32:19.717Z [DEBUG] provider.terraform-provider-databricks_v1.11.0: Databricks Terraform Provider 2023-03-01T21:32:19.717Z [DEBUG] provider.terraform-provider-databricks_v1.11.0: 2023-03-01T21:32:19.717Z [DEBUG] provider.terraform-provider-databricks_v1.11.0: Version 1.11.0 2023-03-01T21:32:19.717Z [DEBUG] provider.terraform-provider-databricks_v1.11.0: 2023-03-01T21:32:19.717Z [DEBUG] provider.terraform-provider-databricks_v1.11.0: https://registry.terraform.io/providers/databricks/databricks/latest/docs 2023-03-01T21:32:19.717Z [DEBUG] provider.terraform-provider-databricks_v1.11.0: 2023-03-01T21:32:19.720Z [INFO] provider.terraform-provider-databricks_v1.11.0: configuring server automatic mTLS: timestamp=2023-03-01T21:32:19.718Z 2023-03-01T21:32:19.757Z [DEBUG] provider.terraform-provider-databricks_v1.11.0: plugin address: address=/tmp/plugin2621197164 network=unix timestamp=2023-03-01T21:32:19.757Z 2023-03-01T21:32:19.758Z [DEBUG] provider: using plugin: version=5 2023-03-01T21:32:19.795Z [WARN] ValidateProviderConfig from "provider[\"registry.terraform.io/databricks/databricks\"]" changed the config value, but that value is unused 2023-03-01T21:32:19.803Z [INFO] provider.terraform-provider-databricks_v1.11.0: Explicit and implicit attributes: azure_client_id, azure_client_secret, azure_tenant_id, azure_workspace_resource_id, host: timestamp=2023-03-01T21:32:19.802Z 2023-03-01T21:32:19.811Z [INFO] ReferenceTransformer: reference not found: "var.METASTORE_OWNER" 2023-03-01T21:32:19.811Z [INFO] ReferenceTransformer: reference not found: "var.METASTORE_NAME" 2023-03-01T21:32:19.811Z [INFO] ReferenceTransformer: reference not 
found: "var.METASTORE_STORAGE_ROOT" 2023-03-01T21:32:19.811Z [DEBUG] ReferenceTransformer: "module.assignmetastore.databricks_metastore.this" references: [] module.assignmetastore.databricks_metastore.this: Refreshing state... [id=473daebd-abc8-4989-9840-b959cd17a4d4] 2023-03-01T21:32:19.828Z [DEBUG] provider.terraform-provider-databricks_v1.11.0: Generating AAD token via Azure MSI: timestamp=2023-03-01T21:32:19.828Z 2023-03-01T21:32:19.845Z [ERROR] provider.terraform-provider-databricks_v1.11.0: Response contains error diagnostic: tf_rpc=ReadResource @caller=/home/runner/work/terraform-provider-databricks/terraform-provider-databricks/vendor/github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/diag/diagnostics.go:55 @module=sdk.proto diagnostic_severity=ERROR diagnostic_summary="cannot read metastore: inner token: token error: {"error":"invalid_request","error_description":"Identity not found"}" tf_resource_type=databricks_metastore diagnostic_detail= tf_proto_version=5.3 tf_provider_addr=registry.terraform.io/databricks/databricks tf_req_id=f96ff394-f9f8-4869-dff4-8e76998ea7aa timestamp=2023-03-01T21:32:19.845Z 2023-03-01T21:32:19.846Z [ERROR] vertex "module.assignmetastore.databricks_metastore.this" error: cannot read metastore: inner token: token error: {"error":"invalid_request","error_description":"Identity not found"} 2023-03-01T21:32:19.846Z [ERROR] vertex "module.assignmetastore.databricks_metastore.this (expand)" error: cannot read metastore: inner token: token error: {"error":"invalid_request","error_description":"Identity not found"} 2023-03-01T21:32:19.848Z [INFO] backend/local: plan operation completed
╷
│ Error: cannot read metastore: inner token: token error: {"error":"invalid_request","error_description":"Identity not found"}
│
│   with module.assignmetastore.databricks_metastore.this,
│   on modules/metastores/main.tf line 17, in resource "databricks_metastore" "this":
│   17: resource "databricks_metastore" "this" {
╵
Important Factoids
Running in AustraliaEast.