databricks / cli

Databricks CLI

storage-credential: creation on Azure using a service principal fails without error message #1108

Closed davidzenisu closed 2 months ago

davidzenisu commented 9 months ago

Describe the issue

Using an Azure Service Principal for authentication (as documented here) to create a storage credential fails with error code 500 and without any error message (see the debug logs further below):

databricks storage-credentials create --json @createCredential.json --profile SP
Error:

The same problem occurs when trying to roll out the storage credential using terraform.

Seems to be a similar situation with service principal credentials (but a different way of authentication): https://github.com/databricks/cli/issues/1080 https://github.com/databricks/terraform-provider-databricks/issues/3022

Steps to reproduce the behavior

  1. Configure a Databricks profile called AZURESP that authenticates with a service principal client ID & secret (as documented here)
  2. Run databricks storage-credentials create --json '<json_content>' --profile AZURESP to try to create a storage credential
  3. The CLI prints Error: without any additional message.
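
The setup in the steps above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual files: the profile keys (host, azure_tenant_id, azure_client_id, azure_client_secret) are the ones used for Azure service principal authentication, the payload fields mirror the request body visible in the debug logs of this issue, and every angle-bracket value is a hypothetical placeholder (the access connector ID is truncated in the logs and is not reconstructed here).

```python
import configparser
import io
import json

PROFILE_NAME = "AZURESP"  # profile name used in the reproduction steps


def render_profile(host, tenant_id, client_id, client_secret):
    """Render a .databrickscfg section for Azure service principal auth."""
    # interpolation=None so that a secret containing '%' is stored verbatim.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg[PROFILE_NAME] = {
        "host": host,
        "azure_tenant_id": tenant_id,
        "azure_client_id": client_id,
        "azure_client_secret": client_secret,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def build_credential_payload(access_connector_id, name="test2", comment="test"):
    """Build the JSON body; field names are taken from the debug log in this issue."""
    return {
        "azure_managed_identity": {"access_connector_id": access_connector_id},
        "comment": comment,
        "name": name,
        "read_only": False,
        "skip_validation": True,
    }


# All values below are placeholders (assumptions), to be replaced with real ones.
profile_text = render_profile(
    "<workspace-url>", "<tenant-id>", "<client-id>", "<client-secret>"
)
payload_text = json.dumps(
    build_credential_payload("<access-connector-resource-id>"), indent=2
)

print(profile_text)   # content to append to ~/.databrickscfg
print(payload_text)   # content for createCredential.json
```

The second output can be saved as createCredential.json and passed with --json @createCredential.json, as in the commands shown in this issue.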

Expected Behavior

Storage credential is created successfully.

Actual Behavior

Return code 500 and no error message.
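
To illustrate why the output is an empty "Error:" line: the response body in the debug logs carries an error_code and a request_id, but its message field is an empty string. The following is a hedged sketch of how a client could fall back to those other fields when the message is empty. The function name describe_api_error is hypothetical, and this is not how the Databricks CLI or SDK actually formats errors.

```python
import json


def describe_api_error(status_code, body_text):
    """Build a non-empty error line from an API error body (illustrative only)."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        body = {}
    if not isinstance(body, dict):
        body = {}

    message = (body.get("message") or "").strip()
    error_code = body.get("error_code") or "UNKNOWN"

    # Pick the first request_id found in the details list, if any.
    request_id = None
    for detail in body.get("details") or []:
        if isinstance(detail, dict) and detail.get("request_id"):
            request_id = detail["request_id"]
            break

    parts = [f"HTTP {status_code}", error_code]
    parts.append(message if message else "(server returned an empty message)")
    if request_id:
        parts.append(f"request_id={request_id}")
    return " | ".join(parts)


# Response body as it appears (redacted) in the debug logs of this issue.
sample_body = json.dumps({
    "details": [{
        "@type": "type.googleapis.com/google.rpc.RequestInfo",
        "request_id": "*REDACTED**",
        "serving_data": "",
    }],
    "error_code": "INTERNAL_ERROR",
    "message": "",
})

summary = describe_api_error(500, sample_body)
print(summary)
```

With a fallback like this, the request ID would at least be visible without --debug, which helps when reporting the failure to support.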

OS and CLI version

Please include the version of the CLI (eg: v0.1.2) and the operating system (eg: windows). You can run databricks --version to get the version of your Databricks CLI.

Databricks CLI v0.210.2
Linux (WSL 2, Ubuntu 22.04.2 LTS)

Is this a regression?

No.

Debug Logs

databricks storage-credentials create --json @createCredential.json --debug --profile SP
11:57:39  INFO start pid=56195 version=0.211.0 args="databricks, storage-credentials, create, --json, @createCredential.json, --debug, --profile, SP"
11:57:39 DEBUG Loading SP profile from /home/davidhoferzeni/.databrickscfg pid=56195 sdk=true
11:57:39  INFO Generating AAD token for Service Principal (**REDACTED**) pid=56195 sdk=true
11:57:39 DEBUG POST /*REDACTED**/oauth2/token
> [non-JSON document of 19 bytes]. <http.RoundTripper>
< HTTP/1.1 200 OK
< {
<   "access_token": "**REDACTED**",
<   "expires_in": "3599",
<   "expires_on": "1704801460",
<   "ext_expires_in": "3599",
<   "not_before": "1704797560",
<   "resource": "*REDACTED**",
<   "token_type": "Bearer"
< } pid=56195 sdk=true
11:57:40 DEBUG POST /*REDACTED**/oauth2/token
> [non-JSON document of 19 bytes]. <http.RoundTripper>
< HTTP/1.1 200 OK
< {
<   "access_token": "**REDACTED**",
<   "expires_in": "3599",
<   "expires_on": "1704801460",
<   "ext_expires_in": "3599",
<   "not_before": "1704797560",
<   "resource": "https://management.core.windows.net/",
<   "token_type": "Bearer"
< } pid=56195 sdk=true
11:57:41 DEBUG non-retriable error:  pid=56195 sdk=true
11:57:41 DEBUG POST /api/2.1/unity-catalog/storage-credentials
> {
>   "azure_managed_identity": {
>     "access_connector_id": "/subscriptions/*REDACTED**/resourceGroups/*REDACTED**/... (57 more bytes)"
>   },
>   "comment": "test",
>   "name": "test2",
>   "read_only": false,
>   "skip_validation": true
> }
< HTTP/2.0 500 Internal Server Error
< {
<   "details": [
<     {
<       "@type": "type.googleapis.com/google.rpc.RequestInfo",
<       "request_id": "*REDACTED**",
<       "serving_data": ""
<     }
<   ],
<   "error_code": "INTERNAL_ERROR",
<   "message": ""
< } pid=56195 sdk=true
Error: 
11:57:41 ERROR failed execution pid=56195 exit_code=1 error=

hargut commented 9 months ago

could be related to https://github.com/databricks/cli/issues/1080

hargut commented 9 months ago

@davidhoferzeni @andrewnester

Looks like the error message has changed and now hints at what the problem might be:

< HTTP/2.0 403 Forbidden
< {
<   "details": [
<     {
<       "@type": "type.googleapis.com/google.rpc.RequestInfo",
<       "request_id": "<request-id>",
<       "serving_data": ""
<     }
<   ],
<   "error_code": "PERMISSION_DENIED",
<   "message": "AAD Token exchange using Azure Managed Identity Credential with Access Credential Id <some-id>... (33 more bytes)"

The information in the message is not perfect, but together with PERMISSION_DENIED it points in the right direction. Adding CREATE_CONNECTION with databricks grants update metastore fixed the issue for me.

The service principal is now able to create the connection successfully.
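
For readers who want to try the same fix, here is a rough sketch of how the grant could be assembled. It assumes the Unity Catalog permissions update body has the shape {"changes": [{"principal": ..., "add": [...]}]} and that the command is invoked as databricks grants update metastore <metastore-id> --json ...; the metastore ID and principal values are hypothetical placeholders, and the script only prints the command rather than running it.

```python
import json
import shlex


def build_grant_command(metastore_id, principal, privileges, profile=None):
    """Assemble (but do not run) a 'databricks grants update metastore' command."""
    payload = {
        "changes": [
            {"principal": principal, "add": list(privileges)},
        ]
    }
    cmd = [
        "databricks", "grants", "update", "metastore", metastore_id,
        "--json", json.dumps(payload),
    ]
    if profile:
        # Optional: a profile that is allowed to change metastore grants.
        cmd += ["--profile", profile]
    return cmd


# Placeholder values (assumptions); CREATE_CONNECTION is the privilege named above.
cmd = build_grant_command(
    "<metastore-id>",
    "<service-principal-application-id>",
    ["CREATE_CONNECTION"],
)
print(shlex.join(cmd))
```

The printed line is shell-quoted, so it can be reviewed and then pasted into a terminal. The grant has to be applied by an identity that is permitted to manage the metastore, not by the service principal that is being granted the privilege.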

davidzenisu commented 8 months ago

@hargut

Thanks a lot for the update, I will retry today and report my findings!

davidzenisu commented 8 months ago

@hargut

I've retried with the most current databricks CLI version (v0.212.1). Unfortunately, I still get the same result (as in my original debug logs):

"error_code": "INTERNAL_ERROR",
"message": ""

Also, if I configure the service principal as an "Account admin", creating the storage credential works without any problems, so I'm assuming it is related to a configuration issue on the workspace and/or catalog level (and not to something Azure-specific).

In any case, thanks a lot for the support. I'll try to bring it up directly with the developer in an upcoming session!

hargut commented 8 months ago

@davidhoferzeni Account Admin will likely not work; this is a Unity Catalog / Metastore permission. It only started working for me after adding the permission described above. I'm not sure whether Account Admin covers the metastore admin role for all metastores. If the metastore is created manually, the default metastore admin is only the user who created it.
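
One way to check which principals actually hold a given metastore privilege is to inspect the current grants. The sketch below assumes the grants listing (for example from databricks grants get metastore <metastore-id>) is JSON with a privilege_assignments list of {"principal", "privileges"} entries; the helper name and the sample data are hypothetical and do not come from this issue.

```python
import json


def principals_with_privilege(permissions, privilege):
    """Return the sorted principals that hold the given privilege."""
    assignments = permissions.get("privilege_assignments") or []
    return sorted(
        a["principal"]
        for a in assignments
        if privilege in (a.get("privileges") or [])
    )


# Hypothetical sample of a grants listing; replace with real command output.
sample_output = json.dumps({
    "privilege_assignments": [
        {"principal": "<service-principal-application-id>",
         "privileges": ["CREATE_CONNECTION"]},
        {"principal": "<some-user>",
         "privileges": ["CREATE_CATALOG"]},
    ]
})

permissions = json.loads(sample_output)
holders = principals_with_privilege(permissions, "CREATE_CONNECTION")
print(holders)
```

If the service principal is missing from the result, that supports the theory that the failure is a metastore-level permission issue rather than an account-level role problem.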

andrewnester commented 2 months ago

@davidzenisu does the issue still persist for you in the latest CLI?

andrewnester commented 2 months ago

Closing due to no response; feel free to reopen if the issue persists.

davidzenisu commented 1 month ago

Since I don't have access to my original testing setup anymore, I'm currently unable to verify if the issue persists. Thanks for updating the issue, I will create a separate case if I gain additional insights.