databricks / cli

Databricks CLI

Inconsistent results from root vs. nested 'fs ls' output on Azure Datalake Gen2 External Location #1451

Closed: afedinhsol closed this issue 4 months ago

afedinhsol commented 6 months ago

Describe the issue

There appears to be inconsistent behavior, or an undocumented difference, in how the Databricks CLI 'fs ls' command operates against an External Location configured with an Azure Datalake Gen2 Storage Account.

This was tested with both azure-cli authentication and Databricks access token authentication. The Azure Databricks Cluster allows Public access and has 'npip' set to false.

The external location is mapped to an Azure Datalake Gen2 Storage Account through a Databricks Access Connector that has 'Storage Blob Data Contributor' permissions on the Storage Account. The firewall on the Azure Storage Account is set to 'Enabled from selected virtual networks and IP addresses' and has a whitelist for the 10 subnets listed in the same-region network connectivity configuration.

Steps to reproduce the behavior


  1. Run databricks -p AUGUSTUSDV1 fs ls dbfs:/mnt --debug (access token authentication)
     1a. Succeeds
  2. Run databricks -p AUGUSTUSDV1 fs ls dbfs:/mnt/DELTA --debug (access token authentication)
     2a. Fails with an "Invalid client secret provided" error
  3. Run databricks -p AUGUSTUSDV fs ls dbfs:/mnt --debug (Azure CLI authentication)
     3a. Succeeds
  4. Run databricks -p AUGUSTUSDV fs ls dbfs:/mnt/DELTA --debug (Azure CLI authentication)
     4a. Fails with an "Invalid client secret provided" error

Expected Behavior

Expect all four commands above to be successful.

Actual Behavior

The 'fs ls' commands against the root (dbfs:/mnt) succeed, while the 'fs ls' commands against the nested directory (dbfs:/mnt/DELTA) fail.

OS and CLI version

Databricks CLI v0.219.0, macOS (Sonoma 14.4.1)

Is this a regression?

Unknown (Only version attempted)

Debug Logs

.databrickscfg setup

[AUGUSTUSDV]
host      = {ADB URL}
auth_type = azure-cli

[AUGUSTUSDV1]
host      = {ADB URL}
token     = {Personal Databricks Token}
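To make it easier to see which profile uses which authentication method, here is a minimal Python sketch that parses the config above with the standard-library configparser. The function name summarize_profiles and the "token" / "unspecified" labels are hypothetical and only for illustration; this is not how the Databricks CLI itself resolves authentication.

```python
import configparser

# The profile definitions from this issue, with placeholders left as-is.
SAMPLE_CFG = """
[AUGUSTUSDV]
host      = {ADB URL}
auth_type = azure-cli

[AUGUSTUSDV1]
host      = {ADB URL}
token     = {Personal Databricks Token}
"""


def summarize_profiles(cfg_text):
    """Map each profile name to a label describing its configured auth method.

    Hypothetical helper: the labels "token" and "unspecified" are
    illustrative, not values used by the Databricks CLI.
    """
    # Disable interpolation so placeholder values are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    summary = {}
    for name in parser.sections():
        section = parser[name]
        if "auth_type" in section:
            summary[name] = section["auth_type"]
        elif "token" in section:
            summary[name] = "token"
        else:
            summary[name] = "unspecified"
    return summary


if __name__ == "__main__":
    for profile, auth in summarize_profiles(SAMPLE_CFG).items():
        print(f"{profile}: {auth}")
```

Under these assumptions, AUGUSTUSDV is reported as azure-cli and AUGUSTUSDV1 as token, matching the two authentication methods used in the reproduction steps.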

Succeeding

databricks -p AUGUSTUSDV1 fs ls dbfs:/mnt --debug      
12:57:23  INFO start pid=42671 version=0.219.0 args="databricks, -p, AUGUSTUSDV1, fs, ls, dbfs:/mnt, --debug"
12:57:23 DEBUG Loading AUGUSTUSDV1 profile from /Users/user/.databrickscfg pid=42671 sdk=true
12:57:25 DEBUG GET /api/2.0/dbfs/list?path=/mnt
< HTTP/2.0 200 OK
...
<   ]
< } pid=42671 sdk=true

Failing

databricks -p AUGUSTUSDV1 fs ls dbfs:/mnt/DELTA --debug
12:53:03  INFO start pid=42542 version=0.219.0 args="databricks, -p, AUGUSTUSDV1, fs, ls, dbfs:/mnt/DELTA, --debug"
12:53:03 DEBUG Loading AUGUSTUSDV1 profile from /Users/user/.databrickscfg pid=42542 sdk=true
12:53:04 DEBUG non-retriable error: HTTP Error 401; url='https://login.microsoftonline.com/{Azure Tenant ID}/oauth2/token' AADToken: HTTP connection to https://login.microsoftonline.com/{Azure Tenant ID}/oauth2/token failed for getting token from AzureAD.; requestId='X'; contentType='application/json; charset=utf-8'; response '{"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app '{Access Connector App ID}'. Trace ID: X Correlation ID: 0X Timestamp: 2024-05-24 16:53:04Z","error_codes":[7000215],"timestamp":"2024-05-24 16:53:04Z","trace_id":"X","correlation_id":"X","error_uri":"https://login.microsoftonline.com/error?code=7000215"}' pid=42542 sdk=true
12:53:04 DEBUG GET /api/2.0/dbfs/list?path=/mnt/DELTA
< HTTP/2.0 400 Bad Request
< {
<   "error_code": "IO_ERROR",
<   "message": "HTTP Error 401; url='https://login.microsoftonline.com/{Azure Tenant ID}/oaut... (894 more bytes)"
< } pid=42542 sdk=true
Error: HTTP Error 401; url='https://login.microsoftonline.com/{Azure Tenant ID}/oauth2/token' AADToken: HTTP connection to https://login.microsoftonline.com/{Azure Tenant ID}/oauth2/token failed for getting token from AzureAD.; requestId='X'; contentType='application/json; charset=utf-8'; response '{"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app '{Access Connector App ID}'. Trace ID: X Correlation ID: X Timestamp: 2024-05-24 16:53:04Z","error_codes":[7000215],"timestamp":"2024-05-24 16:53:04Z","trace_id":"X","correlation_id":"X","error_uri":"https://login.microsoftonline.com/error?code=7000215"}'
12:53:04 ERROR failed execution pid=42542 exit_code=1 error="HTTP Error 401; url='https://login.microsoftonline.com/{Azure Tenant ID}/oauth2/token' AADToken: HTTP connection to https://login.microsoftonline.com/{Azure Tenant ID}/oauth2/token failed for getting token from AzureAD.; requestId='X'; contentType='application/json; charset=utf-8'; response '{\"error\":\"invalid_client\",\"error_description\":\"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app '{Access Connector App ID}'. Trace ID: X Correlation ID: X Timestamp: 2024-05-24 16:53:04Z\",\"error_codes\":[7000215],\"timestamp\":\"2024-05-24 16:53:04Z\",\"trace_id\":\"X\",\"correlation_id\":\"X\",\"error_uri\":\"https://login.microsoftonline.com/error?code=7000215\"}'"

pietern commented 6 months ago

Thanks for reporting. I believe this is working as intended. DBFS mounts are not accessible through the REST API.

The path /mnt is not a mount itself and contains the mount points (if I'm not mistaken).

The path /mnt/DELTA is probably a mount point.

The returned error is unfortunate, though.
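For reference, the debug logs above show both commands going to the same REST endpoint (/api/2.0/dbfs/list), differing only in the path query parameter. The following Python sketch builds that URL from a dbfs: path. The helper name dbfs_list_url and the example host are hypothetical, and the path mapping is inferred only from the two log lines in this issue, not from the CLI source.

```python
from urllib.parse import urlencode


def dbfs_list_url(host, cli_path):
    """Build the DBFS list URL that the debug logs show for an 'fs ls' call.

    Hypothetical helper: the mapping from 'dbfs:/mnt' to 'path=/mnt' is
    inferred from the logs in this issue.
    """
    prefix = "dbfs:"
    path = cli_path[len(prefix):] if cli_path.startswith(prefix) else cli_path
    # Keep "/" unescaped so the query string matches the logged form.
    query = urlencode({"path": path}, safe="/")
    return f"{host.rstrip('/')}/api/2.0/dbfs/list?{query}"


if __name__ == "__main__":
    # Placeholder host; substitute the real workspace URL.
    host = "https://example.invalid"
    print(dbfs_list_url(host, "dbfs:/mnt"))
    print(dbfs_list_url(host, "dbfs:/mnt/DELTA"))
```

Since the client-side request has the same shape in both cases, the 400 response with IO_ERROR in the failing log appears to originate on the service side when it tries to reach the storage behind /mnt/DELTA.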

afedinhsol commented 6 months ago

Thanks @pietern. We determined internally that this was an authorization error, not a connectivity error. 'fs ls' does work on DBFS mount points when the External Location authentication is set up correctly.

On the error message: we think the cause is an expired secret on the Azure Service Principal, which is different from the error given. The error text implies a formatting or REST API issue.

If you want to use this ticket to track a small enhancement to the error messaging around the REST API, feel free. Otherwise, I think my question is answered and this issue can be closed.

Thanks again.
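As an illustration of the error-messaging point, here is a minimal Python sketch that pulls the embedded Azure AD JSON payload out of an error string shaped like the one in the debug logs and surfaces the AADSTS code. The function name parse_aad_error is hypothetical, the sample string is an abbreviated copy of the logged error, and this is not something the Databricks CLI does today.

```python
import json
import re

# Abbreviated copy of the logged error, placeholders left as-is.
SAMPLE = (
    "HTTP Error 401; url='https://login.microsoftonline.com/{Azure Tenant ID}/oauth2/token'; "
    "requestId='X'; response '"
    '{"error":"invalid_client",'
    '"error_description":"AADSTS7000215: Invalid client secret provided. '
    "Ensure the secret being sent in the request is the client secret value, "
    "not the client secret ID, for a secret added to app '{Access Connector App ID}'.\","
    '"error_codes":[7000215]}'
    "'"
)


def parse_aad_error(message):
    """Extract the Azure AD error payload from a wrapped error message.

    Hypothetical helper. Returns a dict, or None when no parsable
    payload is found.
    """
    marker = "response '"
    start = message.find(marker)
    # The payload itself contains single quotes, so search from the end.
    end = message.rfind("'")
    if start == -1 or end <= start:
        return None
    try:
        body = json.loads(message[start + len(marker):end])
    except json.JSONDecodeError:
        return None
    match = re.search(r"AADSTS(\d+)", body.get("error_description", ""))
    return {
        "error": body.get("error"),
        "aadsts_code": int(match.group(1)) if match else None,
        "error_codes": body.get("error_codes", []),
    }


if __name__ == "__main__":
    print(parse_aad_error(SAMPLE))
```

A wrapper along these lines could report the short error name and AADSTS code up front, leaving the full payload for debug output.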