databricks / cli


Databricks CLI authenticates with azure-cli, but bundle deployment does not #1722

Closed Pim-Mostert closed 2 months ago

Pim-Mostert commented 2 months ago

Describe the issue

I want to deploy a Databricks Asset Bundle from an Azure DevOps pipeline using the Databricks CLI. While authentication works fine for regular CLI commands (such as databricks experiments list-experiments), it fails for bundle deployment (databricks bundle deploy).

In the pipeline I'm using the AzureCLI task, which enables the Databricks CLI to use azure-cli type authentication.

As mentioned in https://github.com/databricks/databricks-sdk-go/issues/1025#issuecomment-2312280494, the issue appears to be:

The issue that CLI authenticates with azure-cli type but bundles failed to do so is separate one and might be related to some miss on bundles side where we don't pass all necessary env variables. If this is an issue for you, please feel free to open a separate ticket for this in Databricks CLI repo.
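
If the root cause is indeed environment variables that bundles fail to pass along, one way to see what a pipeline step actually receives is to print the names of the auth-related variables before invoking the CLI. A diagnostic sketch (not part of the original report; it prints variable names only, to avoid leaking secret values into CI logs):

# List auth-related variable names visible to this step (names only, no values)
env | grep -oE '^(ARM_|AZURE_|DATABRICKS_)[A-Za-z_]*' | sort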

Configuration

# azure-pipelines.yml
variables:
  databricksHost: "https://adb-XXX.azuredatabricks.net"

pool:
  vmImage: "ubuntu-latest"

jobs:
  - job: databricks_asset_bundle
    displayName: "Deploy Databricks Asset Bundle"
    steps:
      - bash: |
          # Install Databricks CLI - see https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

          # Verify installation
          databricks --version

          # Create databricks config file
          # Note: use $HOME rather than a quoted "~", which the shell does not expand
          file="$HOME/.databrickscfg"

          if [ -f "$file" ]; then
              rm "$file"
          fi

          echo "[DEFAULT]" >> "$file"
          echo "host = $databricksHost" >> "$file"
        displayName: Setup Databricks CLI
      - task: AzureCLI@2
        displayName: Deploy Asset Bundle
        inputs:
          azureSubscription: "my-workload-identity-federation-service-connection"
          addSpnToEnvironment: true
          scriptType: "bash"
          scriptLocation: "inlineScript"
          inlineScript: |
            # As described in https://devblogs.microsoft.com/devops/public-preview-of-workload-identity-federation-for-azure-pipelines/
            export ARM_CLIENT_ID=$servicePrincipalId
            export ARM_OIDC_TOKEN=$idToken
            export ARM_TENANT_ID=$tenantId
            export ARM_SUBSCRIPTION_ID=$(az account show --query id -o tsv)
            export ARM_USE_OIDC=true

            # Databricks authentication itself works fine
            echo ------------- List experiments -------------
            databricks experiments list-experiments

            # But bundle deployment does not
            echo ------------- Deploy bundle -------------
            databricks bundle deploy --log-level=debug --target dev
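
Before the deploy step, it can also help to confirm which identity the Databricks CLI actually resolves. A sketch; databricks current-user me calls an authenticated endpoint using the same unified auth chain as the other CLI commands:

# Sketch: show the identity the CLI authenticates as (should be the service principal)
databricks current-user me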
# databricks.yml
bundle:
  name: my_project

variables:
  service_principal:
    description: Service principal used by the DevOps agent
    default: my-service-principal-id

run_as:
  service_principal_name: ${var.service_principal}

# Example resources to deploy
resources:
  experiments:
    my_experiment:
      name: "/Workspace/Users/${var.service_principle}/my_experiment"

targets:
  dev:
    mode: production
    default: true
    workspace:
      host: https://adb-XXX.azuredatabricks.net
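
As an aside, the service_principal variable above does not have to rely on its default: bundle variables can also be set at deploy time, via the --var flag or a BUNDLE_VAR_<name> environment variable. A sketch with a placeholder ID:

# Sketch: override the bundle variable at deploy time (placeholder value)
databricks bundle deploy --target dev --var="service_principal=00000000-0000-0000-0000-000000000000"

# Equivalent via environment variable
export BUNDLE_VAR_service_principal="00000000-0000-0000-0000-000000000000"
databricks bundle deploy --target dev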

Steps to reproduce the behavior

  1. Create a DevOps service connection with Workload Identity Federation
  2. Create an Azure Pipeline with the above azure-pipelines.yml (replace placeholders), using the service connection from step 1
  3. Create a Databricks Asset Bundle with the above databricks.yml (replace placeholders)
  4. Trigger pipeline
  5. Observe error

Expected Behavior

The deployment of the asset bundle should succeed.

Actual Behavior

------------- Deploy bundle -------------
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/index.json
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/terraform_1.5.5_SHA256SUMS.72D7468F.sig
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/terraform_1.5.5_SHA256SUMS
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/terraform_1.5.5_linux_amd64.zip
Uploading bundle files to /Users/***/.bundle/my_project/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
Error: terraform apply: exit status 1

Error: cannot create mlflow experiment: failed during request visitor: default auth: azure-cli: cannot get access token: ERROR: Please run 'az login' to setup account.
. Config: host=https://adb-XXX.azuredatabricks.net, azure_client_id=***, azure_tenant_id=XXX. Env: DATABRICKS_HOST, ARM_CLIENT_ID, ARM_TENANT_ID

  with databricks_mlflow_experiment.main,
  on bundle.tf.json line 17, in resource.databricks_mlflow_experiment.main:
  17:       }

Note that the listing of experiments works fine:

------------- List experiments -------------
[
   (expected list of experiments, redacted)
  {
      ...
  },
  ...
]

OS and CLI version

Output from the Azure pipeline:

azure-cli                         2.63.0

core                              2.63.0
telemetry                          1.1.0

Extensions:
azure-devops                       1.0.1

Dependencies:
msal                              1.30.0
azure-mgmt-resource               23.1.1

Databricks CLI: v0.227.0

OS: Ubuntu (Microsoft-hosted agent, latest version)

Is this a regression?

I don't know; I'm new to Databricks.

Debug Logs

Output of databricks experiments list-experiments --log-level TRACE: experiment-list.txt
Output of databricks bundle deploy --log-level=debug --target dev: bundle-deploy.txt

pietern commented 2 months ago

Chiming in as I ran into the same thing a few weeks ago.

The culprit is the Azure CLI configuration directory. We currently don't forward the AZURE_CONFIG_DIR environment variable, which the AzureCLI@2 task sets (perhaps for isolation, but I don't know for sure). To work around this, you can set useGlobalConfig: true; the task then uses the default configuration file location, where the Azure CLI will always find it:

- task: AzureCLI@2
  inputs:
    # ...
    useGlobalConfig: true
    # ...
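
If useGlobalConfig is not an option, an alternative that follows from the same diagnosis (an untested sketch, not something verified in this thread) is to copy the task's isolated Azure CLI configuration into the default location, so child processes that don't inherit AZURE_CONFIG_DIR still find the login:

# Untested sketch: AzureCLI@2 points AZURE_CONFIG_DIR at a per-task temp directory;
# copying its contents to the default ~/.azure makes the login visible to
# subprocesses (such as Terraform) that don't receive AZURE_CONFIG_DIR.
mkdir -p "$HOME/.azure"
cp -R "$AZURE_CONFIG_DIR/." "$HOME/.azure/"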
Pim-Mostert commented 2 months ago

@pietern That indeed works for me too, thanks!

For reference, here is my full working configuration:

variables:
  databricksHost: "https://adb-XXX.azuredatabricks.net"

pool:
  vmImage: "ubuntu-latest"

jobs:
  - job: databricks_asset_bundle
    displayName: "Deploy Databricks Asset Bundle"
    steps:
      - bash: |
          # Install Databricks CLI - see https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

          # Verify installation
          databricks --version

          # Create databricks config file
          # Note: use $HOME rather than a quoted "~", which the shell does not expand
          file="$HOME/.databrickscfg"

          if [ -f "$file" ]; then
              rm "$file"
          fi

          echo "[DEFAULT]" >> "$file"
          echo "host = $databricksHost" >> "$file"
        displayName: Setup Databricks CLI
      - task: AzureCLI@2
        displayName: Deploy Asset Bundle
        inputs:
          azureSubscription: "my-wif-serviceconnection"
          useGlobalConfig: true
          scriptType: "bash"
          scriptLocation: "inlineScript"
          inlineScript: |
            databricks bundle deploy --target dev
pabtorres commented 1 month ago

(Quotes @Pim-Mostert's full working configuration above.)

Hello @Pim-Mostert, in your:

echo "host = $databricksHost" >> ~/.databrickscfg

Did you add the host, client_id and client_secret of the Service Principal?

Pim-Mostert commented 1 month ago

@pabtorres I only added the host. The necessary credentials are injected under the hood by the AzureCLI task.
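
For comparison, if one did want to authenticate directly as a service principal without the AzureCLI task, the config profile would need the Azure fields set explicitly. A minimal sketch with placeholder values, using the Azure service principal fields from the Databricks unified auth configuration:

# Sketch: direct Azure service principal auth via ~/.databrickscfg (placeholders)
cat > "$HOME/.databrickscfg" <<'EOF'
[DEFAULT]
host                = https://adb-XXX.azuredatabricks.net
azure_client_id     = <application-client-id>
azure_tenant_id     = <tenant-id>
azure_client_secret = <client-secret>
EOF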