databricks / setup-cli

Sets up the Databricks CLI in your GitHub Actions workflow.

Failing `databricks bundle destroy` without existing resources #105

Open JasperGrs opened 3 months ago

JasperGrs commented 3 months ago

A GitHub Actions job executing the databricks bundle destroy command fails when no Databricks resources exist in the workspace, because the Terraform state file is missing. This makes it impossible to run the command pre-emptively to guarantee a clean workspace (with respect to asset bundle resources) before deploying new resources.

Note: the version used is @main, which referenced tag v0.219.0 when I came upon the issue. While writing this, the newer tag v0.220.0 was released; some testing shows that the issue persists in that version.
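For reproducibility, the setup action can also be pinned to a specific release tag instead of the moving main branch. A minimal sketch, assuming v0.220.0 (mentioned above) exists as a published tag of databricks/setup-cli:

      # Pin the setup action to a tagged release rather than @main,
      # so the CLI version used by the workflow does not change silently.
      - uses: databricks/setup-cli@v0.220.0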

actions.yml file

(databricks bundle validate is included to show that the bundle configuration is correct)

name: "Databricks asset bundle test"
concurrency: 1
on:
  push:
    branches:
      - CICD-POC

jobs:
  validate:
    name: "Validate asset bundle"
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - uses: databricks/setup-cli@main
      - run: 
          databricks bundle validate --debug
        working-directory: ./
        env:
          DATABRICKS_TOKEN: ${{ secrets.***}}
          BUNDLE_VAR_SQL_WAREHOUSE_ID: ${{ secrets.***}}
          BUNDLE_VAR_COMPUTE_ID: ${{ secrets.***}}
          DATABRICKS_HOST: ${{ secrets.***}}

  delete:
    name: "destroy existing asset bundles items"
    runs-on: ubuntu-latest
    needs: 
      - validate
    steps:

      - uses: actions/checkout@v3
      - uses: databricks/setup-cli@main
      - run: 
          databricks bundle destroy --auto-approve
        working-directory: ./
        env:
          DATABRICKS_TOKEN: ${{ secrets.***}}
          BUNDLE_VAR_SQL_WAREHOUSE_ID: ${{ secrets.***}}
          BUNDLE_VAR_COMPUTE_ID: ${{ secrets.***}}
          DATABRICKS_HOST: ${{ secrets.***}}

Result with existing resources (by locally running databricks bundle deploy before executing the GitHub Actions workflow)

[image]

Result without existing resources

[image]

pietern commented 3 months ago

Thanks for reporting. This is a case we don't yet cover in our integration tests.

I'm curious though, when would you attempt to perform a destroy when there isn't anything there?

JasperGrs commented 3 months ago

> Thanks for reporting. This is a case we don't yet cover in our integration tests.
>
> I'm curious though, when would you attempt to perform a destroy when there isn't anything there?

Precaution mainly, and to streamline automation flows within a large project. We would not want an old, forgotten resource to be executed by accident and ruin our data transformation pipelines.

pietern commented 3 months ago

How would this be the case though?

Every bundle deployment tracks its resources, so when you repeatedly deploy the same bundle, those resources will be updated. If you remove a resource from the bundle definition, then that resource will be removed on the next deployment. If you add a resource to the bundle definition, then that resource will be included in the next deployment.

Resources can only be forgotten if you change 1) the bundle name, 2) the bundle target, or 3) the workspace. Even when you attempt an eager destroy prior to deploying, it won't have the desired effect if you change any of these.
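To make those three identifiers concrete, here is a minimal, hypothetical databricks.yml; every value below is a placeholder, but changing the bundle name, the target key, or the workspace host is what would leave previously deployed resources behind as described above.

# Hypothetical minimal bundle configuration (placeholder values).
bundle:
  name: my_bundle                # (1) the bundle name

targets:
  dev:                           # (2) the bundle target
    mode: development
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net   # (3) the workspace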

JasperGrs commented 3 months ago

Admittedly, I slightly misunderstood how the deploy and destroy commands work. I thought they behaved more like create and delete respectively, where deploy would overwrite existing resources with the same name but keep resources that are no longer defined in the bundle. Under that assumption, running a destroy command first would be required to make sure all resources are removed.

If you have a link to documentation about the inner workings of the deploy and destroy commands I would be grateful, as the (Azure) Databricks asset bundle documentation did not seem to mention this.

However, I can still imagine a simple use case: an automated cleanup job that destroys our development environment's bundle resources when merging the dev branch into main would fail if a developer had already triggered the destroy command locally.

Whatever the use case, I think it would be a logical addition for the destroy command to succeed when no resources are found for the bundle.

pietern commented 3 months ago

Thank you for this feedback.

I'm looking through the Databricks Asset Bundles development workflow documentation, and while it does hint at this, it could be clearer.

To clarify: the resource name is not used to correlate a bundle deployment with a resource instance. A deployment tracks the resources it created by their IDs, as state stored in the workspace file system. This means that existing resources can be updated, and previously deployed resources that have been removed from the configuration can be deleted.
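As an illustration, that stored state can be inspected with the workspace commands; the path below is an assumption based on the default workspace.root_path of ~/.bundle/&lt;bundle name&gt;/&lt;target&gt;, with placeholder user, bundle, and target names.

# List what a deployment has stored in the workspace (placeholder names).
# Assumes the default root path /Users/<user>/.bundle/<bundle name>/<target>;
# the deployment state lives in files underneath it.
databricks workspace list /Users/someone@example.com/.bundle/my_bundle/dev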

I agree that, regardless of how all of this works, running a destroy on a bundle that wasn't previously deployed should not return an error, or there should at least be a flag to suppress the error when there is nothing to destroy.
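Until the CLI handles the empty case (or gains such a flag), a possible stopgap in the workflow above is to let the destroy step tolerate failure. A sketch using the standard continue-on-error step setting; note that it also masks genuine destroy errors, so it is only a temporary measure:

      # Stopgap: do not fail the job when destroy errors out (e.g. missing state).
      # Caveat: this also hides genuine destroy failures.
      - run: databricks bundle destroy --auto-approve
        working-directory: ./
        continue-on-error: true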