databrickslabs / cicd-templates

Manage your Databricks deployments and CI with code.

File not found when running integration tests on GitHub Actions #41

Closed kiranvasudev closed 3 years ago

kiranvasudev commented 3 years ago

By default, in the onpush workflow, integration tests are first deployed as a job on Databricks and then launched as a job. https://github.com/databrickslabs/cicd-templates/blob/5cf90b3e861dc851fdd8e2595d9dbb92eb3bd986/%7B%7Bcookiecutter.project_slug%7D%7D/.github/workflows/onpush.yml#L50-L56

What I would like to do here is to directly execute the integration test on a specific cluster without deploying the test as a job on Databricks and executing the job.

To achieve this, I have removed:

      - name: Deploy integration test
        run: |
          dbx deploy --jobs=<job_name>

      - name: Run integration test
        run: |
          dbx launch --job=<job_name> --trace

and added into the workflow file (onpush.yml):

      - name: Run integration test
        run: |
          dbx execute --cluster-id=<id> --job=<job_name> --requirements-file=unit-requirements.txt

When the GitHub Actions workflow runs, it fails at the Run integration test step with the error:

FileNotFoundError: [Errno 2] No such file or directory: '.dbx/lock.json'

I understand that this file contains the execution context and hence is in the .gitignore.

Given that, is there a way to run an integration test without deploying it as a job on Databricks?

renardeinside commented 3 years ago

Hi @kiranvasudev ! There are two options to fix this issue.

dbx launch on interactive cluster

The first option is to still use dbx launch, but instead of describing new cluster props in conf/deployment.json, provide the existing cluster id in the test job description:

"existing_cluster_id": "interactive-cluster-id"

This will still create or update the test job, but it will execute it on the given interactive cluster.

dbx execute with some fixes on CICD pipeline

The problem is that dbx expects .dbx/lock.json to exist in the root directory of the project when you run the execute command. I'll fix this issue separately. For now, you can simply run the following command in the same step, before running execute:

echo "{}" > .dbx/lock.json

This creates a file containing only an empty JSON object, so it won't overlap with the execution context you're using locally.
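
A slightly more defensive variant of the workaround, as a sketch: it assumes the .dbx directory itself might be missing in a fresh CI checkout (the thread does not confirm this), and it leaves an existing lock file untouched so the same lines are also safe to run locally.

```shell
# Assumption: .dbx/ may not exist in a fresh checkout, so create it if needed.
mkdir -p .dbx

# Write the placeholder only when no lock file is present,
# so a real local execution context is never overwritten.
[ -f .dbx/lock.json ] || echo "{}" > .dbx/lock.json
```

These lines would go in the same run block as the dbx execute command, ahead of it.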

kiranvasudev commented 3 years ago

Hi @renardeinside !

As you suggested, I had already created an empty file during execution of the GitHub workflow. I thought it was a super hacky way to do this, but thank you for assuring me that it is a valid temporary solution to the problem.

I will use this approach until there is a fix for this.

renardeinside commented 3 years ago

Issue fixed in 1.0.4. Thanks a lot for the feedback, @kiranvasudev