databrickslabs / dbx

🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.
https://dbx.readthedocs.io

Using the same workspace directory between different environments #862

Open WmWessels opened 9 months ago

WmWessels commented 9 months ago

Expected Behavior

I would like to define two environments in .dbx/project.json that share the same workspace directory but use different artifact locations.

Current Behavior

When I deploy my Python project with dbx in our CI/CD pipeline, I get the following exception:

Exception: Required location of experiment /Shared/dbx/ doesn't match the project defined one.

Steps to Reproduce (for bugs)

Create a dbx project. In project.json, define two environments that share the same workspace directory but use different artifact locations.
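
For illustration, the setup described above might look roughly like the following sketch, written as Python that builds the structure and serializes it to JSON. The environment names (`training`, `scoring`), the `profile` value, the artifact paths, and the `properties` key layout are assumptions and are not taken from the actual project; only the workspace directory `/Shared/dbx/` comes from the error message.

```python
import json
from collections import defaultdict

# Hypothetical environment names and artifact paths. The key layout
# ("environments" -> name -> "properties") is an assumption about project.json.
project = {
    "environments": {
        "training": {
            "profile": "DEFAULT",
            "storage_type": "mlflow",
            "properties": {
                "workspace_directory": "/Shared/dbx/",
                "artifact_location": "dbfs:/dbx/training",
            },
        },
        "scoring": {
            "profile": "DEFAULT",
            "storage_type": "mlflow",
            "properties": {
                "workspace_directory": "/Shared/dbx/",
                "artifact_location": "dbfs:/dbx/scoring",
            },
        },
    }
}


def find_shared_workspace_conflicts(config):
    """Return workspace directories that are configured with more than one artifact location."""
    locations = defaultdict(set)
    for env in config["environments"].values():
        props = env["properties"]
        locations[props["workspace_directory"]].add(props["artifact_location"])
    # Keep only directories that map to several distinct artifact locations
    return {ws: sorted(locs) for ws, locs in locations.items() if len(locs) > 1}


# Serialized form, as it might be written to .dbx/project.json
project_json = json.dumps(project, indent=2)
conflicts = find_shared_workspace_conflicts(project)
```

Here `conflicts` reports `/Shared/dbx/` together with both artifact locations, which is the combination that appears to trigger the exception above.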

Then create two deployment files, one for training and one for scoring. The first deployment file defines a workflow that uses the first environment; the second defines a workflow that uses the second environment.

Finally:

Context

We want to version our ML code in production. We currently have a training workflow and a scoring workflow: the training workflow stores the trained models, and the scoring workflow refers to them. We would therefore like both workflows to use the same workspace directory. However, we also want them to use different artifact locations, so that we can version our code and the training and scoring workflows do not have to run the same code version.

How would I need to structure my project.json in order to get this to work?

Your Environment