Nike-Inc / brickflow

Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
https://engineering.nike.com/brickflow/
Apache License 2.0

[FEATURE] Please add DLT support #40

Open deepuak opened 1 year ago

deepuak commented 1 year ago

Is your feature request related to a problem? Please describe. I would like to invoke Delta Live Tables (DLT) from brickflow.

Describe the solution you'd like Currently, DLT is deployed in Databricks as a wheel file. I would like to deploy the same DLT wheel file using brickflow.


deepuak commented 1 year ago

Adding to the above, I am facing some issues when using brickflow to deploy from my Windows machine.

1. Deploy: I am using "brickflow projects deploy --project hello-world-brickflow -e local" and got the error below:

    Starting upload of bundle files
    Uploaded bundle files at /Users/drg.devops@clarivate.com/.brickflow_bundles/hello-world-brickflow/local/files!
    Starting resource deployment
    Error: terraform apply: exit status 1

    Error: cannot create pipeline: storage path must be absolute

      with databricks_pipeline.test_hello_world,
      on bundle.tf.json line 349, in resource.databricks_pipeline.test_hello_world:
      349: }

    Error: Command '['.databricks/bin/cli\0.203.0\databricks', 'bundle', 'deploy', '-e', 'hello-world-brickflow-local']' returned non-zero exit status 1.

Please note the Databricks CLI is configured and the cluster_id is set.

2. For now the DLT spark_script is just a placeholder file, and when I try to add the DLT module I see that I am not able to import it as part of brickflow. What is the right approach here?
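
One guard I could use locally, assuming the dlt module only exists on Databricks, would be the following (though I'm not sure this is the intended approach):

    try:
        import dlt  # available only inside a Databricks DLT pipeline run
    except ImportError:
        dlt = None  # running locally; skip or mock DLT-specific code paths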

asingamaneni commented 1 year ago

@deepuak can you please search for "bundle.tf.json" in your local repository and share line "349"? My hunch is that for the DLT task the notebook path is not being resolved properly. Can you also print your repo structure and the path you gave for the DLT task?

deepuak commented 1 year ago

Hi @asingamaneni,

Please find the relevant "bundle.tf.json" section below:

   "databricks_pipeline": {
      "test_hello_world": {
        "channel": "current",
        "development": true,
        "edition": "advanced",
        "name": "drg_devops_hello world",
        "storage": "123",
        "library": [
          {
            "notebook": {
              "path": "/Users/drg.devops@clarivate.com/.brickflow_bundles/hello-world-brickflow/local/files/scripts/spark_script_2"
            }
          }
        ]
      }
    }
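
Looking at this, I suspect the "storage": "123" value is what triggers the "storage path must be absolute" error; presumably it would need an absolute location along these lines (the exact path below is just an illustration):

    "storage": "dbfs:/pipelines/hello-world-brickflow"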

Also please find below the repo structure:

[screenshot: repository structure]

Also, I am not able to resolve the dlt module. I was trying it from brickflow as below. Do I need to install dlt separately?

[screenshot: failing dlt import]

asingamaneni commented 1 year ago

@deepuak The below worked for me. We don't ship the dlt module in brickflow; brickflow only supports deployment of DLT pipelines. It takes the source code as notebooks and deploys them.

The below is sample code for a DLT pipeline in my local repository:

[screenshot: sample DLT pipeline code]
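
Since the screenshot may not render here, a minimal sketch of that kind of DLT notebook (the table names and source path are illustrative, not the exact code from the screenshot):

    import dlt  # provided by the Databricks DLT runtime; not importable locally

    # "spark" is available implicitly in Databricks notebooks
    @dlt.table(name="hello_world_bronze", comment="Example raw ingest table")
    def hello_world_bronze():
        # the source path is illustrative; point it at your real input data
        return spark.read.format("json").load("dbfs:/path/to/raw/data")

    @dlt.table(name="hello_world_silver", comment="Example cleaned table")
    def hello_world_silver():
        return dlt.read("hello_world_bronze").dropna()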

The below is sample code in my workflow:

[screenshot: sample workflow code]

In the DLTPipeline configuration, please provide "target" instead of "storage".
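
A rough sketch of the workflow side with target (the DLTPipeline parameter names here follow this thread and brickflow's examples; please check the exact signature against the brickflow docs):

    from brickflow import DLTPipeline, Workflow

    wf = Workflow("hello-world-brickflow")  # default cluster config omitted for brevity

    @wf.dlt_task
    def test_hello_world():
        # "target" (the output schema/database) replaces the relative
        # "storage" value that caused the "storage path must be absolute" error
        return DLTPipeline(
            name="hello world",
            target="hello_world_db",  # illustrative schema name
            notebook_path="scripts/spark_script_2.py",
        )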

asingamaneni commented 12 months ago

@deepuak is the issue resolved?

deepuak commented 12 months ago

Hi @asingamaneni, I will check.