databricks / cli

Databricks CLI

Notifications and Trigger properties for Pipelines (Delta Live Table) are ignored #1175

Closed neox2811 closed 9 months ago

neox2811 commented 9 months ago

Describe the issue

Notifications and Trigger properties for Pipelines (Delta Live Table) are ignored

Configuration

bundle:
  name: etl_bundle

workspace:
  host: https://dbc-xxx.cloud.databricks.com

resources:
  pipelines:
    my-pipeline:
      name: my-pipeline
      target: dlt
      catalog: "my_catalog"

      libraries:
        - notebook:
            path: ../pipelines/mypipelines.ipynb

      clusters:
        - aws_attributes:
            instance_profile_arn: "${var.instance_profile}"
          autoscale:
            min_workers: 1
            max_workers: 2
      trigger:
        cron:
          quartz_cron_schedule: "0 0 7 ? * *"
          timezone_id: "UTC"

      notifications:
        email_recipients:
          - "my@mail.com"
        alerts:
          - "on-update-success"
          - "on-update-failure"
          - "on-update-fatal-failure"
          - "on-flow-failure"

Steps to reproduce the behavior

  1. Run databricks bundle deploy
  2. Go to Databricks UI, open Pipeline
  3. Check Pipeline Schedule / Settings

Expected Behavior

Pipeline schedule and notifications are set as specified in the bundle.

Actual Behavior

Pipeline schedule and notifications are not set.

The output of databricks bundle validate also does not contain the schedule and notification settings. I also tried to deploy the bundle in both mode: development and mode: production.

OS and CLI version

Databricks CLI v0.212.3 macOS Sonoma 14.3

Is this a regression?

No

andrewnester commented 9 months ago

This seems to be caused by the clusters configuration being incorrect: it should be a list of clusters, so it should look like this (notice the - before autoscale, which makes clusters an actual YAML list):

      clusters:
        - autoscale:
            min_workers: 1
            max_workers: 2

The lack of a proper error message on failed parsing is a known issue and should be partially addressed by https://github.com/databricks/cli/pull/1098

neox2811 commented 9 months ago

I accidentally removed the - while redacting. I updated my YAML above:

      clusters:
        - aws_attributes:
            instance_profile_arn: "${var.instance_profile}"
          autoscale:
            min_workers: 1
            max_workers: 2

Even with a list of clusters, the notifications and trigger are not set. I also don't get any error message while deploying.

andrewnester commented 9 months ago

As per the API docs (https://docs.databricks.com/api/workspace/pipelines/create), notifications is also an array of objects, so its configuration should look like this (note the -, again, before email_recipients):

      notifications:
        - email_recipients:
            - "my@mail.com"
          alerts:
            - "on-update-success"
            - "on-update-failure"
            - "on-update-fatal-failure"
            - "on-flow-failure"

neox2811 commented 9 months ago

Okay, thank you. Notifications are working now. But trigger still doesn't work:


    trigger:
      cron:
        quartz_cron_schedule: "0 0 1 ? * *"
        timezone_id: "UTC"

Am I missing something? Can I find examples of DABs pipelines with trigger anywhere?

andrewnester commented 9 months ago

The trigger option is being deprecated and replaced by continuous; see the API docs:

https://docs.databricks.com/api/workspace/pipelines/create#continuous

If you want a pipeline schedule, you can start a triggered pipeline manually or run the pipeline on a schedule with a Databricks job. You can create and schedule a job with a single pipeline task directly in the Delta Live Tables UI, or add a pipeline task to a multi-task workflow in the jobs UI.

See an example here: https://github.com/databricks/bundle-examples/blob/200965dbfce0d1029898ec3586f8f26c9a01d704/default_python/resources/default_python_job.yml#L7-L26
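For reference, a minimal bundle sketch of the job-based approach, modeled on the linked example. The resource name my_pipeline_job and task key refresh_pipeline are placeholders; the interpolation ${resources.pipelines.my-pipeline.id} assumes the pipeline from the bundle above:

```yaml
resources:
  jobs:
    my_pipeline_job:
      name: my_pipeline_job
      schedule:
        # Jobs use quartz_cron_expression (not quartz_cron_schedule)
        quartz_cron_expression: "0 0 7 ? * *"
        timezone_id: "UTC"
      tasks:
        - task_key: refresh_pipeline
          pipeline_task:
            # References the pipeline resource defined in this bundle
            pipeline_id: ${resources.pipelines.my-pipeline.id}
```

The job triggers an update of the pipeline on the cron schedule, which replaces the ignored trigger block on the pipeline itself.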