databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

Support Delta Live Tables Expectations #634

Open stevenayers-bge opened 2 months ago

stevenayers-bge commented 2 months ago

Describe the feature

Add support for DLT Expectations, as described in "Manage data quality with Delta Live Tables".

These could utilize the constraints resource property for syntax:


models:
  - name: lendingclub_clean
    materialized: streaming_table
    constraints:
      - name: expectation_1
        type: expectation
        expression: (avg_cur_bal >= 0) ON VIOLATION DROP ROW
    columns:
      - name: avg_cur_bal
        data_type: int

becomes:

CREATE LIVE TABLE lendingclub_clean(
  CONSTRAINT expectation_1 EXPECT (avg_cur_bal >= 0) ON VIOLATION DROP ROW
)
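To make the proposed mapping concrete, here is a minimal Python sketch of how an adapter could turn the `constraints` entries above into the `CONSTRAINT ... EXPECT` clause. The names `Expectation`, `render_expectation`, and `render_create_table` are hypothetical and are not part of dbt-databricks; the sketch only reproduces the table header shown above, not a full DLT statement.

```python
from dataclasses import dataclass


@dataclass
class Expectation:
    """Hypothetical in-memory form of one `type: expectation` constraint."""

    name: str
    expression: str  # e.g. "(avg_cur_bal >= 0) ON VIOLATION DROP ROW"


def render_expectation(exp: Expectation) -> str:
    """Render one constraint as a `CONSTRAINT ... EXPECT ...` clause."""
    return f"CONSTRAINT {exp.name} EXPECT {exp.expression}"


def render_create_table(table: str, constraints: list[dict]) -> str:
    """Build the CREATE LIVE TABLE header from a model's `constraints` list.

    Only entries with `type: expectation` are rendered; others are skipped.
    """
    clauses = [
        render_expectation(Expectation(c["name"], c["expression"]))
        for c in constraints
        if c.get("type") == "expectation"
    ]
    body = ",\n".join(f"  {clause}" for clause in clauses)
    return f"CREATE LIVE TABLE {table}(\n{body}\n)"


# Same values as the YAML example in this issue.
model_constraints = [
    {
        "name": "expectation_1",
        "type": "expectation",
        "expression": "(avg_cur_bal >= 0) ON VIOLATION DROP ROW",
    }
]
ddl = render_create_table("lendingclub_clean", model_constraints)
print(ddl)
```

In a real adapter this logic would more likely live in a Jinja macro next to the existing constraint handling; plain Python is used here only to keep the sketch self-contained.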

Describe alternatives you've considered

I use normal tables at the moment with ordinary dbt tests, but as the ecosystem around DLT grows, it is starting to provide critical data quality capabilities, such as Lakehouse Monitoring, that I'd love to take advantage of.

Additional context

If this is already supported, please let me know. I couldn't find anything on the topic.

Who will this benefit?

It would give people who are using dbt on Databricks a huge incentive to start using DLT.

Are you interested in contributing this feature?

Yes

benc-db commented 2 months ago

It's not supported yet, but it's something we're strongly considering adding this year. Thanks for filing the ticket.

stevenayers-bge commented 2 months ago

Thanks @benc-db. Is APPLY CHANGES INTO supported yet? The same goes for that: I'm happy to contribute if it isn't supported or work on it hasn't started yet.

benc-db commented 2 months ago

This is the first time APPLY CHANGES INTO has been on my radar; I'll share it with my managers. If you'd like to contribute to anything related to materialized views or streaming tables (MV/ST), make sure you start from 2.0.latest (release coming in May), as it has significant changes to the MV/ST code.

rodrigorabioglio commented 2 weeks ago

Dropping by to leave a +1 on supporting APPLY CHANGES INTO on the adapter.

For context, we use one DLT pipeline to ingest raw data. On top of the generated streaming table (ST), we use a second DLT pipeline to APPLY CHANGES and process CDC updates into a "bronze" layer. This second pipeline also needs to perform type casting, column renaming, and bucketing.

Currently, the APPLY CHANGES step and the second set of transforms live in different places: the former is a DLT pipeline job and the latter is done in dbt. Having everything together in dbt would be awesome and much easier to maintain :)
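As a rough illustration of what adapter support could generate, here is a minimal Python sketch that renders a DLT SQL APPLY CHANGES INTO statement from a few settings. The function `render_apply_changes`, its parameters, and the table and column names (`bronze_orders`, `raw_orders`, `order_id`, `updated_at`) are all hypothetical; nothing like this exists in dbt-databricks today, and optional clauses such as APPLY AS DELETE WHEN are left out.

```python
def render_apply_changes(
    target: str,
    source: str,
    keys: list[str],
    sequence_by: str,
    scd_type: int = 1,
) -> str:
    """Render an APPLY CHANGES INTO statement from hypothetical config values."""
    if not keys:
        raise ValueError("at least one key column is required")
    if scd_type not in (1, 2):
        raise ValueError("scd_type must be 1 or 2")
    return "\n".join(
        [
            f"APPLY CHANGES INTO LIVE.{target}",
            f"FROM STREAM(LIVE.{source})",
            f"KEYS ({', '.join(keys)})",
            f"SEQUENCE BY {sequence_by}",
            f"STORED AS SCD TYPE {scd_type}",
        ]
    )


# Hypothetical CDC flow from a raw streaming table into a "bronze" table.
statement = render_apply_changes(
    target="bronze_orders",
    source="raw_orders",
    keys=["order_id"],
    sequence_by="updated_at",
)
print(statement)
```

The type casts and column renames from the second pipeline would still be ordinary dbt models downstream of the target table; the sketch covers only the CDC step.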