databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0
Adding new constraint logic that will be used with V2 flag #846

Open benc-db opened 2 days ago

benc-db commented 2 days ago

Description

Adds new constraint logic for use with the V2 materialization flag. It is not currently hooked up to anything; that will come when I send the PR that splits table materialization into V1 and V2.

Core ideas:

1.) Use the constraint config from dbt (as opposed to our homebrew config from the 'meta' dictionary).
2.) Process constraints almost entirely in Python, since they are so logic-heavy (this matches what dbt-core is doing anyway).
3.) Mirror, but don't depend on, dbt's built-in functions for accomplishing this (they expand the surface of the adapter, and this logic has no dependency on the Adapter class).
4.) For processing constraint logic, use a functional style, since it is pretty much entirely data transformation.
5.) Store processed constraints on Columns or Relations, since Databricks has timing requirements, namely that a.) not null is a column property (not a proper constraint) and b.) check constraints can only be added via an alter.
6.) Following a functional style, when we 'enrich' an object, return a new copy. This may need to switch to mutable in the future, depending on whether we need these changes cached or not (the relation cache does not have an easy-to-use update mechanism).
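As a rough illustration of ideas 4 to 6, here is a minimal sketch of what "enrich and return a new copy" could look like. Every name in it (`Constraint`, `Column`, `enrich_column`, `render_column`, `render_check_alters`) is hypothetical and chosen for this example only; none of it is the code in this PR. It assumes constraints arrive as simple typed records and shows the timing split from idea 5: not null is folded into the column definition, while check constraints are held back for a later alter.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


# Hypothetical types for illustration; not the adapter's real classes.
@dataclass(frozen=True)
class Constraint:
    type: str                          # e.g. "not_null" or "check"
    name: Optional[str] = None
    expression: Optional[str] = None


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str
    not_null: bool = False
    constraints: Tuple[Constraint, ...] = ()


def enrich_column(column: Column, constraints: List[Constraint]) -> Column:
    """Return a new Column carrying the constraints; the input is not mutated."""
    not_null = column.not_null or any(c.type == "not_null" for c in constraints)
    others = tuple(c for c in constraints if c.type != "not_null")
    return replace(column, not_null=not_null, constraints=column.constraints + others)


def render_column(column: Column) -> str:
    """Not null is a column property, so it is rendered with the column itself."""
    suffix = " NOT NULL" if column.not_null else ""
    return f"{column.name} {column.data_type}{suffix}"


def render_check_alters(table: str, column: Column) -> List[str]:
    """Check constraints can only be added via an alter, so they render separately."""
    return [
        f"ALTER TABLE {table} ADD CONSTRAINT {c.name} CHECK ({c.expression})"
        for c in column.constraints
        if c.type == "check"
    ]


base = Column("id", "bigint")
enriched = enrich_column(
    base,
    [Constraint("not_null"), Constraint("check", "id_positive", "id > 0")],
)
print(render_column(enriched))
print(render_check_alters("my_table", enriched))
```

Because the dataclasses are frozen, `base` is left unchanged after the call, which is the trade-off mentioned in idea 6: anything holding a cached reference to the original object will not see the enrichment.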

While writing this description, I'm realizing I'm missing unit tests for model constraints, so I will add those prior to merge.

benc-db commented 2 days ago

@alexguo-db since this PR is pure Python, I figured it would be a reasonable introduction to dbt PRs. The context is that I'm moving our constraint handling out of SQL templates and into Python. This PR does not include any calls to the new functions; those will come from SQL templates, but in a separate PR.

kmarq commented 14 hours ago

Will this be able to apply constraints without requiring full contract enforcement? We've been looking for a way to define specific column constraints and PK/FK without having to define the full contract of columns. You'd have to define any column that you want constraints on, but any other columns would be merged in at runtime. I've looked into doing this via a post-hook, but from what I'm seeing, if contract enforced=false then the constraints are not even visible by the time you are inside a model.

benc-db commented 13 hours ago

> Will this be able to apply constraints without requiring full contract enforcement? We've been looking for a way to define specific column constraints and PK/FK without having to define the full contract of columns. You'd have to define any column that you want constraints on, but any other columns would be merged in at runtime. I've looked into doing this via a post-hook, but from what I'm seeing, if contract enforced=false then the constraints are not even visible by the time you are inside a model.

It's not in this PR, but the V2 materialization approach basically makes a temp view and a final table that the temp view gets merged into. This gives us a number of benefits, including that you don't need to specify contract enforced/column types in your model in order for us to have the capability of adding constraints to individual columns.
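To make the "constraints on individual columns" point concrete, here is a small sketch of the partial-declaration idea from the question above. It is an assumption-laden illustration, not the V2 materialization code: the function name `not_null_alters` and its inputs are hypothetical, and it assumes the full column list can be read back at runtime (for example from the temp view) while the model declares only the columns it wants constrained.

```python
from typing import List, Set, Tuple


def not_null_alters(
    table: str,
    discovered: List[Tuple[str, str]],   # (name, type) pairs read at runtime
    declared_not_null: Set[str],         # only the columns the model declares
) -> List[str]:
    """Hypothetical helper: emit ALTERs only for the declared columns."""
    known = {name for name, _ in discovered}
    missing = sorted(declared_not_null - known)
    if missing:
        # A declared column that does not exist at runtime is a user error.
        raise ValueError(f"Constraint declared on unknown column(s): {missing}")
    return [
        f"ALTER TABLE {table} ALTER COLUMN {name} SET NOT NULL"
        for name, _ in discovered
        if name in declared_not_null
    ]


columns = [("id", "bigint"), ("email", "string"), ("created_at", "timestamp")]
print(not_null_alters("my_table", columns, {"id"}))
```

Columns that carry no declaration (`email` and `created_at` here) pass through untouched, so no full contract is needed to constrain `id`.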