databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

Feature Request: Support Column Masking #670

Open stevenayers-bge opened 1 month ago

stevenayers-bge commented 1 month ago

Describe the feature

Support Databricks column masks as part of a model's YAML definition, for example:

```sql
-- macros/create_hash_function.sql
{% macro create_hash_function() %}
    CREATE OR REPLACE FUNCTION my_catalog.my_schema.hash_mask(raw_value STRING)
    RETURN sha2(raw_value, 256);
{% endmacro %}
```

```yaml
# dbt_project.yml
on-run-start: "{{ create_hash_function() }}"
```

```yaml
# models/schema.yml (column_mask is the proposed new key)
models:
  - name: foo
    description: I'm a teapot
    columns:
      - name: id
      - name: bank_details
        column_mask: my_catalog.my_schema.hash_mask
```

Describe alternatives you've considered

Applying the mask in a post-hook. That works fine, but it's a bit messy.
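For reference, a minimal sketch of what the post-hook workaround might look like, assuming the model is materialized as a table and using a hypothetical upstream model name (`some_upstream_model`). It relies on Databricks' `ALTER TABLE ... ALTER COLUMN ... SET MASK` statement and dbt's `{{ this }}`, which renders to the model's own relation when the hook runs:

```sql
-- models/foo.sql (illustrative sketch; upstream model name is hypothetical)
{{ config(
    materialized='table',
    post_hook=[
        -- attach the mask after dbt has (re)built the table
        "ALTER TABLE {{ this }} ALTER COLUMN bank_details SET MASK my_catalog.my_schema.hash_mask"
    ]
) }}

select id, bank_details
from {{ ref('some_upstream_model') }}
```

Each masked column needs its own hand-written `ALTER` statement, kept separately from that column's YAML entry.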

Who will this benefit?

Anyone who wants to define column masks for their models declaratively.

Are you interested in contributing this feature?

Yes

benc-db commented 1 month ago

We've been talking about this, and we've mostly been holding off until we can implement the full column spec, or until Databricks lets you specify advanced column features (such as masks) without also specifying the column type. Without either change, the only option is to add the mask with an ALTER statement after the table is created, which seems like a pretty weak implementation. However, if you'd like to work on a PR for this, I'd be happy to work with you on getting it into the adapter.
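To make the distinction concrete, a sketch in Databricks SQL (the table name follows the example above; the column types are assumptions): the `MASK` clause belongs to a column definition, which requires the column type, while the `ALTER` route attaches the mask to a column that already exists.

```sql
-- Inline form: MASK is part of the column spec, so each column's type must be declared
CREATE OR REPLACE TABLE my_catalog.my_schema.foo (
  id BIGINT,                                                -- assumed type
  bank_details STRING MASK my_catalog.my_schema.hash_mask   -- assumed type
);

-- ALTER form: no column types needed, but it runs as a separate statement after creation
ALTER TABLE my_catalog.my_schema.foo
  ALTER COLUMN bank_details SET MASK my_catalog.my_schema.hash_mask;
```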