dbt-msft / tsql-utils

dbt-utils for the dbt-msft family of packages
MIT License
25 stars 25 forks source link

Added tsql_utils.surrogate_key() macro #32

Closed infused-kim closed 3 years ago

infused-kim commented 3 years ago

Here is a suggested attempt at addressing the two issues I have raised related to the default surrogate_key() implementation: https://github.com/dbt-msft/tsql-utils/issues/26 and https://github.com/dbt-msft/tsql-utils/issues/25

This creates a a separate tsql_utils.surrogate_key() macro in addition to the default dbt_utils.surrogate_key() macro that allows for more t-sql specific customization.

This macro adds three additional features:

1. Field column type

It allows you to customize the column type into which the fields are cast. One use case for this is if you have nvarcharcolumns that you want to generate a surrogate key for. Since dbt_utils.surrogate_key() by default casts them into varchar, it's possible to get duplicate surrogate keys.

With this macro you can use tsql_utils.surrogate_key(["col"], col_type="nvarchar(4000)) to solve the problem

2. Generate binary hash

The current tsql_utils adapters cause dbt_utils.surrogate_key() to generate a hash that is stored as a varchar string that uses 32 bytes of data.

This macro allows you to use tsql_utils.surrogate_key(["col"], use_binary_hash=True).

This will keep the key as varbinary that only uses 16 bytes of data. This will reduce space in the database and can potentially increase join performance, but the column has to be converted into varchar before it can be used in Power BI for relationships.

To help with that issue, this PR provides a second macro cast_hash_to_str() that allows you to convert the varbinary surrogate keys to varchar inside your report views before importing them into Power BI to allow relationships on your surrogate key columns.

3. Adjust default values through dbt_project vars:

You can also customize both settings through variables in your dbt_project.yml:

vars:
  dbt_utils_dispatch_list: ['tsql_utils']
  tsql_utils_surrogate_key_col_type: 'nvarchar(1234)'
  tsql_utils_surrogate_key_use_binary_hash: True

Let me know what you think of this and if you would like me to make any adjustments.

dataders commented 3 years ago

@infused-kim can you add a 0.6.6 section to the CHANGELOG.md?