Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0
513 stars 131 forks source link

[FEATURE] Arbitrary aliasing in macros #149

Open DVAlexHiggs opened 2 years ago

DVAlexHiggs commented 2 years ago

At the moment you can alias a hashdiff in dbtvault satellite macros. This is so that you may have a standardised name for your hashdiff in every satellite, even if they share the same primed staging area.

This use case is not isolated to hashdiffs however, for example for a Link Primary key we may want to always call them LINK_HK fo simplicity of joins later on, or just so we have uniform links.

Being able to do this for any parameter in any macro would be ideal, and would make it much easier to standardise across our raw vault.

Something like the following would be very useful:

{%- set yaml_metadata -%}
source_model: stg_web_customer_hashed
src_pk: CUSTOMER_HK
src_nk: 
  - source_column: HUB_CUSTOMER_CK
    alias: COLLISION_KEY
  - CUSTOMER_ID
src_ldts: LOAD_DATETIME
src_source: RECORD_SOURCE
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{{ dbtvault.hub(src_pk=metadata_dict["src_pk"],
                src_nk=metadata_dict["src_nk"], 
                src_ldts=metadata_dict["src_ldts"],
                src_source=metadata_dict["src_source"],
                source_model=metadata_dict["source_model"]) }}

Resulting in a table like:

CUSTOMER_HK CUSTOMER_ID COLLISION_KEY LOAD_DATETIME RECORD_SOURCE
b8c37e33defde51cf91e1e03e51657da 1001 CUSTOMER_CRM 1993-01-01 00:00:00.000 CUSTOMER_CRM

The future

On the topic of standardised naming, we plan to make this configuration centralised. Some day wuldn't it be cool if you could do the following in dbt_project.yml:

vars:
  hash: MD5
  hashdiff_name: HASHDIFF

This would ensure the hashdiff of every satellite would be named HASHDIFF

AB#5370