Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0
511 stars 131 forks source link

[BUG] databricks hash function quote column with table's alias #159

Closed NikkaIW closed 1 year ago

NikkaIW commented 2 years ago

Describe the bug We are using our own branch for dbtvault, waiting for databricks to be implemented. We are now testing databricks. We are using dbtvault.hash directly into the code for business layer, references tables and so on. It seems that with the 0.9.0 version some quotes are added causing error "Column 'dss_corp.alt_code' does not exist"

Environment

dbt version: 1.2.0 dbtvault version: 0.9.0 Database/Platform: databricks

To Reproduce Use the hash function with a column that include the table's alias

This is the function we were using

{%- macro spark__hash(columns, alias, is_hashdiff) -%}

{%- set hash = var('hash', 'MD5') -%}
{%- set concat_string = var('concat_string', '||') -%}
{%- set null_placeholder_string = var('null_placeholder_string', '^^') -%}

{#- Select hashing algorithm -#}
{%- if hash == 'MD5' -%}
    {%- set hash_alg = 'MD5' -%}
    {%- set hash_size = '' -%}
{%- elif hash == 'SHA' -%}
    {%- set hash_alg = 'SHA2' -%}
    {%- set hash_size = ',256' -%}
{%- else -%}
    {%- set hash_alg = 'MD5' -%}
    {%- set hash_size = '' -%}
{%- endif -%}

{%- set standardise = "NULLIF(UPPER(TRIM(CAST([EXPRESSION] AS STRING))), '')" %}

{#- Alpha sort columns before hashing if a hashdiff -#}
{%- if is_hashdiff and dbtvault.is_list(columns) -%}
    {%- set columns = columns|sort -%}
{%- endif -%}

{#- If single column to hash -#}
{%- if columns is string -%}
    {%- set column_str = dbtvault.as_constant(columns) -%}
    {{- "CAST({}({} {}) AS BINARY) AS {}".format(hash_alg, standardise | replace('[EXPRESSION]', column_str), hash_size, alias) | indent(4) -}}
{#- Else a list of columns to hash -#}
{%- else -%}
    {%- set all_null = [] -%}

    {%- if is_hashdiff -%}
        {{- "CAST({}(CONCAT_WS('{}',".format(hash_alg, concat_string) | indent(4) -}}
    {%- else -%}
        {{- "CAST({}(NULLIF(CONCAT_WS('{}',".format(hash_alg, concat_string) | indent(4) -}}
    {%- endif -%}

    {%- for column in columns -%}

        {%- do all_null.append(null_placeholder_string) -%}

        {%- set column_str = dbtvault.as_constant(column) -%}
        {{- "\nIFNULL({}, '{}')".format(standardise | replace('[EXPRESSION]', column_str), null_placeholder_string) | indent(4) -}}
        {{- "," if not loop.last -}}

        {%- if loop.last -%}
            {% if is_hashdiff %}
                {{- "\n) {}) AS BINARY) AS {}".format(hash_size, alias) -}}
            {%- else -%}
                {{- "\n), '{}'){}) AS BINARY) AS {}".format(all_null | join(""), hash_size, alias) -}}
            {%- endif -%}
        {%- else -%}

            {%- do all_null.append(concat_string) -%}

        {%- endif -%}

    {%- endfor -%}

{%- endif -%}

{%- endmacro -%}

and this is the result

    SELECT
        CAST(SHA2(NULLIF(UPPER(TRIM(CAST(dss_corp.alt_code AS STRING))), '') ,256) AS BINARY) AS condition_hash_key,

with dbtvault 0.9.0 databricks__hash function this is the result

SELECT
        UPPER(SHA2(NULLIF(UPPER(TRIM(CAST(`dss_corp.alt_code` AS STRING))), ''), 256)) AS `condition_hash_key`,

Expected behavior None

Screenshots None

Log files None

Additional context None

DVAlexHiggs commented 1 year ago

Hi, can you let me know if this issue persists in 0.9.1? If it's fixed, we can close this :)

NikkaZ commented 1 year ago

Hi, didn't know you worked on this one. I am off until the 6th of January. Can test it once I am back

NikkaZ commented 1 year ago

This is solved now, thanks!

DVAlexHiggs commented 1 year ago

This is solved now, thanks!

Great to hear! Thanks for checking