Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0
478 stars 114 forks source link

[BUG] Columns with parenthesis are not escaped in hashdiff #141

Closed guaripete-github closed 1 year ago

guaripete-github commented 1 year ago

Describe the bug Column names with parenthesis included in hashdiff are not properly escaped in the generated code.

Environment

dbt version: 1.1.1 dbtvault version: 0.8.3 Database/Platform: SQL Server

To Reproduce Steps to reproduce the behavior: Use a column name with parenthesis in hashdiff. The compiled code:

ISNULL(NULLIF(UPPER(TRIM(CAST(OPERATOR (REPORTED) AS VARCHAR(max)))), ''), '^^'),

Expected behavior The compiled code should be: ISNULL(NULLIF(UPPER(TRIM(CAST("OPERATOR (REPORTED)" AS VARCHAR(max)))), ''), '^^'),

Additional context The issue seems to be in this logic in dbtvault/macros/supporting/hash.sql

    {%- for column in columns -%}
        {%- do all_null.append(null_placeholder_string) -%}
        {%- set column_str = dbtvault.as_constant(column) -%}
        {%- if dbtvault.is_expression(column_str) -%}
            {%- set escaped_column_str = column_str -%}
        {%- else -%}
            {%- set escaped_column_str = dbtvault.escape_column_names(column_str) -%}
        {%- endif -%}
        {{- "\nISNULL({}, '{}')".format(standardise | replace('[EXPRESSION]', escaped_column_str), null_placeholder_string) | indent(4) -}}
        {{- "," if not loop.last -}}

dbtvault/macros/intenal/helpers/is_checks.sql

{%- macro is_expression(obj) -%}
    {%- if obj is string -%}
        {%- if (obj | first == "'" and obj | last == "'") or ("(" in obj and ")" in obj) or "::" in obj -%}
            {%- do return(true) -%}
        {%- else -%}
            {%- do return(false) -%}
        {%- endif -%}
    {%- else -%}
        {%- do return(false) -%}
    {%- endif -%}
{%- endmacro -%}

where a column with parenthesis in column name is considered an expression and uses the returned value of dbtvaul/macros/internal/metadata_processing/as_constant.sql

{%- macro default__as_constant(column_str) -%}
    {%- if column_str is not none and column_str is string and column_str -%}
        {%- if column_str | first == "!" -%} 
            {{- return("'" ~ column_str[1:] ~ "'") -}}        
        {%- else -%}        
            {%- if dbtvault.is_expression(column_str) -%}
                {{- return(column_str) -}}
            {%- else -%}
                {{- return(dbtvault.escape_column_names(column_str)) -}}
            {%- endif -%}
        {%- endif -%}

which again evaluates the column name as an expression, therefore not escaping it.

Bypassing the logic:

    {%- for column in columns -%}
        {%- do all_null.append(null_placeholder_string) -%}
        -- {%- set column_str = dbtvault.as_constant(column) -%}
        -- {%- if dbtvault.is_expression(column_str) -%}
        --     {%- set escaped_column_str = column_str -%}
        -- {%- else -%}
        --     {%- set escaped_column_str = dbtvault.escape_column_names(column_str) -%}
        -- {%- endif -%}
        {%- set escaped_column_str = dbtvault.escape_column_names(column) -%}
        {{- "\nISNULL({}, '{}')".format(standardise | replace('[EXPRESSION]', escaped_column_str), null_placeholder_string) | indent(4) -}}
        {{- "," if not loop.last -}}

renders the expected code:

ISNULL(NULLIF(UPPER(TRIM(CAST("OPERATOR (REPORTED)" AS VARCHAR(max)))), ''), '^^'),

but not sure what could be the consequences of such.

DVTimWilson commented 1 year ago

Hi, thank you for reporting this issue.

This has also been reported as issue 114. Please see that issue for details of the proposed fix which has now been developed, and which will be included in the next release.

DVAlexHiggs commented 1 year ago

Fix released in v0.9.0. Let us know if you continue to have issues. Thanks!