ScalefreeCOM / datavault4dbt

Scalefree's dbt package for a Data Vault 2.0 implementation congruent to the original Data Vault 2.0 definition by Dan Linstedt including the Staging Area, DV2.0 main entities, PITs and Snapshot Tables.
https://www.scalefree.com/
Apache License 2.0
134 stars 26 forks source link

Column named "timestamp" is not escaped #228

Closed maxmue closed 4 weeks ago

maxmue commented 4 weeks ago

I have a source table with a column named timestamp, which is a reserved keyword in Redshift.

It generated the following error: Encountered an error: Runtime Error Database Error in model <model> (models/staging/<model>.sql) syntax error at or near "," in context "event, timestamp,", at line 24, column 18

The compiled code contains an unescaped timestamp: `WITH

source_data AS ( SELECT

    event,
    timestamp,`
tkiehn commented 4 weeks ago

Hi @maxmue and thanks for reaching out!

If you are still using the dev-branch then you should be able to test the fix for this issue after running dbt deps --upgrade.

Let me know if this helped

Kind Regards Theo

maxmue commented 4 weeks ago

Amazing, you seem to be one step ahead. ;-) Thanks!

maxmue commented 4 weeks ago

@tkiehn before I open an new issue i'll post it here:

i also observe this being present in stage macro compile output:

TO_DATE('{'bigquery': '8888-12-31', 'snowflake': '8888-12-31', 'exasol': '8888-12-31', 'postgres': '8888-12-31', 'redshift': '8888-12-31', 'synapse': '8888-12-31'}', '{'bigquery': '%Y-%m-%d', 'snowflake': 'YYYY-MM-DD', 'exasol': 'YYYY-mm-dd', 'postgres': '%Y-%m-%d', 'redshift': 'YYYY-MM-DD', 'synapse': 'yyyy-MM-dd'}' ) as "__date",

tkiehn commented 4 weeks ago

Hi @maxmue,

I am assuming that is inside the ghost-record creation? The fastest fix for that issue would be to set the following global variables datavault4dbt.beginning_of_all_times_date: "0001-01-01", datavault4dbt.end_of_all_times_date: "8888-12-31" and datavault4dbt.date_format: "YYYY-MM-DD" inside your dbt_project.yml under the vars:-key.

maxmue commented 4 weeks ago

@tkiehn thanks so far.

I assume the new escape logic is a bit too agressive: `ldts_rsrc_data AS (

SELECT CAST( "current_timestamp" as TIMESTAMPTZ ) AS ldts,`

current_timestamp should not be escaped

i've set ldts: 'current_timestamp btw

tkiehn commented 4 weeks ago

Hi @maxmue, you're welcome :)

To keep it brief: current_timestamp is being escaped because it's not being recognized as expression by one of our internal macros (is_expression).

You could either wrap it in brackets like so (current_timestamp) or use now() instead. Both functions returned the same values for me on my redshift instance.

As per redshift-documentation both are deprecated leader node-only functions, but the replacement functions return values without UTC-offset. image

maxmue commented 4 weeks ago

Yo @tkiehn,

please tell me if me contacting you this frequently is bothering you. If there is a better way or I should stop, please tell me.

I just discovered something I would call a bug in latest dev branch:

This is the stage model:

{{ config(materialized="view") }}

{%- set yaml_metadata -%}
source_model:
    'test': 'posthog'
ldts: '(current_timestamp)'
rsrc: '!posthog.newleaf'
derived_columns:
    event_type:
        value: "platform||event"
        datatype: varchar
        src_cols_required:
            - platform
            - event
hashed_columns: 
    hk_event_h:
        - event
        - insert_id
{%- endset -%}

{%- set metadata_dict = fromyaml(yaml_metadata) -%}

{%- set source_model = metadata_dict.get("source_model") -%}
{%- set ldts = metadata_dict.get("ldts") -%}
{%- set rsrc = metadata_dict.get("rsrc") -%}
{%- set derived_columns = metadata_dict.get("derived_columns") -%}
{%- set hashed_columns = metadata_dict.get("hashed_columns") -%}

{{
    datavault4dbt.stage(
        source_model=source_model,
        ldts=ldts,
        rsrc=rsrc,
        derived_columns=derived_columns,
        hashed_columns=hashed_columns
    )
}}

Please mind the derived column event_type. If its name contains the string event, it removed the original event from the ghost records. xxx as name works fine, so does even, but blaheventblah does not.

Best regards, Max

P.S. This works fine in the redshift-patch branch.

tkiehn commented 4 weeks ago

Hi @maxmue,

I'm absolutely not bothered so if you have any issues or improvements we are always happy to hear about it!

Thank you for pointing out the bug you found and thank you for providing such a good explanation!

I have identified the issue and will implement a fix in the dev-branch now.

Best Regards Theo