Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0
510 stars 131 forks source link

[BUG] PIT table loads failing with dbtvault 0.9.0 - Syntax error in the SQL generated #157

Closed sgade-ansys closed 1 year ago

sgade-ansys commented 2 years ago

Describe the bug My PIT tables were working fine with 0.8.2 , I had to upgrade to 0.9.0 to fix the issues with Bridge load. But now, all my PITS started failing due to a syntax issue within the SQL generated. The cast function for Ghost records is missing datatype. Seems like inside the macro,datatype value is not correctly being extracted. The same issue has occurred for all my PIT loads.

Environment DBT CLoud - Non Prod (Have a release date coming up for Prod deployment)

dbt version: 1.0.0 dbtvault version: 0.9.0 Database/Platform: Snowflake

To Reproduce Steps to reproduce the behavior: create project pit_xx.sql file Set up a connection config for Snowflake

pit config sample below -

{%- set yaml_metadata -%}
source_model: hub_account
src_pk: HUB_ACCOUNT_HKEY
as_of_dates_table: current_as_of_dates
satellites: 
    sat_ebs_account:
      pk:
          'PK': 'HUB_ACCOUNT_HKEY'
      ldts:
          'LDTS': 'SAT_LDTS'
    sat_sfdc_account:
      pk:
          'PK': 'HUB_ACCOUNT_HKEY'
      ldts:
          'LDTS': 'SAT_LDTS'
    sat_siebel_account:
      pk:
          'PK': 'HUB_ACCOUNT_HKEY'
      ldts:
          'LDTS': 'SAT_LDTS'
stage_tables_ldts:
    'v_stg_ebs_account': 'SAT_LDTS'
    'v_stg_sfdc_account': 'SAT_LDTS'
    'v_stg_siebel_account': 'SAT_LDTS'
src_ldts: 'HUB_LDTS'
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %} 

{{ dbtvault.pit(source_model=metadata_dict["source_model"],
                src_pk=metadata_dict["src_pk"],
                as_of_dates_table=metadata_dict["as_of_dates_table"],
                satellites=metadata_dict["satellites"],
                stage_tables_ldts=metadata_dict["stage_tables_ldts"],
                src_ldts=metadata_dict["src_ldts"]) }}

Expected behavior Populate the target table with most current Account data with as of date

Screenshots image image

Log files If applicable, provide dbt log files which include the problem. pit_load_log.log

Additional context Add any other context about the problem here.

DVTimWilson commented 2 years ago

Hi, thank you for getting in touch about this problem, please can you check that the hash variable in your dbt_project.yml file is assigned the value either MD5 or SHA? For example:

vars:
  hash: MD5
sgade-ansys commented 2 years ago

Hi, Below is what I have.

sgade-ansys commented 2 years ago

vars: hash: SHA1

DVAlexHiggs commented 2 years ago

SHA1 is not a supported algorithm, please read our documentation here.

@DVTimWilson Perhaps we need some error handling here to make this more obvious?

DVTimWilson commented 2 years ago

Hi, the only possible options for that value currently are either MD5 or SHA. So please try changing your value to SHA (this equates to SHA256 in the code).

@DVAlexHiggs Agreed, will add to DevOps.

sgade-ansys commented 2 years ago

Thank you for your quick response. I can change it to SHA but will that impact existing data load?

DVAlexHiggs commented 2 years ago

You will need to re-generate everything by doing a --full-refresh

DVTimWilson commented 2 years ago

Hi, apologies, I had not thought this through completely. So with your original variable value of SHA1 all hashing would have defaulted to MD5 per the code within the hash macro:

...
{%- if hash == 'MD5' -%}
    {%- set hash_alg = 'MD5' -%}
    {%- set hash_size = 16 -%}
{%- elif hash == 'SHA' -%}
    {%- set hash_alg = 'SHA2_256' -%}
    {%- set hash_size = 32 -%}
{%- else -%}
    {%- set hash_alg = 'MD5' -%}
    {%- set hash_size = 16 -%}
{%- endif -%}
...

So to remain consistent with that and to avoid having to reload any data the hash variable in your dbt_project.yml file should in fact be MD5, as follows:

vars:
  hash: MD5

@DVAlexHiggs So the datatype selection algorithm should at least work consistently with this.

sgade-ansys commented 2 years ago

Thank you, Tim and Alex. In that case, I will continue to use MD5. Hope this should resolve my PIT load issue with the new dbt version 0.9.0. I will keep you posted.

From: Tim Wilson @.> Sent: Thursday, September 15, 2022 9:36 AM To: Datavault-UK/dbtvault @.> Cc: Sruthi Gade @.>; Author @.> Subject: Re: [Datavault-UK/dbtvault] [BUG] PIT table loads failing with dbtvault 0.9.0 - Syntax error in the SQL generated (Issue #157)

[External Sender]

Hi, apologies, I had not thought this through. So with your original variable value of SHA1 all hashing would have defaulted to MD5 per the code within the hash macro:

...

{%- if hash == 'MD5' -%}

{%- set hash_alg = 'MD5' -%}

{%- set hash_size = 16 -%}

{%- elif hash == 'SHA' -%}

{%- set hash_alg = 'SHA2_256' -%}

{%- set hash_size = 32 -%}

{%- else -%}

{%- set hash_alg = 'MD5' -%}

{%- set hash_size = 16 -%}

{%- endif -%}

...

So to remain consistent with that and to avoid having to reload any data the hash variable in your dbt_project.yml file should in fact be MD5, so:

vars:

hash: MD5

- Reply to this email directly, view it on GitHubhttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FDatavault-UK%2Fdbtvault%2Fissues%2F157%23issuecomment-1248112270&data=05%7C01%7Csruthi.gade%40ansys.com%7C5894f0b3da46443f0fa808da971f4224%7C34c6ce6715b84eff80e952da8be89706%7C0%7C0%7C637988457756093929%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=%2F3yt%2BO%2F9VlxLIBVWoIZhHMzSZtZLHZOKNEgKcSUCfyA%3D&reserved=0, or unsubscribehttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FATJYHAG6OBT466A7YB5VADLV6MQ4ZANCNFSM6AAAAAAQMUFWWI&data=05%7C01%7Csruthi.gade%40ansys.com%7C5894f0b3da46443f0fa808da971f4224%7C34c6ce6715b84eff80e952da8be89706%7C0%7C0%7C637988457756093929%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=r8D5xLo3YP5EOQOnoVbzHIGG5z2d4dpJoTi7S6cLlG0%3D&reserved=0. You are receiving this because you authored the thread.Message ID: @.**@.>>

sgade-ansys commented 2 years ago

Hi Alex and Tim,

I'm still having issues with PIT load. This time it's different. All my .sql file for hubs and sats are in lower case in dbt. With the pit macro, the temp table tha gets created have column names in lower case in double quotes and that doesn't match with the target table column name. Please see the log below.

pit_log.log image image

DVTimWilson commented 2 years ago

Hi, yes, this looks like a regression, we are investigating and will have an update on progress next week. Apologies meanwhile for the inconvenience that this is causing.

DVTimWilson commented 2 years ago

Hi, as a possible workaround, please can you edit the metadata section of your pit model so that the satellite names (circled in red below) are in uppercase text? And confirm whether or not this resolves the issue for the time being. Thank you.

image

sgade-ansys commented 2 years ago

Thank you for getting back on this. I tried doing that last week. Unfortunately, it didn’t work because my sat .sql script file are in lower case. If I change it to upper case like you showed above, it complained as it couldn’t reference the sat objects. I may have to go back and edit my SAT script file name to upper case and all it's references.

DVTimWilson commented 2 years ago

OK, I see. So I think the next thing to try is to replace your pit macro file, pit.sql, with this one please. This is the latest pit macro code with some tweaks that I am hoping will reverse the regression bug.

TimNapierVST commented 2 years ago

Personally it would be preferable not to change the case of any of the identifiers in the code. If someone wants to use camel case or upper case they can enclose it in quotes, otherwise let snowflake or whatever do it

DVAlexHiggs commented 2 years ago

Personally it would be preferable not to change the case of any of the identifiers in the code. If someone wants to use camel case or upper case they can enclose it in quotes, otherwise let snowflake or whatever do it

Yes this would be the ideal. However, we've made this design decision to solve the issue of reserved words and other problematic column names. This particular bug was caused by accidental removal of some of the logic which handled this.

sgade-ansys commented 2 years ago

Hi Alex and Tim,

Is this issue deployed to the dbtvault package? If so, which version should I be using?

sgade-ansys commented 2 years ago

Hi Tim and Alex,

I tested with 0.9.0 and it seems to be the same issue. I was using the scripthttps://github.com/Datavault-UK/dbtvault/blob/feat/fix-157/dbtvault-dev/macros/tables/snowflake/pit.sql that Tim provided for my non prod instance. Could you please let me know when this fix would be in production? If so, which version.

Thank You, Sruthi Gade

From: Alex Higgs @.> Sent: Monday, October 24, 2022 3:16 AM To: Datavault-UK/dbtvault @.> Cc: Sruthi Gade @.>; Author @.> Subject: Re: [Datavault-UK/dbtvault] [BUG] PIT table loads failing with dbtvault 0.9.0 - Syntax error in the SQL generated (Issue #157)

[External Sender]

Personally it would be preferable not to change the case of any of the identifiers in the code. If someone wants to use camel case or upper case they can enclose it in quotes, otherwise let snowflake or whatever do it

Yes this would be the ideal. However, we've made this design decision to solve the issue of reserved words and other problem column names. This particular big was

- Reply to this email directly, view it on GitHubhttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FDatavault-UK%2Fdbtvault%2Fissues%2F157%23issuecomment-1288526920&data=05%7C01%7Csruthi.gade%40ansys.com%7Ca6847a15dd62424488dd08dab58fa17b%7C34c6ce6715b84eff80e952da8be89706%7C0%7C0%7C638021925750918850%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=xwz%2F0Hke1vX%2B8x6xn%2BIFG0gezJfvmqkMpf1FZVyJxvg%3D&reserved=0, or unsubscribehttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FATJYHAFC35BERPH76BUWVBTWEYZTXANCNFSM6AAAAAAQMUFWWI&data=05%7C01%7Csruthi.gade%40ansys.com%7Ca6847a15dd62424488dd08dab58fa17b%7C34c6ce6715b84eff80e952da8be89706%7C0%7C0%7C638021925750918850%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Ow%2ByBZaLar80UGb8JuWZYuW3gSweOpmPfT8HeDff%2Big%3D&reserved=0. You are receiving this because you authored the thread.Message ID: @.**@.>>

DVAlexHiggs commented 2 years ago

Hi! We appreciate you are eager for this fix. Unfortunately it is a bigger issue than this bug alone, and requires a bit of a re-design of how column escaping works in dbtvault for us to fix this permanently (this issue keeps coming back!)

Thank you for your patience whilst we develop a solution

sgade-ansys commented 2 years ago

Thanks Alex for your quick response.

Thank You, Sruthi Gade

From: Alex Higgs @.> Sent: Thursday, November 17, 2022 1:00 PM To: Datavault-UK/dbtvault @.> Cc: Sruthi Gade @.>; Author @.> Subject: Re: [Datavault-UK/dbtvault] [BUG] PIT table loads failing with dbtvault 0.9.0 - Syntax error in the SQL generated (Issue #157)

[External Sender]

Hi! We appreciate you are eager for this fix. Unfortunately it is a bigger issue than this bug alone, and requires a bit of a re-design of how column escaping works in dbtvault for us to fix this permanently (this issue keeps coming back!)

Thank you for your patience whilst we develop a solution

- Reply to this email directly, view it on GitHubhttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FDatavault-UK%2Fdbtvault%2Fissues%2F157%23issuecomment-1319006581&data=05%7C01%7Csruthi.gade%40ansys.com%7C45bb4cf34e424b6c22a008dac8c59707%7C34c6ce6715b84eff80e952da8be89706%7C0%7C0%7C638043048270629751%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Uioxuba1VN%2FnVtiaSBkXiAnOgaF3Xv75e0ZPrQR4hDA%3D&reserved=0, or unsubscribehttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FATJYHAGOOH7N5C4WALLJMA3WIZXDHANCNFSM6AAAAAAQMUFWWI&data=05%7C01%7Csruthi.gade%40ansys.com%7C45bb4cf34e424b6c22a008dac8c59707%7C34c6ce6715b84eff80e952da8be89706%7C0%7C0%7C638043048270629751%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=SkG%2FjUTbUjxgyAIIcxAnjrAqs8ytY1adEfvZ0TQ4Qko%3D&reserved=0. You are receiving this because you authored the thread.Message ID: @.**@.>>

DVAlexHiggs commented 1 year ago

0.9.1 Released which addresses this issue. Please close if this issue is resolved, or let us know if not!

DVARymer commented 1 year ago

Datavault have double checked this in 0.9.1 and looks resolved. Thank you!

image

DVAlexHiggs commented 1 year ago

Closing as this should now be fixed in 0.9.1. Please re-open if not! Thanks for reporting 😄

sgade-ansys commented 1 year ago

Hi Alex,

Thank you so much for keeping me posted on this. I have tested this today and get the following error. The query is timing out on one of the left outer join. Please let me know if I’m missing anything. DBT version – 1.3 (latest) Dbt vault – 0.9.1

Dbt model - @.*** LEFT OUTER JOIN ​(​​(​A​.​HUB_ASSET_HKEY = SAT_SIEBEL_ASSET_SRC​.​HUB_ASSET_HKEY​)​ AND ​(​​(​TO_TIMESTAMP_LTZ​(​SAT_SIEBEL_ASSET_SRC​.​SAT_LDTS​)​​)​ <= '2023-01-06 16:39:23​.​092000000Z'​)​​)​ OR ​(​SAT_SIEBEL_ASSET_SRC​.​SAT_LDTS = '1900-01-01 00:00:00​.​000000000Z'​)​

@.***

Thank You!

Sruthi Gade Senior Application Developer ANSYS, Inc. @.**@.> Mobile: 1.510.456.8867 www.ansys.comhttp://www.ansys.com/ @.***

From: Alex Higgs @.> Sent: Thursday, December 22, 2022 5:57 PM To: Datavault-UK/dbtvault @.> Cc: Sruthi Gade @.>; Author @.> Subject: Re: [Datavault-UK/dbtvault] [BUG] PIT table loads failing with dbtvault 0.9.0 - Syntax error in the SQL generated (Issue #157)

[External Sender]

Closing as this should now be fixed in 0.9.1. Please re-open if not! Thanks for reporting :)

— Reply to this email directly, view it on GitHubhttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FDatavault-UK%2Fdbtvault%2Fissues%2F157%23issuecomment-1363418936&data=05%7C01%7Csruthi.gade%40ansys.com%7C71a6f043963745f1de1908dae46fe348%7C34c6ce6715b84eff80e952da8be89706%7C0%7C0%7C638073466472347474%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=9HkF9JrBjTvKRZadacfN8Wy7dvGRL25mbHptyiRNfZE%3D&reserved=0, or unsubscribehttps://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FATJYHABZRXJXR7NE6P3NJ4TWOTMFDANCNFSM6AAAAAAQMUFWWI&data=05%7C01%7Csruthi.gade%40ansys.com%7C71a6f043963745f1de1908dae46fe348%7C34c6ce6715b84eff80e952da8be89706%7C0%7C0%7C638073466472347474%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=ducEvfq5VCayxnawrDUeVtzFO%2BdAbJSyasJFZIBcmT4%3D&reserved=0. You are receiving this because you authored the thread.Message ID: @.**@.>>