Datavault-UK / automate-dv

A free-to-use dbt package for creating and loading Data Vault 2.0-compliant data warehouses (powered by dbt, an open-source data engineering tool and registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0

Table alias missing for columns specified under src_nk in Hub code #212

Closed: ssuri39 closed this issue 9 months ago

ssuri39 commented 9 months ago

Describe the bug
If multiple natural keys are specified under src_nk in a hub, the hub code only adds the table alias to the first column in the list. Example: src_nk = "nk1,nk2". After compilation, the code looks something like this:

WITH row_rank_1 AS (
    SELECT ........ rr.nk1, nk2, ......
           ROW_NUMBER() OVER (PARTITION BY .....) AS row_number
    FROM tabletest.....

As seen, nk2 is not qualified with the table alias (rr), which causes a problem when the model is rerun. The first run works fine, but on rerun we get an ambiguous column error, because the code applies the incremental logic to insert only unique records.
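To illustrate why only the rerun fails, here is a rough sketch of the compiled incremental SQL (hypothetical table and column names, not the exact macro output): on incremental runs the new records are joined against the existing hub table so that only new keys are inserted, and since both sides expose nk2, the unqualified nk2 becomes ambiguous:

records_to_insert AS (
    SELECT a.hub_pk, a.nk1, nk2          -- nk2 left unqualified by the bug
    FROM row_rank_1 AS a
    LEFT JOIN my_hub AS d                -- existing hub table; joined only on incremental runs
        ON a.hub_pk = d.hub_pk
    WHERE d.hub_pk IS NULL               -- keep only records not already in the hub
)

On the first run there is no existing table to join against, so the unqualified column resolves without complaint.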

Environment

  dbt version: 1.3
  automate_dv version: ~~1.1.4~~ 0.10.1
  Database/Platform: Snowflake

To Reproduce
Steps to reproduce the behavior:

  1. Create a hub with incremental config
  2. Give multiple natural keys against src_nk, separated by commas (see the sketch after this list)
  3. Run the hub model; the first run works fine
  4. Rerun the model; it fails with an ambiguous column error
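For reference, a minimal model along these lines (sample names matching the attachments later in this thread; the comma-separated string on src_nk is the configuration described in step 2):

{{ config(materialized='incremental') }}

{%- set source_model = "hash_stg_sample" -%}
{%- set src_pk = "HK_COL1" -%}
{%- set src_nk = "NK1, NK2" -%}    {# comma-separated string rather than a list #}
{%- set src_ldts = "COL4" -%}
{%- set src_source = "COL5" -%}

{{ automate_dv.hub(src_pk=src_pk, src_nk=src_nk, src_ldts=src_ldts,
                   src_source=src_source, source_model=source_model) }}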

Expected behavior
All columns specified in src_nk should be qualified with the table alias.

DVAlexHiggs commented 9 months ago

Hi! Thanks for your report.

I'm surprised, as we have tests around this.

Can you confirm the AutomateDV package version please? 1.1.4 isn't an available version.

ssuri39 commented 9 months ago

Yes, that version was incorrect. Here is the correct version: 0.10.1

Also, just to mention: the issue is mainly in the records_to_insert CTE. It looks something like this, where nk2 should be a.nk2:

records_to_insert AS (
    SELECT ..... a.nk1, nk2, .......
    FROM row_rank_1 AS a
)
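For comparison, the expected compiled CTE, based on the snippet above, would qualify both columns:

records_to_insert AS (
    SELECT ..... a.nk1, a.nk2, .......
    FROM row_rank_1 AS a
)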

DVAlexHiggs commented 9 months ago

Thanks! I've looked at the code and our tests and there's nothing which indicates a problem; I can't reproduce this bug right now. Have you modified/extended/created your own version of the hub macro which introduces this bug?

If not, please provide your model file which is currently presenting the issue (with redactions/sample values if necessary), and I will check that it's not a configuration issue on your end before digging further into our code.

ssuri39 commented 9 months ago

Thanks for your input. I have not made any changes to the macro; I only installed the package as described in the documentation and started developing. I have attached the hub model as well as the dbt error logs for your reference. Please note that the column and table names are sample names. Please let me know if any other information is needed. Thanks for the help!

dbt_error_automate-dv.txt
sample_hub_automatedv.txt

DVAlexHiggs commented 9 months ago

Ah, I see your problem :) You've provided src_nk as a CSV string, so it is trying to find a single column named NK1, NK2 and prefixing that to form a.NK1, NK2. You need to provide the keys as a list:

{{ config(materialized='incremental') }}

{%- set source_model = "hash_stg_sample"   -%}
{%- set src_pk = "HK_COL1"          -%}
{%- set src_nk = ["NK1", "NK2"]         -%}
{%- set src_ldts = "COL4"      -%}
{%- set src_source = "COL5"    -%}

{{ automate_dv.hub(src_pk=src_pk, src_nk=src_nk, src_ldts=src_ldts,
                   src_source=src_source, source_model=source_model) }}
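
With the list form, each natural key is prefixed individually, so the compiled CTE comes out along these lines (a sketch, not the exact compiled output):

records_to_insert AS (
    SELECT a.HK_COL1, a.NK1, a.NK2, a.COL4, a.COL5
    FROM row_rank_1 AS a
    ...
)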

This is documented in our Metadata reference, as well as elsewhere. Please see here for Hub examples: Hub Metadata Examples

Click on "Per-Model - Variables" then "Composite NK" for the example that best matches your approach:

[Screenshot: the "Composite NK" example from the Hub metadata documentation]

Hope this helps!

ssuri39 commented 9 months ago

Yes, it worked :) Thank you for the detailed information and all the help!!

DVAlexHiggs commented 9 months ago

No worries, happy to help! Welcome to the community 😄