Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0
478 stars 114 forks source link

[BUG] Effectivity Satellites not setting end date, or user error? #237

Closed mjahammel closed 2 weeks ago

mjahammel commented 2 weeks ago

Describe the bug I have built a simple effectivity satellite for a link with 2 hubs, and it does not appear to be loading the way I expect it. I set up test data for 5 rows (each with a different effective date, i.e., one row loaded per run). In the test data, the value for hub_1 was constant, and the value for hub_2 varied. After 5 loads there are 5 rows in the link and 5 rows in the effectivity satellite. The definition of the effectivity satellite is:

{{ config(materialized = 'incremental') }}

{% set yaml_metadata %}
  source_model: v_stg_hub_links
  src_pk: dv_hkey_hub_links
  src_dfk: dv_hkey_hub_1
  src_sfk:
    - dv_hkey_hub_2
  src_start_date: dv_start_date
  src_end_date: dv_end_date
  src_eff: dv_appdt
  src_ldts: dv_ldts
  src_source: dv_rsrc
{% endset %}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{{ automate_dv.eff_sat(src_pk = metadata_dict['src_pk'],
                       src_dfk = metadata_dict['src_dfk'],
                       src_sfk = metadata_dict['src_sfk'],
                       src_extra_columns = metadata_dict['src_extra_columns'],
                       src_start_date = metadata_dict['src_start_date'],
                       src_end_date = metadata_dict['src_end_date'],
                       src_eff = metadata_dict['src_eff'],
                       src_ldts = metadata_dict['src_ldts'],
                       src_source = metadata_dict['src_source'],
                       source_model = metadata_dict['source_model']) }}

Each row in the effectivity satellite has an end date of '9999-12-31' (see screenshots), but I was expecting that only the last (most recent based on DV_APPDT) row should have that end date, the other rows should have an end date identifying when the next most recent row superseded it.

Is this a bug ,or did I set up the effectivity satellite wrong, or are my expectations incorrect, or (something else)?

Thanks in advance,

Environment

dbt version: 1.7.16 automate_dv version: 0.11.0 Database/Platform: Snowflake (dbt adpater v1.7.5)

To Reproduce execute 'dbt run --select "+link_hub_links +sate_hub_links"' 5 times, changing the load date value between each run. I can provide a copy of the dbt project, if required

Expected behavior See above in "Describe the bug"

Screenshots The data in the effectivity satellite (sate_hub_links): image

The data in link (link_hub_links): image

Log files N/A

Additional context N/A

AB#5471

DVAlexHiggs commented 2 weeks ago

Hello!

Have you enabled the auto-end-dating option for effectivity satellites? Please find the docs here

mjahammel commented 2 weeks ago

User error, then. :blush:

Yes, I completely missed that option in the documentation. I added that to configuration for the satellite and regenerated it. At first glance, there were more rows added to the satellite than I expected, but when you consider the 'insert-only' data vault 'rule', the data makes sense.

Thanks for the pointer, @DVAlexHiggs. I will close this issue.