Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0
478 stars 114 forks

Multi-Version Data Load on the Same Record Load Date #206

Closed yedu1985 closed 10 months ago

yedu1985 commented 10 months ago

Is your feature request related to a problem? Please describe.

What is a Multi-Version Load?

The typical load patterns in Data Vault assume periodic incremental loads. In every load job, the current state of the source data is compared with the newest version in the target satellite. If there are changes, a new version is loaded into the satellite table.
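To make the standard pattern concrete, here is a minimal Python sketch of that comparison. It is only an illustration, not how AutomateDV implements it: the function names, the row layout (`key`, `load_date`, `hashdiff`, `payload`) and the use of an MD5 hashdiff are all assumptions made for this example.

```python
import hashlib


def hashdiff(payload):
    """Hash the descriptive attributes so two versions can be compared cheaply."""
    joined = "||".join(f"{k}={payload[k]}" for k in sorted(payload))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def incremental_load(satellite, staged, load_date):
    """Append staged rows whose payload differs from the newest stored version.

    satellite: list of dicts (hypothetical layout: key, load_date, hashdiff, payload)
    staged:    dict of business key -> current payload, ONE row per key
    load_date: ISO date string, so plain string comparison orders correctly
    Returns the number of rows inserted.
    """
    # Find the newest stored version for each business key.
    newest = {}
    for row in satellite:
        current = newest.get(row["key"])
        if current is None or row["load_date"] > current["load_date"]:
            newest[row["key"]] = row

    inserted = 0
    for key, payload in staged.items():
        digest = hashdiff(payload)
        # Insert only new keys or keys whose payload has changed.
        if key not in newest or newest[key]["hashdiff"] != digest:
            satellite.append(
                {"key": key, "load_date": load_date,
                 "hashdiff": digest, "payload": dict(payload)}
            )
            inserted += 1
    return inserted


satellite = []
incremental_load(satellite, {1: {"units": 10}}, "2023-08-01")  # first version
incremental_load(satellite, {1: {"units": 10}}, "2023-08-02")  # unchanged, skipped
incremental_load(satellite, {1: {"units": 11}}, "2023-08-03")  # changed, new version
```

Note that `staged` holds exactly one row per business key, which is the assumption that breaks down in the multi-version case.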

But there are situations where we need a different loading strategy. Examples are:

- Initial load of a new satellite with historical data from a historized source system, a Data Lake or a Persistent Staging Area (PSA)
- Initial load of a complete Data Vault schema with historical data from a historized source system, a Data Lake or a Persistent Staging Area (PSA)
- Reload of derived satellites in a Business Data Vault, e.g. when business rules were changed
- Incremental loads of multiple versions in one batch, e.g. in combination with CDC (change data capture)

In all these cases, it is possible that multiple changes for the same business key must be loaded in one step. To explain the difference, let's have a look at a very simple example: once a day, the current beer inventory of a microbrewery is written to a satellite table. The data is loaded from a PSA table into the satellite table in the Data Vault schema, and on a specific day the PSA holds multiple versions of the same record. How can this data be loaded into the satellite in an automated way?

Sample Data

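| PartID | Units | Source_record_date | Source_System |
|--------|-------|--------------------|---------------|
| 1      | 10    | 8/1/2023           | Source A      |
| 1      | 11    | 8/1/2023           | Source A      |
| 1      | 12    | 8/1/2023           | Source A      |
| 1      | 13    | 8/1/2023           | Source A      |
| 1      | 14    | 8/1/2023           | Source A      |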
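As a rough illustration of what loading such a batch involves, the Python sketch below walks the versions of each key in order and keeps every row that differs from its predecessor. It is a sketch under assumptions, not AutomateDV's implementation. In particular, all five sample rows share the same `Source_record_date`, so their order cannot be derived from the data shown; the `seq` column used here is a hypothetical ordering attribute (for example a CDC sequence number or a finer-grained timestamp) that the source would have to provide.

```python
def multi_version_load(satellite, batch):
    """Load a batch that may hold several versions per business key.

    satellite: list of dicts, assumed to be appended in version order
    batch:     list of dicts with part_id, units and a hypothetical `seq`
               column that orders the versions of each key
    Returns the rows that were inserted.
    """
    # Newest stored value per key: the last row appended for that key.
    last_units = {}
    for row in satellite:
        last_units[row["part_id"]] = row["units"]

    inserted = []
    # Walk each key's versions in order and compare every row with its
    # predecessor (either from this batch or from the satellite).
    for row in sorted(batch, key=lambda r: (r["part_id"], r["seq"])):
        if last_units.get(row["part_id"]) != row["units"]:
            inserted.append(row)
            last_units[row["part_id"]] = row["units"]

    satellite.extend(inserted)
    return inserted


# The sample data from the table, with the assumed `seq` column added.
batch = [
    {"part_id": 1, "units": units, "source_record_date": "8/1/2023",
     "source_system": "Source A", "seq": seq}
    for seq, units in enumerate([10, 11, 12, 13, 14], start=1)
]

satellite = []
loaded = multi_version_load(satellite, batch)
print(len(loaded))  # 5, because every version differs from the one before it
```

One consequence of this approach is that the satellite needs something finer than the record date in its key (here `seq`), otherwise the five versions would collide on the same key and date. The sketch also does not guard against replaying a batch that was already loaded.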


DVAlexHiggs commented 10 months ago

Hi! Thanks for taking the time to fill this out. Can you expand on what specific features you're looking for and how you imagine they should work?

If I understand correctly, what you're looking for is already supported by AutomateDV, dbt itself, or the Data Vault 2.0 standards that we follow and should all be possible within the existing functionality and framework.

DVAlexHiggs commented 10 months ago

Closing as we believe this is now possible from v0.10.0 onwards. Please reopen if you believe this is not the case. Thanks!