Closed by DVAlexHiggs 1 year ago
Hi All! Just an update.
After some investigation, this is not a bug as such: it arises because the user was loading the exact same stage data twice.
The fixes we put in place allow for this within a batch, but if you load the same batch twice, it may still occur.
We will be putting some checks and changes in place to avoid loading the data again should this occur. However, we recommend loading and staging data according to the Data Vault standards: try to load different data in each load, ideally using delta feeds.
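To illustrate the difference between de-duplicating within a batch and checking against what is already in the target, here is a minimal Python sketch. It is not AutomateDV's implementation (the real logic lives in dbt SQL macros). The names `SatRecord`, `load_satellite` and `check_target` are hypothetical, and the two-row stage is made-up sample data rather than the scenario from the original report.

```python
from typing import NamedTuple


class SatRecord(NamedTuple):
    hash_key: str       # hashed business key (hypothetical column name)
    hashdiff: str       # hash of the descriptive payload
    load_datetime: str  # ISO-8601 timestamp, so string order is time order


def latest_hashdiffs(satellite):
    """Map each hash_key to the hashdiff of its most recent satellite row."""
    latest = {}
    for rec in sorted(satellite, key=lambda r: r.load_datetime):
        latest[rec.hash_key] = rec.hashdiff
    return latest


def load_satellite(satellite, stage, check_target=True):
    """Append staged rows to the satellite and return the rows inserted.

    A staged row is skipped when its hashdiff equals the latest known
    hashdiff for the same key. With check_target=False only rows seen
    earlier in the same batch count as "known" (assumed to mimic the
    within-batch-only behaviour described above).
    """
    latest = latest_hashdiffs(satellite) if check_target else {}
    inserted = []
    for rec in sorted(stage, key=lambda r: r.load_datetime):
        if latest.get(rec.hash_key) == rec.hashdiff:
            continue  # true duplicate of the latest record: ignore it
        latest[rec.hash_key] = rec.hashdiff
        inserted.append(rec)
    satellite.extend(inserted)
    return inserted


# Made-up stage data: two keys, one record each.
stage = [
    SatRecord("k1", "d1", "2023-01-01T00:00:00"),
    SatRecord("k2", "d2", "2023-01-01T00:00:00"),
]

sat_unchecked = []
load_satellite(sat_unchecked, stage, check_target=False)
load_satellite(sat_unchecked, stage, check_target=False)  # same batch again

sat_checked = []
load_satellite(sat_checked, stage)
load_satellite(sat_checked, stage)  # same batch again, filtered out
```

In this sketch, re-loading the identical batch without the target check doubles the rows, while the checked load leaves the satellite unchanged. A delta feed avoids the problem upstream, because the second batch would never contain the already-loaded rows.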
We'll keep this issue updated! Thanks
Hi everyone! Just a quick update. We have a fix and will be releasing this early next week after some further testing.
We will be releasing the fix as a config option which users will need to enable, rather than it being default behaviour. This is because it may have an impact on performance, and a future addition (water-levels) to AutomateDV will make this feature unnecessary in most use cases.
Fixed in v0.10.1
Describe the bug
When loading true duplicates into the v0.10.0 Satellites, duplicate data is not ignored.
Environment
dbt version: v1.4.7
automate_dv version: v0.10.0
Database/Platform: All
To Reproduce
The above scenario ends up loading 4 records instead of the expected 2.
Expected behavior
Only 2 records are loaded instead of 4. See the scenario above.
Additional context
This was raised by a user on Slack and we are reporting it here to make the community aware.
We are actively working on a hotfix for release as soon as possible. Thank you to the community for reporting this!
How was this missed in your testing?
We have about 100 tests just for Satellites, around 20 of which were added for the new load approach.
Previously existing tests were passing and proved we had not broken the old behaviour for Satellites.
Whilst we had many tests for idempotency, this exact situation was not being tested for.
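For readers wondering what a test for this situation could look like, here is a rough, framework-free Python sketch. It is not taken from the AutomateDV test suite. `assert_reload_is_noop`, `toy_load` and `naive_load` are hypothetical names, and `toy_load` is a deliberately simplified stand-in for a Satellite load (it treats any previously seen key and hashdiff pair as a duplicate).

```python
import copy


def assert_reload_is_noop(load, target, batch):
    """Load `batch` into `target`, load it again, and check nothing changed."""
    load(target, batch)
    snapshot = copy.deepcopy(target)
    load(target, batch)  # the exact same batch a second time
    assert target == snapshot, (
        f"re-loading the same batch added {len(target) - len(snapshot)} row(s)"
    )


def toy_load(target, batch):
    """Hypothetical loader: insert a row only if its (key, hashdiff) is new."""
    seen = {(row["key"], row["hashdiff"]) for row in target}
    for row in batch:
        ident = (row["key"], row["hashdiff"])
        if ident not in seen:
            seen.add(ident)
            target.append(row)


def naive_load(target, batch):
    """Hypothetical loader with no duplicate check at all."""
    target.extend(batch)


batch = [
    {"key": "k1", "hashdiff": "d1"},
    {"key": "k2", "hashdiff": "d2"},
]

target = []
assert_reload_is_noop(toy_load, target, batch)  # passes: still 2 rows

try:
    assert_reload_is_noop(naive_load, [], batch)
    naive_passed = True
except AssertionError:
    naive_passed = False  # the check catches the doubled rows
```

The point of the helper is that it exercises the whole load twice with identical input, which is the case described in this issue, rather than only checking duplicates inside a single batch.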