Closed khalidhussein138 closed 8 months ago
Hi,
Thanks for this report!
This is something we've noticed internally on our own projects. We have recently developed a fix for it and hope to release it in due course.
We've also decided to change RANK() for ROW_NUMBER() everywhere for the reason you describe as well as performance enhancements.
Hi @DVAlexHiggs, that's good to know. When you say "due course", what sort of time frame are you looking at? Are we talking a month, 6 months, 2 years?
Apologies for the vagueness! The target is this month; we've got a few QoL, performance improvements and various other things coming down the line 😄
Amazing, thank you!
Fixed in v0.10.2 😄 Thanks for your patience while waiting for this release! Please let us know if you experience any issues by responding here or opening a new issue.
Describe the bug
When loading into a Satellite table where Stage contains two or more duplicate records (the same hash_key, hash_diff and load_timestamp) and the records have not been previously loaded into the satellite, both records are loaded into the satellite rather than just one.
Environment
dbt version: 1.5.2
automate_dv version: 0.10.1
Database/Platform: Snowflake
To Reproduce
Expected behavior
The root cause
When debugging the Satellite script, we noticed that in this CTE the rank() function ranks all duplicate records as 1, which causes every one of them to be loaded. As a suggestion, changing this to row_number() would solve the issue.
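The difference between the two window functions can be seen in a minimal sketch. This is not the automate_dv Satellite macro: the stage table, its payload column, the PARTITION BY / ORDER BY clauses and the helper count_selected are all assumptions made for illustration, and it runs on SQLite through Python's sqlite3 module rather than on Snowflake. It only shows how tied rows are numbered by each function.

```python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stage table (not the real automate_dv structure).
conn.execute(
    "CREATE TABLE stage ("
    "hash_key TEXT, hash_diff TEXT, load_timestamp TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO stage VALUES (?, ?, ?, ?)",
    [
        ("hk1", "hd1", "2023-07-01 00:00:00", "a"),
        ("hk1", "hd1", "2023-07-01 00:00:00", "a"),  # exact duplicate of the row above
        ("hk2", "hd2", "2023-07-01 00:00:00", "b"),
    ],
)


def count_selected(window_fn: str) -> int:
    """Count the rows that survive a 'keep only number 1' filter.

    window_fn is either "RANK" or "ROW_NUMBER". The partitioning and
    ordering below are assumed, not copied from the actual macro.
    """
    query = f"""
        SELECT COUNT(*)
        FROM (
            SELECT hash_key,
                   {window_fn}() OVER (
                       PARTITION BY hash_key
                       ORDER BY load_timestamp
                   ) AS rn
            FROM stage
        ) AS ranked
        WHERE rn = 1
    """
    return conn.execute(query).fetchone()[0]


rank_count = count_selected("RANK")              # tied rows share rank 1
row_number_count = count_selected("ROW_NUMBER")  # tied rows get 1, 2, ...

print("RANK() keeps", rank_count, "rows")
print("ROW_NUMBER() keeps", row_number_count, "rows")
```

With RANK(), both duplicate hk1 rows tie and receive 1, so three rows pass the filter. With ROW_NUMBER(), the tie is broken and only one row per hash_key passes, giving two. Which of the tied rows gets number 1 is not defined, which is acceptable here only because the duplicates are identical.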