Closed dlouseiro closed 1 year ago
Very cool! Reviewing the logic and it looks solid to me. I'll do some data tests now.
Suggestion to change the title to something like "Use load timestamp to improve performance of `MERGE` query". I think this strategy should improve queries even without cluster keys, right?
Yeah indeed
Done @michael-the1
I tested with a few different hubs/links/satellites and it looks good to me!
The purpose of this PR is to ensure the loading statements of data vault tables scan as few records as possible by:
- Filtering on the load timestamp in the `MERGE` `ON` clause, ensuring the number of records scanned is the minimum possible.

This strategy is especially efficient for tables that are clustered by `r_timestamp` (or even better by `r_timestamp :: DATE`).
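
To make the idea concrete, here is a minimal sketch of what a load-timestamp bound in the `MERGE` `ON` clause could look like. It is not the code from this PR: the helper `render_merge`, the table names, the hashkey column and the way the lower bound is obtained are all hypothetical, and the sketch assumes that every target row able to match a staging row has an `r_timestamp` at or after that bound.

```python
from datetime import datetime


def render_merge(target: str, staging: str, hashkey: str, min_timestamp: datetime) -> str:
    """Render a MERGE whose ON clause also bounds the target's r_timestamp.

    Hypothetical helper for illustration only, not part of this PR.
    """
    # Assumption: no target row older than min_timestamp can match a staging
    # row. If that does not hold, the bound would hide real matches and the
    # statement would insert duplicates.
    lower_bound = min_timestamp.strftime("%Y-%m-%d %H:%M:%S")
    return "\n".join(
        [
            f"MERGE INTO {target} AS target",
            f"USING {staging} AS staging",
            f"ON target.{hashkey} = staging.{hashkey}",
            # The extra predicate lets the engine skip target data older than
            # the bound, which helps most when the table is clustered by
            # r_timestamp.
            f"  AND target.r_timestamp >= '{lower_bound}' :: TIMESTAMP",
            f"WHEN NOT MATCHED THEN INSERT ({hashkey}, r_timestamp)",
            f"  VALUES (staging.{hashkey}, staging.r_timestamp);",
        ]
    )


# Hypothetical table and column names.
sql = render_merge(
    "dv.h_customer", "dv_stg.orders", "h_customer_hashkey", datetime(2021, 1, 1)
)
print(sql)
```

Without the extra predicate, the join in the `ON` clause has to consider the whole target table; with it, only rows at or after the bound are candidates, which is why the benefit should be largest on tables clustered by `r_timestamp`.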