PicnicSupermarket / diepvries

The Picnic Data Vault framework.
https://diepvries.picnic.tech
MIT License

Use load timestamp to improve performance of MERGE query #45

Closed dlouseiro closed 1 year ago

dlouseiro commented 1 year ago

The purpose of this PR is to ensure that the loading statements of Data Vault tables scan as few records as possible by:

This strategy is especially efficient for tables that are clustered by r_timestamp (or, even better, by r_timestamp::DATE).
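
As a rough illustration of the idea, here is a minimal Python sketch that builds a MERGE statement whose join condition also restricts the target table on r_timestamp. It is not the code from this PR: the function name `build_merge_sql`, the table and column names other than r_timestamp, and the way the lower bound is passed in are all hypothetical. It also assumes the lower bound is chosen so that no matching target record is older than it; otherwise existing keys would be treated as new.

```python
def build_merge_sql(
    target_table: str,
    staging_table: str,
    key_column: str,
    min_timestamp: str,
) -> str:
    """Build a MERGE that only compares against target rows loaded at or after min_timestamp.

    All names here are hypothetical; only r_timestamp comes from the discussion above.
    """
    return (
        f"MERGE INTO {target_table} AS target\n"
        f"USING {staging_table} AS staging\n"
        f"ON target.{key_column} = staging.{key_column}\n"
        # Assumption: the extra predicate lets the engine skip older partitions,
        # which helps most when the table is clustered by r_timestamp.
        f"   AND target.r_timestamp >= '{min_timestamp}'\n"
        f"WHEN NOT MATCHED THEN\n"
        f"  INSERT ({key_column}, r_timestamp)\n"
        f"  VALUES (staging.{key_column}, staging.r_timestamp)"
    )


# Hypothetical hub load: table names and the timestamp are made up for the example.
sql = build_merge_sql(
    target_table="dv.h_customer",
    staging_table="dv_stg.orders",
    key_column="h_customer_hashkey",
    min_timestamp="2021-01-01 00:00:00",
)
print(sql)
```

The timestamp predicate sits in the ON clause rather than in a separate filter so that it narrows the set of target rows considered for matching.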

michael-the1 commented 1 year ago

Very cool! I reviewed the logic and it looks solid to me. I'll do some data tests now.

michael-the1 commented 1 year ago

Suggestion to change the title to something like "Use load timestamp to improve performance of MERGE query". I think this strategy should improve queries even without cluster keys, right?

dlouseiro commented 1 year ago

> Suggestion to change the title to something like "Use load timestamp to improve performance of MERGE query". I think this strategy should improve queries even without cluster keys, right?

Yeah indeed

dlouseiro commented 1 year ago

Done @michael-the1

michael-the1 commented 1 year ago

I tested with a few different hubs/links/satellites and it looks good to me!