PicnicSupermarket / diepvries

The Picnic Data Vault framework.
https://diepvries.picnic.tech
MIT License
127 stars 15 forks source link

Improve performance of satellite loading queries #44

Closed dlouseiro closed 1 year ago

dlouseiro commented 1 year ago

The purpose of this PR is to improve the performance of the queries that load satellites by reducing the number of records scanned.

In the previous version (before this PR) of the MERGE statements that load satellites, we were doing multiple accesses to the satellite table (in satellite_staging_affected_queries CTE). On top of that, we were also doing a window function to calculate the record end timestamp, when it is possible to achieve the same result without the window function.

In this new version of the code, I'm:

  1. Pre-filter the satellite in a new CTE (filtered_satellite)
  2. Apply an LEFT JOIN with the filtered satellite earlier (directly on the filtered_staging CTE instead of the staging_satellite_affected_records)
  3. Use DATEADD(nanosecond, -1, staging.r_timestamp) as the expression to calculate the r_timestamp_end, instead of the previous expression using a window function (LEAD)
dlouseiro commented 1 year ago

Closing this PR in favour of this one