Datavault-UK / automate-dv

A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
https://www.automate-dv.com
Apache License 2.0
478 stars 114 forks source link

Added incremental predicates to Snowflakes tables #198

Open luislema79 opened 1 year ago

luislema79 commented 1 year ago

Added incremental predicates to Snowflake satellites and hubs to limit the scan on the destination table (satellite/hub). When adding new records to a satellite the latest_records cte pulls data from your satellite without any filtering so it doesn't limit the amount of data it pulls and it scans all the partitions in the satellite. The same happens in the records_to_insert cte for a hub. For a big satellite/hub with billions of records, this process takes several minutes. If you don't scan the whole table you run the risk to insert duplicates. However, if you know you might only get duplicates within 3 months you can limit the scan on the table to the last 3 months so you don't scan years worth of data, that way you speed up the incremental process dramatically.