cagov / caldata-mdsa-caltrans-pems

CalData's MDSA project with Caltrans on Performance Measurement System (PeMS) data
https://cagov.github.io/caldata-mdsa-caltrans-pems/
MIT License

Investigate "microbatch" incremental strategy #477

Open ian-r-rose opened 1 week ago

ian-r-rose commented 1 week ago

dbt 1.9 introduces a new incremental strategy called "microbatch". It seems like it might be a great fit for our current large incremental time series models. It enables the following behaviors:

  1. Targeted backfills. Previously we had two options: run the most recent data, or do a full table refresh. Microbatching allows more specific backfills, like "refresh the last week of data".
  2. Less manual configuration of lookbacks. We might be able to remove or significantly simplify make_model_incremental().
  3. Breaking large table builds into multiple steps. This probably doesn't matter a huge amount for Snowflake (though I would be interested to be proven wrong!), but with some databases it might help with resource usage.
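
To make points 1 and 3 concrete, here is a rough sketch in plain Python of the batching idea: a time range is split into independent, half-open daily windows, and a routine run covers the current window plus a few earlier ones. This is not dbt's implementation. The function names `daily_batches` and `lookback_batches` are hypothetical, and a one-day batch size is assumed for illustration.

```python
from datetime import datetime, timedelta


def _midnight(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its day."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def daily_batches(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into half-open one-day windows aligned to midnight.

    Each window can be built (or rebuilt) on its own, which is what makes
    a targeted backfill like "refresh the last week of data" possible.
    """
    cursor = _midnight(start)
    batches = []
    while cursor < end:
        batches.append((cursor, cursor + timedelta(days=1)))
        cursor += timedelta(days=1)
    return batches


def lookback_batches(now: datetime, lookback: int) -> list[tuple[datetime, datetime]]:
    """Windows for a routine run: the current day plus `lookback` earlier days.

    Assumption: this mirrors the general idea of a lookback setting,
    not the exact semantics of any particular dbt version.
    """
    today = _midnight(now)
    return daily_batches(today - timedelta(days=lookback), today + timedelta(days=1))


# Targeted backfill of one week: seven independent daily windows.
week = daily_batches(datetime(2024, 9, 1), datetime(2024, 9, 8))

# Routine run at 13:30 on Sept 8 with a one-day lookback: Sept 7 and Sept 8.
routine = lookback_batches(datetime(2024, 9, 8, 13, 30), lookback=1)
```

Because each window is a self-contained unit of work, a failed or late-arriving day can be reprocessed without touching the rest of the table.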

I think an experimental branch testing out this new strategy would be a great idea. Since it relies on dbt-core 1.9, it would probably need to wait on the version upgrade @summer-mothwood is doing in #352.

jkarpen commented 3 days ago

Next step is to have a team discussion on this topic.