cagov / caldata-mdsa-caltrans-pems

CalData's MDSA project with Caltrans on Performance Measurement System (PeMS) data
https://cagov.github.io/caldata-mdsa-caltrans-pems/
MIT License

Investigate slow QA/QC models #459

Open ian-r-rose opened 3 weeks ago

ian-r-rose commented 3 weeks ago

In recent weeks, CI runs have roughly doubled in length (example here). This increase is dominated by a few newer models:

I think it would be worthwhile to look into how we can speed these models up. A few options:

  1. There may be some performance optimizations to be found through better partition/column pruning
  2. We may want to make a couple of them incremental
  3. We may want to choose a larger warehouse for them
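
To illustrate option 2, here is a minimal Python sketch of the idea behind an incremental model: each run processes only the rows newer than what has already been built, rather than rebuilding everything. It is not dbt code and not how these models are implemented. The function name `incremental_build` and the `sample_date` column are hypothetical, and `detector_id` is used only as an example column.

```python
from datetime import date


def incremental_build(source_rows, target_rows, date_key="sample_date"):
    """Append only the source rows newer than the latest date in the target.

    Hypothetical helper for illustration; returns the updated target and
    the number of rows processed in this run.
    """
    # High-water mark: the newest date already built (None on the first run).
    watermark = max((row[date_key] for row in target_rows), default=None)
    if watermark is None:
        new_rows = list(source_rows)  # first run: full build
    else:
        new_rows = [row for row in source_rows if row[date_key] > watermark]
    return target_rows + new_rows, len(new_rows)


# First run: the target is empty, so every source row is processed.
source = [
    {"detector_id": 1, "sample_date": date(2024, 1, 1)},
    {"detector_id": 1, "sample_date": date(2024, 1, 2)},
    {"detector_id": 2, "sample_date": date(2024, 1, 3)},
]
target, processed = incremental_build(source, [])

# Second run: one new day has arrived, so only that row is processed.
source.append({"detector_id": 2, "sample_date": date(2024, 1, 4)})
target, processed_again = incremental_build(source, target)
```

The trade-off is that late-arriving or corrected rows older than the high-water mark are skipped, so a real incremental model usually needs a lookback window or a merge key to handle them.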

jkarpen commented 2 weeks ago

Note to reach out to @summer-mothwood if it seems like the solution will be to convert to incremental models.

jkarpen commented 2 weeks ago

Possibly also explore this option: https://docs.getdbt.com/docs/build/incremental-microbatch

JamesSLogan commented 38 minutes ago

Our nightly build time should improve by 20+ minutes, by my estimation.

Pinging @summer-mothwood since one of the fixes makes a model incremental. (I used detector_id)