cagov / caldata-mdsa-caltrans-pems

CalData's MDSA project with Caltrans on Performance Measurement System (PeMS) data
https://cagov.github.io/caldata-mdsa-caltrans-pems/
MIT License
7 stars 0 forks source link

Make slim CI work better #352

Open ian-r-rose opened 3 months ago

ian-r-rose commented 3 months ago

We currently use "Slim CI" for this project, which (in theory), only runs models that are affected by a PR in CI. The intention here is that we can save time and money by just running the stuff that needs to be run. It works by comparing the code in the branch with that in the nightly production job, and identifying ones where there is a diff.

This is not working as intended because of the way we do some different logic depending upon whether we are "dev" or "prd". In particular, we select less data in dev for a lot of our larger models to improve the developer experience for those models. Unfortunately, this means that there is a real difference between dev and prod, even if the pre-rendered model has not changed, thus creating some false positives in our slim CI job.

I'm not sure of the best way to fix this right now.

jkarpen commented 3 months ago

Assigning to @kengodleskidot to review and see if he has any thoughts on how to move forward on this one.

jkarpen commented 2 months ago

We put this in the backlog until @ian-r-rose returns to get his input on how to handle this item.

jkarpen commented 1 month ago

Summer will tackle this research project, will assign once she is added to the right GitHub team.

summer-mothwood commented 2 weeks ago

Note: the dbt documentation page on state modified caveats have been updated a few times over the last few weeks, with some changes that might specifically address our issue -- but would require a version upgrade from dbt 1.7.18 (current Caltrans dbt version) to dbt 1.9+ or dbt Versionless.

Beginning in dbt Core 1.9, when you set the state_modified_compare_vars behavior flag to True and a model uses a var or env_var in its definition, dbt will identify that lineage in such a way that it will include the model in state:modified when the var or env_var value has changed.

You need to build the state directory using dbt v1.9 or higher, or Versionless dbt Cloud, and you need to set state_modified_compare_more_unrendered_values to true within your dbt_project.yml. If the state directory was built with an older dbt version or if the state_modified_compare_more_unrendered_values behavior change flag was either not set or set to false, you need to rebuild the state directory to avoid false positives during state comparison with state:modified.

I'm still looking into this, but wanted to document these updates for now.

ian-r-rose commented 2 weeks ago

Ooh, these new flags seem like they might be helpful! I think we are feeling enough pain in CI that an upgrade to 1.9 would be worth it.

summer-mothwood commented 1 week ago

Ian and I talked about this today and agreed on the following:

Once we're ready to upgrade our dbt version, the general steps to do so is:

Once in Versionless, we can then start working on setting the state_modified_compare_more_unrendered_values flag to true (will need to rebuild the state first, as mentioned in the documentation). More reading here: https://docs.getdbt.com/reference/global-configs/behavior-changes#source-definitions-for-state

summer-mothwood commented 1 week ago

I created a new issue in the DSE repo for upgrading/testing dbt Versionless https://github.com/cagov/data-infrastructure/issues/423. Will do the work there before coming back to this issue to implement the new state modified flag to address slim CI false positives.

jkarpen commented 3 days ago

This is blocked until dbt versionless change is implemented.