Open ian-r-rose opened 3 months ago
Assigning to @kengodleskidot to review and see if he has any thoughts on how to move forward on this one.
We put this in the backlog until @ian-r-rose returns to get his input on how to handle this item.
Summer will tackle this research project, will assign once she is added to the right GitHub team.
Note: the dbt documentation page on state modified caveats have been updated a few times over the last few weeks, with some changes that might specifically address our issue -- but would require a version upgrade from dbt 1.7.18 (current Caltrans dbt version) to dbt 1.9+ or dbt Versionless.
Beginning in dbt Core 1.9, when you set the
state_modified_compare_vars
behavior flag toTrue
and a model uses avar
orenv_var
in its definition, dbt will identify that lineage in such a way that it will include the model instate:modified
when thevar
orenv_var
value has changed.
You need to build the state directory using dbt v1.9 or higher, or Versionless dbt Cloud, and you need to set
state_modified_compare_more_unrendered_values
totrue
within your dbt_project.yml. If the state directory was built with an older dbt version or if thestate_modified_compare_more_unrendered_values
behavior change flag was either not set or set tofalse
, you need to rebuild the state directory to avoid false positives during state comparison withstate:modified
.
I'm still looking into this, but wanted to document these updates for now.
Ooh, these new flags seem like they might be helpful! I think we are feeling enough pain in CI that an upgrade to 1.9 would be worth it.
Ian and I talked about this today and agreed on the following:
Once we're ready to upgrade our dbt version, the general steps to do so is:
Once in Versionless, we can then start working on setting the state_modified_compare_more_unrendered_values
flag to true (will need to rebuild the state first, as mentioned in the documentation). More reading here: https://docs.getdbt.com/reference/global-configs/behavior-changes#source-definitions-for-state
I created a new issue in the DSE repo for upgrading/testing dbt Versionless https://github.com/cagov/data-infrastructure/issues/423. Will do the work there before coming back to this issue to implement the new state modified flag to address slim CI false positives.
This is blocked until dbt versionless change is implemented.
We currently use "Slim CI" for this project, which (in theory), only runs models that are affected by a PR in CI. The intention here is that we can save time and money by just running the stuff that needs to be run. It works by comparing the code in the branch with that in the nightly production job, and identifying ones where there is a diff.
This is not working as intended because of the way we do some different logic depending upon whether we are "dev" or "prd". In particular, we select less data in dev for a lot of our larger models to improve the developer experience for those models. Unfortunately, this means that there is a real difference between dev and prod, even if the pre-rendered model has not changed, thus creating some false positives in our slim CI job.
I'm not sure of the best way to fix this right now.