ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/

Add selector for `test-dbt-models` workflow and set its indirect selection to "cautious" #373

Closed jeancochrane closed 2 months ago

jeancochrane commented 2 months ago

This PR fixes a bug in the test-dbt-models workflow revealed by the latest workflow run. The indirect selection for our tests is set to "eager" by default, which means:

> By default, runs tests if any of the parent nodes are selected, regardless of whether all dependencies are met. This includes ANY tests that reference the selected nodes.

As a result, a test defined on a non-selected model still runs if it references a selected model in one of its arguments. This causes `test-dbt-models` to run `tax.pin` tests that reference `legdat` and `pardat` in their arguments:

https://github.com/ccao-data/data-architecture/blob/9e821802f2e964e4e6fff6fd04685d9137f4abf3/dbt/models/tax/schema.yml#L30-L64

This is a problem because the failure workbook code can only parse tests defined on models in the `iasworld` schema, so it raises an error when it tries to parse tests defined on `tax.pin`.

To fix this bug, this PR sets the indirect selection to "cautious" for this workflow, which means:

> Restricts tests to only those that exclusively reference selected nodes. Tests will only be executed if all the nodes they depend on are selected, which prevents tests from running if one or more of its parent nodes are unselected...
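
To make the difference between the two modes concrete, here is a toy Python model of the selection rule. It is a sketch for illustration only, not dbt's implementation: the test names, the `TESTS` mapping, and the `select_tests` helper are all hypothetical, and the only facts taken from this PR are that a `tax.pin` test references `legdat` and `pardat`.

```python
# Toy model of dbt indirect selection (illustrative, NOT dbt's real code).
# Each hypothetical test maps to the set of nodes it depends on.
TESTS = {
    # Hypothetical test defined on a selected iasworld model only.
    "iasworld_pardat_unique_parid": {"iasworld.pardat"},
    # Hypothetical test defined on tax.pin that references legdat and
    # pardat in its arguments, so it depends on all three nodes.
    "tax_pin_matches_legdat_pardat": {
        "tax.pin",
        "iasworld.legdat",
        "iasworld.pardat",
    },
}


def select_tests(selected, mode):
    """Return the sorted names of tests that would run for a selection."""
    if mode == "eager":
        # Eager: run a test if ANY node it depends on is selected.
        return sorted(n for n, deps in TESTS.items() if deps & selected)
    if mode == "cautious":
        # Cautious: run a test only if ALL nodes it depends on are selected.
        return sorted(n for n, deps in TESTS.items() if deps <= selected)
    raise ValueError(f"unknown indirect selection mode: {mode}")


# Only the iasworld nodes are selected; tax.pin is not.
selected = {"iasworld.legdat", "iasworld.pardat"}
eager = select_tests(selected, "eager")
cautious = select_tests(selected, "cautious")
print("eager:   ", eager)
print("cautious:", cautious)
```

Under these assumptions, "eager" picks up the `tax.pin` test because two of its dependencies are selected, while "cautious" skips it because `tax.pin` itself is not selected.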

In the process, we define a YAML selector for this particular selection string, to make it easier to read and reuse.

Evidence that the workflow succeeds here: https://github.com/ccao-data/data-architecture/actions/runs/8634848786/job/23671719425