dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Bug] Tests are incorrectly being marked `state:modified` if the comparison state is 1.8 vs 1.7 and there is a `tests` key in the `dbt_project.yml` file #10322

Open jeremyyeo opened 1 week ago

jeremyyeo commented 1 week ago

Is this a new bug in dbt-core?

Current Behavior

As per the title, tests are selected by state:modified and executed in the CI run, even though nothing has changed, when:

  1. The CI run is on 1.8 (current state manifest).
  2. The Prod run is on 1.7 (previous state manifest).
  3. A tests key exists in the dbt_project.yml file for both (1) and (2).
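
Conditions (1) and (2) can be checked from the two manifests themselves. The following is a minimal sketch, assuming each manifest.json records the producing version under `metadata.dbt_version`. The helper names and the sample dictionaries are illustrative and are not part of dbt.

```python
import json
from pathlib import Path


def load_manifest(path: str) -> dict:
    """Read a manifest.json file from disk."""
    return json.loads(Path(path).read_text())


def minor_version(manifest: dict) -> str:
    """Return the 'major.minor' part of the dbt version stored in a manifest."""
    # Assumption: the version string lives under metadata.dbt_version,
    # for example "1.7.16".
    version = manifest["metadata"]["dbt_version"]
    return ".".join(version.split(".")[:2])


# Stand-ins for target_old/manifest.json (prod) and target/manifest.json (CI).
previous = {"metadata": {"dbt_version": "1.7.16"}}
current = {"metadata": {"dbt_version": "1.8.2"}}

# True when the two manifests come from different minor versions.
mismatch = minor_version(previous) != minor_version(current)
print(mismatch)
```

With real files, `load_manifest("target_old/manifest.json")` and `load_manifest("target/manifest.json")` would replace the two sample dictionaries.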

Expected Behavior

Tests shouldn't be selected or executed in the CI run, since nothing has changed.

Steps To Reproduce

  1. Project setup.
# dbt_project.yml
name: my_dbt_project
profile: pg-local
version: "1.0.0"

models:
  my_dbt_project:
    +materialized: table

tests:
  +store_failures: true

# models/sch.yml
version: 2
models:
  - name: foo
    columns:
      - name: id
        tests:
          - not_null
          - unique

The key here is to set a tests key in the dbt_project.yml file.

-- models/foo.sql
select 1 id
  2. Do a 1.7 build and store the manifest for deferring to:
$ dbt build
04:16:10  Running with dbt=1.7.16
04:16:10  Registered adapter: postgres=1.7.16
04:16:10  Unable to do partial parsing because of a version mismatch
04:16:10  Found 1 model, 2 tests, 0 sources, 0 exposures, 0 metrics, 401 macros, 0 groups, 0 semantic models
04:16:10  
04:16:11  Concurrency: 4 threads (target='dev')
04:16:11  
04:16:11  1 of 3 START sql table model public.foo ........................................ [RUN]
04:16:11  1 of 3 OK created sql table model public.foo ................................... [SELECT 1 in 0.07s]
04:16:11  2 of 3 START test not_null_foo_id .............................................. [RUN]
04:16:11  3 of 3 START test unique_foo_id ................................................ [RUN]
04:16:11  3 of 3 PASS unique_foo_id ...................................................... [PASS in 0.06s]
04:16:11  2 of 3 PASS not_null_foo_id .................................................... [PASS in 0.06s]
04:16:11  
04:16:11  Finished running 1 table model, 2 tests in 0 hours 0 minutes and 0.26 seconds (0.26s).
04:16:11  
04:16:11  Completed successfully
04:16:11  
04:16:11  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

$ mv target target_old
  3. Swap to 1.8 and do a state:modified build:
$ dbt build -s state:modified+ --defer --state target_old
04:17:32  Running with dbt=1.8.2
04:17:33  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
04:17:33  Registered adapter: postgres=1.8.1
04:17:33  Unable to do partial parsing because saved manifest not found. Starting full parse.
04:17:33  Found 1 model, 2 data tests, 413 macros
04:17:33  
04:17:33  Concurrency: 4 threads (target='dev')
04:17:33  
04:17:33  1 of 2 START test not_null_foo_id .............................................. [RUN]
04:17:33  2 of 2 START test unique_foo_id ................................................ [RUN]
04:17:33  2 of 2 PASS unique_foo_id ...................................................... [PASS in 0.06s]
04:17:33  1 of 2 PASS not_null_foo_id .................................................... [PASS in 0.06s]
04:17:33  
04:17:33  Finished running 2 data tests in 0 hours 0 minutes and 0.18 seconds (0.18s).
04:17:33  
04:17:33  Completed successfully
04:17:33  
04:17:33  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  4. Now modify the dbt_project.yml file, changing tests to data_tests:
# dbt_project.yml
...
data_tests:
  +store_failures: true
  5. Rerun as per (3):
$ rm -rf target
$ dbt build -s state:modified+ --defer --state target_old

04:20:49  Running with dbt=1.8.2
04:20:49  Registered adapter: postgres=1.8.1
04:20:49  Unable to do partial parsing because saved manifest not found. Starting full parse.
04:20:49  [WARNING]: Deprecated functionality
The `tests` config has been renamed to `data_tests`. Please see
https://docs.getdbt.com/docs/build/data-tests#new-data_tests-syntax for more
information.
04:20:50  Found 1 model, 2 data tests, 413 macros
04:20:50  The selection criterion 'state:modified+' does not match any enabled nodes
04:20:50  The selection criterion 'state:modified+' does not match any enabled nodes
04:20:50  The selection criterion 'state:modified+' does not match any enabled nodes
04:20:50  
04:20:50  Nothing to do. Try checking your model configs and model specification args
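
To see which part of a test node differs between the stored manifest and the fresh one, the two files can be compared directly. This is a rough diagnostic sketch, not dbt's own state comparison logic. It assumes manifest.json keeps nodes under a top-level `nodes` mapping with `resource_type`, `config`, `unrendered_config` and `checksum` fields. The function names, the choice of fields, and the sample data (including the node id) are made up for illustration.

```python
import json
from pathlib import Path

# Assumption: these are the node fields worth comparing. The list is
# illustrative and is not taken from dbt's state comparison code.
FIELDS = ("config", "unrendered_config", "checksum")


def load_manifest(path: str) -> dict:
    """Read a manifest.json file from disk."""
    return json.loads(Path(path).read_text())


def diff_test_nodes(previous: dict, current: dict) -> dict:
    """Map each shared test node's unique_id to the fields that differ."""
    differences = {}
    old_nodes = previous.get("nodes", {})
    for unique_id, new_node in current.get("nodes", {}).items():
        old_node = old_nodes.get(unique_id)
        # Only look at test nodes that exist in both manifests.
        if old_node is None or new_node.get("resource_type") != "test":
            continue
        changed = [f for f in FIELDS if old_node.get(f) != new_node.get(f)]
        if changed:
            differences[unique_id] = changed
    return differences


# Fabricated sample data (hypothetical node id and values), used only to
# show the shape of the result. It does not reproduce real manifest contents.
old = {"nodes": {"test.example.unique_foo_id": {
    "resource_type": "test",
    "config": {"store_failures": True},
    "unrendered_config": {"store_failures": True},
}}}
new = {"nodes": {"test.example.unique_foo_id": {
    "resource_type": "test",
    "config": {"store_failures": True},
    "unrendered_config": {},
}}}

result = diff_test_nodes(old, new)
print(result)
```

Against the files from the steps above, the call would be `diff_test_nodes(load_manifest("target_old/manifest.json"), load_manifest("target/manifest.json"))`.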

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11.9
- dbt: dbt-core==1.7.16 + dbt-postgres==1.7.16 / dbt-core==1.8.2 + dbt-postgres==1.8.1

Which database adapter are you using with dbt?

postgres

Additional Context

Test matrix:

+-----------------------+-----------------------+---------------------+
| current_state         | previous_state        | test_state_modified |
+-----------------------+-----------------------+---------------------+
| 1.7, key = tests      | 1.7, key = tests      | no                  |
+-----------------------+-----------------------+---------------------+
| 1.8, key = tests      | 1.8, key = tests      | no                  |
+-----------------------+-----------------------+---------------------+
| 1.8, key = data_tests | 1.8, key = data_tests | no                  |
+-----------------------+-----------------------+---------------------+
| 1.8, key = data_tests | 1.8, key = tests      | yes                 |
+-----------------------+-----------------------+---------------------+
| 1.8, key = tests      | 1.7, key = tests      | yes                 |
+-----------------------+-----------------------+---------------------+
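
The matrix above can also be written down as data, for example as parametrised cases for a regression test. This is only a sketch: the tuple layout and the names `MATRIX` and `modified_rows` are illustrative, and the values are copied from the table rather than produced by running dbt.

```python
# Each key is (current_state, previous_state), where a state is
# (dbt version, project-level config key). The value is the observed
# test_state_modified column from the table.
MATRIX = {
    (("1.7", "tests"), ("1.7", "tests")): False,
    (("1.8", "tests"), ("1.8", "tests")): False,
    (("1.8", "data_tests"), ("1.8", "data_tests")): False,
    (("1.8", "data_tests"), ("1.8", "tests")): True,
    (("1.8", "tests"), ("1.7", "tests")): True,
}


def modified_rows(matrix: dict) -> list:
    """Return the (current, previous) pairs where tests were marked modified."""
    return [pair for pair, modified in matrix.items() if modified]


for current_state, previous_state in modified_rows(MATRIX):
    print(current_state, "vs", previous_state)
```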

Because the tests > data_tests rename only produces a warning, we can't expect users to actually make that change for their 1.8 CI runs. Their CI runs may therefore be running all tests, and they will be wondering why this is happening. (It's also quite normal to have a prod run on 1.7 while the CI run is on 1.8 / keep on latest, before eventually moving all prod jobs to 1.8 / keep on latest.)