dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Bug] If a test is added before a model is written, partial parsing will not find the test until a full reparse #10461

Open jeremyyeo opened 3 months ago

jeremyyeo commented 3 months ago

Is this a new bug in dbt-core?

Current Behavior

If you write a test first, parse the project, and then write the model, dbt will not run the test: partial parsing never connects the existing test to the newly added model.

Expected Behavior

Even with partial parsing, dbt should find and run the test, even if it was written before the model file.

Steps To Reproduce

  1. Add a schema YAML file first, but importantly, don't create any model file yet:
# models/schema.yml
models:
  - name: foo
    columns:
      - name: id
        data_tests:
          - not_null
  2. Do an initial parse with a clean target/ directory:
$ ls target
ls: target: No such file or directory

$ dbt parse
20:30:13  Running with dbt=1.8.3
20:30:13  Registered adapter: postgres=1.8.2
20:30:13  Unable to do partial parsing because saved manifest not found. Starting full parse.
20:30:14  [WARNING]: Did not find matching node for patch with name 'foo' in the 'models' section of file 'models/schema.yml'
20:30:14  [WARNING]: Test 'test.my_dbt_project.not_null_foo_id.f099b1e59c' (models/schema.yml) depends on a node named 'foo' in package '' which was not found
20:30:14  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.my_dbt_project
20:30:14  Performance info: /Users/jeremy/git/dbt-basic/target/perf_info.json
  3. Add the model file:
-- models/foo.sql
select 1 id
  4. Build:
$ dbt build
20:31:35  Running with dbt=1.8.3
20:31:35  Registered adapter: postgres=1.8.2
20:31:35  Found 1 model, 413 macros
20:31:35  
20:31:35  Concurrency: 4 threads (target='pg')
20:31:35  
20:31:35  1 of 1 START sql table model public.foo ........................................ [RUN]
20:31:35  1 of 1 OK created sql table model public.foo ................................... [SELECT 1 in 0.07s]
20:31:35  
20:31:35  Finished running 1 table model in 0 hours 0 minutes and 0.19 seconds (0.19s).
20:31:35  
20:31:35  Completed successfully
20:31:35  
20:31:35  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

^ Notice no "test" was identified.

  5. Build again, but force a full parse:
$ dbt build --no-partial-parse
20:32:22  Running with dbt=1.8.3
20:32:22  Registered adapter: postgres=1.8.2
20:32:23  Found 1 model, 1 test, 413 macros
20:32:23  
20:32:23  Concurrency: 4 threads (target='pg')
20:32:23  
20:32:23  1 of 2 START sql table model public.foo ........................................ [RUN]
20:32:23  1 of 2 OK created sql table model public.foo ................................... [SELECT 1 in 0.06s]
20:32:23  2 of 2 START test not_null_foo_id .............................................. [RUN]
20:32:23  2 of 2 PASS not_null_foo_id .................................................... [PASS in 0.04s]
20:32:23  
20:32:23  Finished running 1 table model, 1 test in 0 hours 0 minutes and 0.24 seconds (0.24s).
20:32:23  
20:32:23  Completed successfully
20:32:23  
20:32:23  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
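
A quicker way to check for the symptom than reading the build log is to look at the manifest dbt writes to target/manifest.json. The sketch below is only an illustration, assuming the manifest keeps its nodes under a top-level "nodes" key and marks tests with "resource_type": "test"; the helper names `list_test_nodes` and `load_manifest` are made up for this example and are not part of dbt.

```python
import json
from pathlib import Path


def list_test_nodes(manifest: dict) -> list:
    """Return the sorted unique_ids of all nodes whose resource_type is 'test'."""
    return sorted(
        unique_id
        for unique_id, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "test"
    )


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read a manifest file from disk (the path is the default dbt target dir)."""
    return json.loads(Path(path).read_text())


if __name__ == "__main__":
    manifest_file = Path("target/manifest.json")
    if manifest_file.exists():
        tests = list_test_nodes(load_manifest(str(manifest_file)))
        print(f"{len(tests)} test(s) in manifest: {tests}")
    else:
        print("No manifest found; run a dbt command first.")
```

If the behavior reported here applies, running this after step 4 should list no tests, and running it after step 5 should list the not_null test on foo.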

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11.9
- dbt: 1.8.3

Which database adapter are you using with dbt?

postgres

Additional Context

In dbt Cloud, every save action triggers a parse. So if a user were practicing TDD (i.e. writing the test first :P), they could run into this scenario.

https://dbt-labs.slack.com/archives/C02SRNY2EQ4/p1721245567642059

dbeatty10 commented 3 months ago

Thanks for finding this and writing it up @swhite-dbt and @jeremyyeo

gshank commented 3 months ago

I think this is a special case of #10323. For that ticket, we'd probably have to re-parse every node that has a warning, since it would be difficult to handle different cases for different warnings. With that approach, re-parsing the schema entry that carried the warning would connect it to the just-added node.
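
To make the idea in the comment above concrete, here is a toy model of it. This is not dbt's parser or its data structures: `ToyManifest`, `parse_schema_file`, and `partial_parse_added_model` are hypothetical names, and the "manifest" is reduced to a set of models, a test-to-model mapping, and a record of which schema files produced a "no matching node" warning. It only illustrates why re-parsing warned entries when a model is added would attach the orphaned test.

```python
from dataclasses import dataclass, field


@dataclass
class ToyManifest:
    """Drastically simplified stand-in for a parsed project (not dbt's Manifest)."""
    models: set = field(default_factory=set)            # model names parsed so far
    tests: dict = field(default_factory=dict)           # test name -> model it attaches to
    warned_patches: dict = field(default_factory=dict)  # schema file -> {model: [tests]} left unresolved


def parse_schema_file(manifest: ToyManifest, path: str, patches: dict) -> None:
    """Attach tests to known models; remember patches whose model is missing."""
    unresolved = {}
    for model, test_names in patches.items():
        if model in manifest.models:
            for test_name in test_names:
                manifest.tests[test_name] = model
        else:
            unresolved[model] = test_names  # corresponds to the "did not find matching node" warning
    if unresolved:
        manifest.warned_patches[path] = unresolved
    else:
        manifest.warned_patches.pop(path, None)


def partial_parse_added_model(manifest: ToyManifest, model: str, reparse_warned: bool) -> None:
    """Simulate a partial parse after a model file is added.

    With reparse_warned=False only the new file is parsed (the reported behavior).
    With reparse_warned=True every schema file that produced a warning is parsed again.
    """
    manifest.models.add(model)
    if reparse_warned:
        for path, patches in list(manifest.warned_patches.items()):
            parse_schema_file(manifest, path, patches)


if __name__ == "__main__":
    m = ToyManifest()
    parse_schema_file(m, "models/schema.yml", {"foo": ["not_null_foo_id"]})
    partial_parse_added_model(m, "foo", reparse_warned=False)
    print("without re-parse:", m.tests)
    partial_parse_added_model(m, "foo", reparse_warned=True)
    print("with re-parse:", m.tests)
```

The trade-off in this toy version matches the one raised in the comment: re-parsing everything that warned is coarse, but it avoids special-casing each kind of warning.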