dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Bug] Unhandled exception when using --state and referring to a removed test #10630

Open jaklan opened 3 months ago

jaklan commented 3 months ago

Is this a new bug in dbt-core?

Description

Hi, recently in one of our projects a generic test that was still referenced by some models was accidentally removed.

If we run dbt run, we get a pretty clear error:

15:18:37    Compilation Error in test sample_test_model_a1_ (models/schema.yml)
  'test_sample_test' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".

But if we run dbt run --select state:modified --state previous-artifacts (which we use by default), we get:

...
  File "/Users/jaklan/.local/pipx/venvs/dbt-core/lib/python3.10/site-packages/dbt/graph/selector_methods.py", line 782, in search
    if checker(previous_node, node, **keyword_args):  # type: ignore
  File "/Users/jaklan/.local/pipx/venvs/dbt-core/lib/python3.10/site-packages/dbt/graph/selector_methods.py", line 684, in check_modified_content
    upstream_macro_change = self.check_macros_modified(new)
  File "/Users/jaklan/.local/pipx/venvs/dbt-core/lib/python3.10/site-packages/dbt/graph/selector_methods.py", line 670, in check_macros_modified
    return self.recursively_check_macros_modified(node, visited_macros)
  File "/Users/jaklan/.local/pipx/venvs/dbt-core/lib/python3.10/site-packages/dbt/graph/selector_methods.py", line 644, in recursively_check_macros_modified
    macro_node = self.manifest.macros[macro_uid]
KeyError: None

This was not noticed during the MR because our MR pipelines run a mix of dbt parse + dbt-checkpoint to identify only changed models and avoid compiling the full project, and that obviously doesn't cover issues like the one above. However, I wonder whether such an issue shouldn't already be identified during parsing?

Anyway, back to the problem - playing a bit with the debugger, we can see a node like this:

(GenericTestNode(database='memory',
                 schema='_jaklan_dbt_test__audit',
                 name='sample_test_model_a1_',
                 resource_type=<NodeType.Test: 'test'>,
                 package_name='project_a',
                 path='sample_test_model_a1_.sql',
                 original_file_path='models/schema.yml',
                 unique_id='test.project_a.sample_test_model_a1_.77f8cfc6f6',
                 fqn=['project_a', 'sample_test_model_a1_'],
                 alias='sample_test_model_a1_',
                 ...
                 depends_on=DependsOn(macros=[None,
                                              'macro.dbt.get_where_subquery'],
                                      nodes=['model.project_a.model_a1']),

The issue is with the macros=[None, ...] part - that None is then picked up as macro_uid and fails the dictionary lookup: macro_node = self.manifest.macros[macro_uid].

IMHO the easiest solution would be to add a try-except that raises a concrete error, for example:
{node.name} depends on macro or test, which doesn't exist anymore
unless you want to handle it in a more sophisticated way.
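
To make the suggestion concrete, here is a minimal sketch of such a guard. It is not the actual dbt-core code: `MissingMacroError` and `resolve_macros` are hypothetical names, and the plain dict `macros` only stands in for `manifest.macros`.

```python
class MissingMacroError(Exception):
    """Hypothetical error type; dbt-core would presumably use one of its own exceptions."""


def resolve_macros(node_name, macro_ids, macros):
    """Look up every macro id a node depends on, failing with a readable message.

    `macros` stands in for manifest.macros (a dict keyed by macro unique_id).
    """
    resolved = []
    for macro_uid in macro_ids:
        try:
            resolved.append(macros[macro_uid])
        except KeyError:
            # A None id (or any unknown id) means the macro/test could not be resolved.
            raise MissingMacroError(
                f"{node_name} depends on macro or test, which doesn't exist anymore "
                f"(unresolved macro id: {macro_uid!r})"
            ) from None
    return resolved
```

With the node from the debugger output, `resolve_macros("sample_test_model_a1_", [None, "macro.dbt.get_where_subquery"], macros)` would raise `MissingMacroError` instead of a bare `KeyError: None`.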

Environment

- OS: macOS
- Python: 3.10
- dbt: 1.8.5

Which database adapter are you using with dbt?

redshift, duckdb (doesn't matter)

dbeatty10 commented 3 months ago

Thanks for reporting this @jaklan !

I was able to reproduce the error you got with the files and commands below.

### Reprex

`models/my_model.sql`

```sql
select 1 as id
```

`tests/generic/my_unique.sql`

```sql
{% test my_unique(model, column_name) %}

-- custom generic test

with validation_errors as (
    select {{ column_name }}
    from {{ model }}
    group by {{ column_name }}
    having count(*) > 1
)

select *
from validation_errors

{% endtest %}
```

`models/_models.yml`

```yaml
models:
  - name: my_model
    columns:
      - name: id
        tests:
          # custom generic test
          - my_unique
```

Run these commands:

```shell
dbt parse --target-path previous-artifacts
mv tests/generic/my_unique.sql tests/generic/my_unique.sql.x
dbt compile
dbt run --select state:modified --state previous-artifacts
```

Get this output:

```
$ dbt parse --target-path previous-artifacts
mv tests/generic/my_unique.sql tests/generic/my_unique.sql.x
dbt compile
dbt run --select state:modified --state previous-artifacts

19:56:23  Running with dbt=1.8.0
19:56:26  Registered adapter: duckdb=1.8.3
19:56:27  Performance info: /Users/dbeatty/projects/copier-templates/duckdb-core-10630/previous-artifacts/perf_info.json
19:56:28  Running with dbt=1.8.0
19:56:29  Registered adapter: duckdb=1.8.3
19:56:29  Found 1 model, 1 test, 411 macros
19:56:29
19:56:29  Concurrency: 1 threads (target='dev')
19:56:29
19:56:29  Encountered an error:
Runtime Error
  Compilation Error in test my_unique_my_model_id (models/_models.yml)
    'test_my_unique' is undefined. This can happen when calling a macro that does not exist. Check for typos and/or install package dependencies with "dbt deps".
19:56:30  Running with dbt=1.8.0
19:56:31  Registered adapter: duckdb=1.8.3
19:56:31  Found 1 model, 1 test, 411 macros
19:56:31  Encountered an error:
None
19:56:31  Traceback (most recent call last):
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/cli/requires.py", line 138, in wrapper
    result, success = func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/cli/requires.py", line 101, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/cli/requires.py", line 218, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/cli/requires.py", line 247, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/cli/requires.py", line 294, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/cli/requires.py", line 332, in wrapper
    return func(*args, **kwargs)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/cli/main.py", line 568, in run
    results = task.run()
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/task/runnable.py", line 506, in run
    self._runtime_initialize()
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/task/compile.py", line 125, in _runtime_initialize
    super()._runtime_initialize()
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/task/runnable.py", line 151, in _runtime_initialize
    self.job_queue = self.get_graph_queue()
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/task/runnable.py", line 144, in get_graph_queue
    return selector.get_graph_queue(spec)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 331, in get_graph_queue
    selected_nodes = self.get_selected(spec)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 321, in get_selected
    selected_nodes, indirect_only = self.select_nodes(spec)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 160, in select_nodes
    direct_nodes, indirect_nodes = self.select_nodes_recursively(spec)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 132, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 132, in <listcomp>
    bundles = [self.select_nodes_recursively(component) for component in spec]
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 132, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 132, in <listcomp>
    bundles = [self.select_nodes_recursively(component) for component in spec]
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 132, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 132, in <listcomp>
    bundles = [self.select_nodes_recursively(component) for component in spec]
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 130, in select_nodes_recursively
    direct_nodes, indirect_nodes = self.get_nodes_from_criteria(spec)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 84, in get_nodes_from_criteria
    collected = self.select_included(nodes, spec)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector.py", line 70, in select_included
    return set(method.search(included_nodes, spec.value))
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector_methods.py", line 782, in search
    if checker(previous_node, node, **keyword_args):  # type: ignore
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector_methods.py", line 684, in check_modified_content
    upstream_macro_change = self.check_macros_modified(new)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector_methods.py", line 670, in check_macros_modified
    return self.recursively_check_macros_modified(node, visited_macros)
  File "/Users/dbeatty/projects/environments/dbt_1.8/lib/python3.10/site-packages/dbt/graph/selector_methods.py", line 644, in recursively_check_macros_modified
    macro_node = self.manifest.macros[macro_uid]
KeyError: None
```

jaklan commented 2 months ago

I have tried one more thing today - to run:

dbt run --select state:modified,package:this --state previous-artifacts

instead of:

dbt run --select state:modified --state previous-artifacts

when the removed test was only affecting an upstream package, but not the current one. Unfortunately, the result is the same.

ChenyuLInx commented 1 month ago

Looks like this happens semi-frequently. We should provide a better error message in this case.

jaklan commented 1 month ago

@ChenyuLInx that's a "quick fix", but the real one would be to discover already during parsing that a model refers to a non-existent test.
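
Until parsing catches this, a possible stopgap for MR pipelines is to inspect the manifest that dbt parse writes. The sketch below assumes the unresolved macro shows up as null in a node's `depends_on.macros` in `manifest.json`, mirroring the None seen in the debugger; that serialization is an assumption, not verified behaviour, and both function names are hypothetical.

```python
import json
from pathlib import Path


def nodes_with_unresolved_macros(manifest: dict) -> list[str]:
    """Return unique_ids of nodes whose depends_on.macros contains a null entry."""
    unresolved = []
    for unique_id, node in manifest.get("nodes", {}).items():
        macros = node.get("depends_on", {}).get("macros", [])
        if any(macro_uid is None for macro_uid in macros):
            unresolved.append(unique_id)
    return sorted(unresolved)


def check_manifest_file(path: str = "target/manifest.json") -> list[str]:
    """Load a manifest from disk (default dbt target path) and run the check."""
    return nodes_with_unresolved_macros(json.loads(Path(path).read_text()))
```

A pipeline step could run dbt parse, call `check_manifest_file()`, and fail the job if the returned list is non-empty.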