dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.61k stars 1.59k forks source link

[Bug] `TypeError` in `TagSelectorMethod` when node is `unit_test`. `unit_test` does not have `tags` attr on node, only as property of `config` #10519

Open lamalex opened 1 month ago

lamalex commented 1 month ago

Is this a new bug in dbt-core?

Current Behavior

I'm only able to reproduce this with Dagster/Dagster dbt - i haven't been able to work the cli to reproduce, here's the relevant section of code from the dagster dbt library (https://github.com/dagster-io/dagster/blob/278e49048ba961070a7caaf161ee30a01a31ee84/python_modules/libraries/dagster-dbt/dagster_dbt/utils.py#L19):


    graph = graph_selector.Graph(DiGraph(incoming_graph_data=child_map))

    # create a parsed selection from the select string
    _set_flag_attrs(
        {
            "INDIRECT_SELECTION": IndirectSelection.Eager,
            "WARN_ERROR": True,
        }
    )
    parsed_spec: SelectionSpec = graph_cli.parse_union([select], True)

    if exclude:
        parsed_exclude_spec = graph_cli.parse_union([exclude], False)
        parsed_spec = graph_cli.SelectionDifference(components=[parsed_spec, parsed_exclude_spec])

    # execute this selection against the graph
    selector = graph_selector.NodeSelector(graph, manifest)
    selected, _ = selector.select_nodes(parsed_spec)

I've stepped through the code in the debugger and am fairly sure this is a DBT bug and not a Dagster bug.

in

class TagSelectorMethod(SelectorMethod):
    def search(self, included_nodes: Set[UniqueId], selector: str) -> Iterator[UniqueId]:
        """yields nodes from included that have the specified tag"""
        for unique_id, node in self.all_nodes(included_nodes):
            if hasattr(node, "tags") and any(fnmatch(tag, selector) for tag in node.tags):
                yield unique_id

if the node is a unit_test node, it's tags attr is None, which causes the iteration to explode. In my manifest there is no tags attribute on the top level of the unit_test node (like in other node types), tags only exists on config.

Expected Behavior

dbt can successfully iterate over node types without a type error

Steps To Reproduce

Set up a dagster dbt project with unit tests and use a tag:... selector

for instance

@dbt_assets(
    manifest=dbt_manifest_paths["redshift"],
    select="config.materialized:incremental",
    exclude="dbx_only tag:elementary",
    backfill_policy=BackfillPolicy.single_run(),
    dagster_dbt_translator=BaseDagsterDbtTranslator(),
    partitions_def=hourly_since_big3,
)

Relevant log output

File "/Users/alauni/Code/launi/launi-dbt/orchestration_dagster/usercode/__init__.py", line 1, in <module>
    from .definitions import defs as defs
  File "/Users/alauni/Code/launi/launi-dbt/orchestration_dagster/usercode/definitions.py", line 3, in <module>
    from .assets import assets
  File "/Users/alauni/Code/launi/launi-dbt/orchestration_dagster/usercode/assets/__init__.py", line 1, in <module>
    from .dbt import assets as dbt_assets
  File "/Users/alauni/Code/launi/launi-dbt/orchestration_dagster/usercode/assets/dbt.py", line 84, in <module>
    @dbt_assets(
     ^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dagster/_core/decorator_utils.py", line 223, in wrapped_with_context_manager_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dagster_dbt/asset_decorator.py", line 311, in dbt_assets
    ) = build_dbt_multi_asset_args(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dagster_dbt/asset_utils.py", line 806, in build_dbt_multi_asset_args
    unique_ids = select_unique_ids_from_manifest(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dagster_dbt/utils.py", line 309, in select_unique_ids_from_manifest
    selected, _ = selector.select_nodes(parsed_spec)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dbt/graph/selector.py", line 160, in select_nodes
    direct_nodes, indirect_nodes = self.select_nodes_recursively(spec)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dbt/graph/selector.py", line 132, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dbt/graph/selector.py", line 132, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dbt/graph/selector.py", line 132, in select_nodes_recursively
    bundles = [self.select_nodes_recursively(component) for component in spec]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dbt/graph/selector.py", line 130, in select_nodes_recursively
    direct_nodes, indirect_nodes = self.get_nodes_from_criteria(spec)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dbt/graph/selector.py", line 84, in get_nodes_from_criteria
    collected = self.select_included(nodes, spec)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dbt/graph/selector.py", line 70, in select_included
    return set(method.search(included_nodes, spec.value))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alauni/Code/launi/launi-dbt/.venv/lib/python3.12/site-packages/dbt/graph/selector_methods.py", line 270, in search
    if hasattr(node, "tags") and any(fnmatch(tag, selector) for tag in node.tags):

Environment

- OS: MacOS 14.6
- Python: 3.12.3
- dbt:
Core:
  - installed: 1.8.4
  - latest:    1.8.4 - Up to date!

Plugins:
  - databricks: 1.8.4 - Up to date!
  - redshift:   1.8.1 - Up to date!
  - postgres:   1.8.2 - Up to date!
  - spark:      1.8.0 - Up to date!

Which database adapter are you using with dbt?

redshift, other (mention it in "Additional Context")

Additional Context

databricks

dbeatty10 commented 1 month ago

Thanks for reaching out @lamalex !

I wasn't able to trigger any error after some quick attempts with dbt commands plus a tag: selector with the dbt CLI.

My guess is that somehow this scenario is prevented from happening when using the dbt CLI, but it is somehow possible when using Dagster.

Regardless, it looks like it could be prevented here by doing something similar to here.

i.e., replace if hasattr(node, "tags") with if hasattr(node, "tags") and node.tags.

lamalex commented 1 month ago

This is how I fixed it locally, the only reason I didn’t open a PR was that I wasn’t sure if this was the right place for the fix, or if it was a bug in manifest generation. This works!

lamalex commented 1 month ago

Hi @dbeatty10 is there anything more needed to merge this for the next release? I’m running a fork in prod rn 😬

dbeatty10 commented 1 month ago

is there anything more needed to merge this for the next release? I’m running a fork in prod rn 😬

https://github.com/dbt-labs/dbt-core/pull/10520 is in code review. If it gets approved, then it will go into dbt-core v1.9 this fall. It hasn't been decided yet if it would be back-ported into v.1.8.x.

lamalex commented 1 month ago

What would need to be true to backport?

dbeatty10 commented 1 month ago

What would need to be true to backport?

Fixes for regressions and net-new bugs that were present in the minor version's original release will be backported to releases with active support. Other bug fixes will be backported when we have high confidence that they're narrowly scoped and won't cause unintended side effects.

gshank commented 1 month ago

The "tags" field in UnitTestDefinition UnitTestConfig should never be None, and I don't think it is in dbt. It is not an Optional field and so should never be None. How does it get to be None in Dagster?

lamalex commented 1 month ago

@gshank it’s not on the config object- it’s on the root. The config object DOES have tags, but that’s not what’s being iterated here (i think!)

gshank commented 1 month ago

It's coming directly from the config object for the case of UnitTestDefinitions, via a "def tags" property on the node. So this would happen if the UnitTestConfig is constructed improperly with tags set to None. The definition of "tags" on UnitTestConfig does not include Optional and we don't set it to None in our code anywhere.