dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.92k stars 1.63k forks

[Bug] Tests for models in installed packages are triggered for models with the same name in the current package #10652

Open jaklan opened 2 months ago

jaklan commented 2 months ago

Is this a new bug in dbt-core?

Current Behavior

When you install a package that contains models with tests as a project dependency, and then create a model in the current package with the same name as one of those models, the tests for the upstream model are also triggered for the new model (even if the package:this selector is used), which can cause dbt test / dbt build failures.

Expected Behavior

Tests are not propagated to downstream packages, especially when the package:this selector is used.

Steps To Reproduce

models:


I have also run {{ print(model) }} inside the test - and can confirm it's triggered for the model from the current package, not the upstream one.

Environment

- OS: macOS
- Python: 3.10
- dbt: 1.8.5

Which database adapter are you using with dbt?

redshift, duckdb (doesn't matter)

dbeatty10 commented 1 month ago

Thanks for reporting this @jaklan !

I was able to reproduce what you reported. But I didn't need to use a custom generic test. Using a standard out-of-the-box test like not_null was sufficient. Details in the "reprex" section below.

More details

I was also able to get this to behave correctly by:

  1. temporarily deleting project_b.model_a1,
  2. using --no-partial-parse when doing a dbt list (or dbt parse, etc.) to recreate the partial parsing file,
  3. reinstating project_b.model_a1,
  4. proceeding from there

That's not a sustainable workaround! But it did give me a good manifest.json that I could compare with a bad one:

The key differences were in the parent_map and child_map sections.

Good

    "parent_map": {
        "model.package_a.model_a1": [],
        "test.package_a.sample_test_model_a1_id.ff32e12b9f": [
            "model.package_a.model_a1"
        ],
        "model.package_b.model_a1": []
    },
    "child_map": {
        "model.package_a.model_a1": [
            "test.package_a.sample_test_model_a1_id.ff32e12b9f"
        ],
        "test.package_a.sample_test_model_a1_id.ff32e12b9f": [],
        "model.package_b.model_a1": []
    },

Bad

    "parent_map": {
        "model.package_b.model_a1": [],
        "model.package_a.model_a1": [],
        "test.package_a.sample_test_model_a1_id.ff32e12b9f": [
            "model.package_b.model_a1"
        ]
    },
    "child_map": {
        "model.package_b.model_a1": [
            "test.package_a.sample_test_model_a1_id.ff32e12b9f"
        ],
        "model.package_a.model_a1": [],
        "test.package_a.sample_test_model_a1_id.ff32e12b9f": []
    },

Basically, when parsing from scratch, it is favoring the model name from the root package (package_b) and incorrectly linking it to the upstream test definition (from package_a). 💥
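To spot this kind of mis-link without diffing two manifests by hand, something like the following rough sketch could be run against a manifest. It only assumes the `parent_map` structure shown above and the `<type>.<package>.<name>` shape of the unique IDs; the helper names (`package_of`, `cross_package_test_links`, `check_manifest`) are made up for illustration and are not part of dbt. Note that a cross-package link is not necessarily wrong (a `relationships` test may legitimately reference a model from another package, for example), so treat the output as things to inspect rather than as confirmed bugs.

```python
import json
from pathlib import Path


def package_of(unique_id: str) -> str:
    """Return the package segment of a unique_id such as 'model.package_a.model_a1'."""
    return unique_id.split(".")[1]


def cross_package_test_links(parent_map: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (test_id, parent_id) pairs where a test's parent model lives in a different package."""
    mismatches = []
    for node_id, parents in parent_map.items():
        if not node_id.startswith("test."):
            continue  # only test nodes are of interest here
        for parent_id in parents:
            if parent_id.startswith("model.") and package_of(parent_id) != package_of(node_id):
                mismatches.append((node_id, parent_id))
    return mismatches


def check_manifest(path: str = "target/manifest.json") -> list[tuple[str, str]]:
    """Hypothetical convenience wrapper: load a manifest file and check its parent_map."""
    manifest = json.loads(Path(path).read_text())
    return cross_package_test_links(manifest["parent_map"])
```

With the two `parent_map` sections above, the "Good" one yields no pairs, while the "Bad" one yields the `package_a` test paired with `model.package_b.model_a1`.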

### Reprex

`package_a/models/model_a1.sql`

```sql
select 1 as id
```

`models/schema.yml`

```yaml
models:
  - name: model_a1
    columns:
      - name: id
        tests:
          - not_null
```

`package_b/models/model_a1.sql`

```sql
{{ config(alias="package_b__model_a1") }}

select 1 as id
```

`packages.yml`

```yaml
packages:
  - local: ../package_a
```

Run these commands:

```shell
cd package_b
dbt deps
dbt list --no-partial-parse
dbt list
dbt list --select "package:this,model_a1"
dbt test -s "package:this,model_a1"
```

💥

Here's a workaround to get it to behave as desired (at least for a little while):

```shell
mv models/model_a1.sql models/model_a1.sql.x
dbt list --no-partial-parse
mv models/model_a1.sql.x models/model_a1.sql
dbt list
dbt list --select "package:this,model_a1"
dbt test -s "package:this,model_a1"
```

✅

But this is not a sustainable workaround!

jaklan commented 1 month ago

@dbeatty10 thanks for verification!