dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Bug] Config in root dbt_project.yml takes precedence over config block of a model in a package #10707

Open jeremyyeo opened 1 month ago

jeremyyeo commented 1 month ago

Is this a new bug in dbt-core?

Current Behavior

Configs set in the root dbt_project.yml that aren't scoped to the root dbt project are taking precedence over config blocks set in model .sql files within packages.

Expected Behavior

Configs set in the root dbt_project.yml that aren't scoped to the root dbt project should not take precedence over config blocks set in model .sql files within packages.

Steps To Reproduce

  1. Create a local dbt package (public-package) like so:
#  /Users/jeremy/git/public-package/dbt_project.yml
name: "public_package"
version: "1.0.0"
config-version: 2

-- /Users/jeremy/git/public-package/models/tbl_config_block.sql
{{ config(materialized='table') }}
select 1 id

  2. Use the package in our actual dbt project:
# packages.yml
packages:
  - local: /Users/jeremy/git/public-package

# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: "1.0.0"

models:
  +materialized: view # oops
  my_dbt_project:
    +materialized: table

-- models/view.sql
{{ config(materialized='view') }}
select 1 id

-- models/table.sql
select 1 id

  3. Install deps and build
$ dbt deps
01:04:55  Running with dbt=1.8.5
01:04:55  Updating lock file in file path: /Users/jeremy/git/dbt-basic/package-lock.yml
01:04:55  Installing /Users/jeremy/git/public-package
01:04:55  Installed from <local @ /Users/jeremy/git/public-package>

$ dbt build
01:04:59  Running with dbt=1.8.5
01:05:00  Registered adapter: postgres=1.8.2
01:05:00  Found 3 models, 417 macros
01:05:00  
01:05:00  Concurrency: 4 threads (target='pg')
01:05:00  
01:05:00  1 of 3 START sql table model public.tbl ........................................ [RUN]
01:05:00  2 of 3 START sql view model public.view ........................................ [RUN]
01:05:00  3 of 3 START sql view model public.tbl_config_block ............................ [RUN]
01:05:00  3 of 3 OK created sql view model public.tbl_config_block ....................... [CREATE VIEW in 0.09s]
01:05:00  2 of 3 OK created sql view model public.view ................................... [CREATE VIEW in 0.09s]
01:05:00  1 of 3 OK created sql table model public.tbl ................................... [SELECT 1 in 0.09s]
01:05:00  
01:05:00  Finished running 2 view models, 1 table model in 0 hours 0 minutes and 0.32 seconds (0.32s).
01:05:00  
01:05:00  Completed successfully
01:05:00  
01:05:00  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

^ Here the model tbl_config_block, which is in the package, has ignored its own {{ config(materialized='table') }}.

From the docs on config inheritance (https://docs.getdbt.com/reference/configs-and-properties#config-inheritance):

The most specific config always takes precedence. This generally follows the order above: an in-file config() block --> properties defined in a .yml file --> config defined in the project file.

However we also say:

Configurations in your root dbt project have higher precedence than configurations in installed packages.

So I'm not sure how to reconcile those facts and make tbl_config_block ignore the +materialized: view that is set unscoped in the dbt_project.yml file.
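The two quoted rules only conflict once you decide which one is applied first. The Python below is a hypothetical model of that choice, not dbt-core's actual resolution code: each place that sets `materialized` is described by its specificity (following the docs' order quoted above) and by whether it lives in the root project, and the two functions apply the rules in opposite orders. `ConfigSource`, `SPECIFICITY`, and both resolver names are made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical specificity ranks from the docs' order: higher = more specific.
SPECIFICITY = {"project_file": 0, "properties_yml": 1, "config_block": 2}


@dataclass(frozen=True)
class ConfigSource:
    """One place that sets `materialized` for a model (illustrative only)."""
    kind: str              # a key of SPECIFICITY
    in_root_project: bool  # True if set in the root project, False if in a package
    value: str


def resolve_root_first(sources):
    # "Root project beats packages" is compared before specificity.
    best = max(sources, key=lambda s: (s.in_root_project, SPECIFICITY[s.kind]))
    return best.value


def resolve_specificity_first(sources):
    # "Most specific wins" is compared before root-vs-package.
    best = max(sources, key=lambda s: (SPECIFICITY[s.kind], s.in_root_project))
    return best.value


# tbl_config_block from the reproduction: the unscoped `+materialized: view`
# in the root dbt_project.yml plus the model's own in-file config block.
tbl_config_block = [
    ConfigSource("project_file", in_root_project=True, value="view"),
    ConfigSource("config_block", in_root_project=False, value="table"),
]

print(resolve_root_first(tbl_config_block))         # view
print(resolve_specificity_first(tbl_config_block))  # table
```

Under this model, the root-first ordering reproduces the `view` that dbt 1.8.5 built for tbl_config_block in the log above, while the specificity-first ordering gives the `table` this issue expects.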

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11
- dbt: 1.8

Which database adapter are you using with dbt?

postgres

Additional Context

This arose because a customer was doing exactly that (setting an unscoped +materialized in dbt_project.yml) while also using the dbt-project-evaluator package, which raises an exception when its models are not materialized as tables (https://github.com/dbt-labs/dbt-project-evaluator/blob/main/models/staging/graph/stg_nodes.sql).

dbeatty10 commented 1 month ago

Thanks for such a complete view of things @jeremyyeo 🤩

Configurations in your root dbt project have higher precedence than configurations in installed packages.

👆 I think what is happening is that this statement is trumping everything else, i.e., configurations in your root dbt project take precedence over every other config precedence rule.

Since this behavior is documented, I don't think it's a bug, and we probably won't make a change in dbt-core. However, this shows that it can definitely be surprising / "gotcha" behavior.

Here's some ideas of next steps:

  1. For the customer: Remove the unscoped +materialized in dbt_project.yml. Add it scoped to each package as desired.
  2. For dbt Labs: Remove any code examples of unscoped +materialized in dbt_project.yml
  3. For dbt Labs: Add a warning note in the product documentation about unscoped configuration (like +materialized) in dbt_project.yml
  4. For dbt Labs: Add a warning within dbt_project_evaluator whenever there is an unscoped configuration (like +materialized) in dbt_project.yml
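Ideas 1 and 4 can be sketched together. The Python below is a standalone illustration, not code from dbt-core or dbt_project_evaluator: it assumes dbt_project.yml has already been parsed into a dict (for example with a YAML library) and that the caller supplies the names of the root project and its installed packages. `find_unscoped_model_configs` is a hypothetical helper that flags every top-level key under `models:` that is not one of those names.

```python
from typing import Any, Iterable


def find_unscoped_model_configs(project_config: dict[str, Any],
                                project_names: Iterable[str]) -> list[str]:
    """Return top-level keys under `models:` that are not scoped to a project.

    Hypothetical helper. `project_config` is dbt_project.yml already parsed
    into a dict; `project_names` holds the root project name plus the names
    of installed packages. Any other top-level key is treated as a config
    that applies to every project, packages included.
    """
    known = set(project_names)
    models = project_config.get("models") or {}  # tolerate a missing/empty `models:`
    return sorted(key for key in models if key not in known)


names = {"my_dbt_project", "public_package"}

# The root dbt_project.yml from the reproduction (idea 4: warn about this).
unscoped = {
    "name": "my_dbt_project",
    "models": {
        "+materialized": "view",  # unscoped: also reaches public_package
        "my_dbt_project": {"+materialized": "table"},
    },
}

# Idea 1: drop the unscoped key and keep settings under a project name.
scoped = {
    "name": "my_dbt_project",
    "models": {
        "my_dbt_project": {"+materialized": "table"},
    },
}

for key in find_unscoped_model_configs(unscoped, names):
    print(f"warning: '{key}' under models: is not scoped to a project")

print(find_unscoped_model_configs(scoped, names))  # []
```

The check flags keys by elimination rather than by looking for a `+` prefix, because dbt also accepts many configs without the `+`, so a bare `materialized: view` at the top level of `models:` could slip past a prefix-only check.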

I'm curious if you have any thoughts about these or other ideas @jeremyyeo ?