dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

Should column level `meta` modifications be considered `state:modified`? #10189

Open jeremyyeo opened 5 months ago

jeremyyeo commented 5 months ago

Is this a new bug in dbt-core?

Current Behavior

As title, changes to column level meta configs aren't considered state:modified.

Expected Behavior

Changes to column level meta configs should be considered state:modified.

Steps To Reproduce

  1. Set up a dbt project.
# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: "1.0.0"

models:
 my_dbt_project:
   +materialized: table
# models/schema.yml
version: 2
models:
  - name: foo
    config:
      meta:
        k: v
    columns:
      - name: id
        meta:
         k: v
  - name: bar
    config:
      meta:
        k: v 
    columns:
      - name: id
        meta:
         k: v
-- models/foo.sql
select 1 id

-- models/bar.sql
select 1 id
  2. Initial build and state storing.
$ dbt build && mv target target_old
00:00:49  Running with dbt=1.7.14
00:00:49  Registered adapter: postgres=1.7.14
00:00:49  Found 2 models, 0 sources, 0 exposures, 0 metrics, 402 macros, 0 groups, 0 semantic models
00:00:49  
00:00:49  Concurrency: 4 threads (target='dev')
00:00:49  
00:00:49  1 of 2 START sql table model public.bar ........................................ [RUN]
00:00:49  2 of 2 START sql table model public.foo ........................................ [RUN]
00:00:49  1 of 2 OK created sql table model public.bar ................................... [SELECT 1 in 0.08s]
00:00:49  2 of 2 OK created sql table model public.foo ................................... [SELECT 1 in 0.08s]
00:00:49  
00:00:49  Finished running 2 table models in 0 hours 0 minutes and 0.17 seconds (0.17s).
00:00:49  
00:00:49  Completed successfully
00:00:49  
00:00:49  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  3. Modify schema.yml file (a model level change on foo, and a column level change on bar):
# models/schema.yml
version: 2
models:
  - name: foo
    config:
      meta:
        k: model_level_change
    columns:
      - name: id
        meta:
         k: v
  - name: bar
    config:
      meta:
        k: v 
    columns:
      - name: id
        meta:
         k: column_level_change
  4. Deferred build:
$ dbt build -s state:modified --defer --state target_old
00:02:48  Running with dbt=1.7.14
00:02:48  Registered adapter: postgres=1.7.14
00:02:48  Unable to do partial parsing because saved manifest not found. Starting full parse.
00:02:48  Found 2 models, 0 sources, 0 exposures, 0 metrics, 402 macros, 0 groups, 0 semantic models
00:02:48  
00:02:48  Concurrency: 4 threads (target='dev')
00:02:48  
00:02:48  1 of 1 START sql table model public.foo ........................................ [RUN]
00:02:48  1 of 1 OK created sql table model public.foo ................................... [SELECT 1 in 0.06s]
00:02:48  
00:02:48  Finished running 1 table model in 0 hours 0 minutes and 0.15 seconds (0.15s).
00:02:48  
00:02:48  Completed successfully
00:02:48  
00:02:48  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

foo is built but not bar.
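
To see which models the deferred build skipped, the column level `meta` in the two manifests can be compared directly. The following is a minimal sketch, not part of dbt: the helper names (`column_meta`, `models_with_changed_column_meta`, `load_manifest`) are made up for illustration, and it assumes each manifest.json has a top-level `nodes` mapping whose entries carry `resource_type`, `name`, and a `columns` mapping keyed by column name.

```python
import json
from pathlib import Path


def column_meta(node):
    """Map column name -> meta dict for a single manifest node."""
    return {
        name: col.get("meta", {})
        for name, col in node.get("columns", {}).items()
    }


def models_with_changed_column_meta(old_manifest, new_manifest):
    """Return sorted names of models whose column level meta differs."""
    changed = []
    old_nodes = old_manifest.get("nodes", {})
    for unique_id, new_node in new_manifest.get("nodes", {}).items():
        if new_node.get("resource_type") != "model":
            continue
        old_node = old_nodes.get(unique_id)
        if old_node is None:
            # Model does not exist in the old state, so there is
            # no previous column meta to compare against.
            continue
        if column_meta(old_node) != column_meta(new_node):
            changed.append(new_node["name"])
    return sorted(changed)


def load_manifest(path):
    """Read a manifest.json file into a dict (hypothetical helper)."""
    return json.loads(Path(path).read_text())
```

With the project above, comparing `target_old/manifest.json` against a freshly generated `target/manifest.json` should report `bar`, which could then be added to the selection by hand, for example `dbt build -s state:modified bar --defer --state target_old`.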

Relevant log output

Shared above.

Environment

- OS: macOS
- Python: 3.11.9
- dbt:

Core:
  - installed: 1.7.14
  - latest:    1.8.0  - Update available!

Plugins:
  - postgres:   1.7.14 - Update available!

Which database adapter are you using with dbt?

postgres

Additional Context

Not 100% sure if this has ever worked - could be FR like https://github.com/dbt-labs/dbt-core/issues/10020

karenderer commented 5 months ago

Interesting that this is considered a bug because I wouldn't expect bar to be modified (and wouldn't want it to run) if only the meta config is changed. If this becomes the default behavior of state:modified I'd like a way of opting out.

jeremyyeo commented 5 months ago

> Interesting that this is considered a bug because I wouldn't expect bar to be modified (and wouldn't want it to run) if only the meta config is changed. If this becomes the default behavior of state:modified I'd like a way of opting out.

Yeah I don't really know for sure if it's a bug or a FR.

jtcohen6 commented 5 months ago

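Confirming - this isn't a bug, it's an opinionated choice we made long long ago that:

  • tags + meta are not considered "modifications" per se, they are metadata updates only
  • description is a "modification" if and only if persist_docs is enabled, otherwise it's only a metadata update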

In retrospect, I think that can lead to situations like this one, where the outcome is technically correct but less intuitive. Still, this is how it's been for many years.

jmkacz commented 5 months ago

This is a bug from our perspective because of the added interaction of the dbt_snow_mask package. If we add masking via models[].columns[].meta.masking_policy, right now, it will not get applied unless we perform a full build.

I understand your perspective that meta is just documentation. What we have here, though, are processes that key off of metadata.

From their "How to apply masking policy?" documentation:

models:
  - name: stg_customer
    columns:
      - name: email
        meta:
          masking_policy: mp_encrypt_pii

References:

AmplifyPegPeterson commented 5 months ago

>   • tags + meta are not considered "modifications" per se, they are metadata updates only
>   • description is a "modification" if and only if persist_docs is enabled, otherwise it's only a metadata update

In the example, the only change is an update to a meta config on both models, just at different levels; there is no description change. So the way dbt currently functions, model level meta changes ARE seen as modifications, but column level meta changes are not. It is still unclear why a metadata update at the model level would be treated differently from a column level meta update.
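
One possible reading of the asymmetry, which nothing in this thread confirms, is that the comparison only looks at a model's `config` block: in the reproduction, model level `meta` sits under `config:`, while column level `meta` sits under `columns:`. The toy functions below are an illustration of that assumption only, not dbt's actual implementation; `is_modified_config_only` and `is_modified_with_column_meta` are hypothetical names, and the dicts mirror the shape of the schema.yml entries above.

```python
def is_modified_config_only(old_node, new_node):
    """Toy check: compare only the `config` block (an assumption)."""
    return old_node.get("config", {}) != new_node.get("config", {})


def is_modified_with_column_meta(old_node, new_node):
    """Toy check: also treat column level meta changes as modifications."""
    def col_meta(node):
        return {c["name"]: c.get("meta", {}) for c in node.get("columns", [])}

    return (
        is_modified_config_only(old_node, new_node)
        or col_meta(old_node) != col_meta(new_node)
    )


# Shapes taken from the schema.yml in the reproduction steps.
old_foo = {"name": "foo", "config": {"meta": {"k": "v"}},
           "columns": [{"name": "id", "meta": {"k": "v"}}]}
new_foo = {"name": "foo", "config": {"meta": {"k": "model_level_change"}},
           "columns": [{"name": "id", "meta": {"k": "v"}}]}
old_bar = {"name": "bar", "config": {"meta": {"k": "v"}},
           "columns": [{"name": "id", "meta": {"k": "v"}}]}
new_bar = {"name": "bar", "config": {"meta": {"k": "v"}},
           "columns": [{"name": "id", "meta": {"k": "column_level_change"}}]}

print(is_modified_config_only(old_foo, new_foo))       # True: config changed
print(is_modified_config_only(old_bar, new_bar))       # False: only column meta changed
print(is_modified_with_column_meta(old_bar, new_bar))  # True under the stricter check
```

Under this reading, foo is selected because its `config` changed, while bar's change is invisible to a config-only comparison, which matches the observed output.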