dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.99k stars 1.63k forks source link

[Bug] Model properties.yml file appears to support `post-hook` and `post_hook` but not really #10965

Open jeremyyeo opened 2 weeks ago

jeremyyeo commented 2 weeks ago

Is this a new bug in dbt-core?

Current Behavior

When setting hooks in properties.yml files, our docs (https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook) point out that we should do post_hook instead of post-hook. However, post-hook still appear to work - but not fully ({{ ref() }} and {{ this }}) don't get resolved like if we used post_hook.

We should make them both work the same way or just stop post-hook from being a thing evaluated at all.

Expected Behavior

Either:

  1. post-hook is ignored in properties.yml.
  2. post-hook and post_hook work the same way when resolving {{ ref() }} / {{ this }} (and maybe other conditions? not sure).

Steps To Reproduce

-- models/foo.sql
select 1 id
# models/schema.yml
models:
  - name: foo
    config:
      post-hook: "update {{ this }} set id = 2"
$ dbt --debug run -s foo
...
18:27:18  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
18:27:18  Re-using an available connection from the pool (formerly list_db_dbt_jyeo, now model.my_dbt_project.foo)
18:27:18  Began compiling node model.my_dbt_project.foo
18:27:18  Writing injected SQL for node "model.my_dbt_project.foo"
18:27:18  Began executing node model.my_dbt_project.foo
18:27:18  Writing runtime sql for node "model.my_dbt_project.foo"
18:27:18  Using snowflake connection "model.my_dbt_project.foo"
18:27:18  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.9.0b2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table db.dbt_jyeo.foo
         as
        (select 1 id
        );
18:27:19  SQL status: SUCCESS 1 in 1.173 seconds
18:27:19  Using snowflake connection "model.my_dbt_project.foo"
18:27:19  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.9.0b2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
update  set id = 2
18:27:20  Snowflake adapter: Snowflake query id: 01b815d3-0606-acdb-000d-378340fdb7b6
18:27:20  Snowflake adapter: Snowflake error: 001003 (42000): SQL compilation error:
syntax error line 1 at position 8 unexpected 'set'.
18:27:20  Database Error in model foo (models/foo.sql)
  001003 (42000): SQL compilation error:
  syntax error line 1 at position 8 unexpected 'set'.
  compiled code at target/run/my_dbt_project/models/foo.sql
18:27:20  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'c71b8cad-4eed-4c86-a3ce-3621198e3a62', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1597d9a50>]}
18:27:20  1 of 1 ERROR creating sql table model dbt_jyeo.foo ............................. [ERROR in 1.54s]

^ We can see did try to run the "hook" but {{ this }} didn't get resolved:

update  set id = 2

Let's modify our yaml:

# models/schema.yml
models:
  - name: foo
    config:
      post_hook: "update {{ this }} set id = 2"  # underscore key
$ dbt --debug run -s foo
...
18:28:49  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
18:28:49  Re-using an available connection from the pool (formerly list_db_dbt_jyeo, now model.my_dbt_project.foo)
18:28:49  Began compiling node model.my_dbt_project.foo
18:28:49  Writing injected SQL for node "model.my_dbt_project.foo"
18:28:49  Began executing node model.my_dbt_project.foo
18:28:49  Writing runtime sql for node "model.my_dbt_project.foo"
18:28:49  Using snowflake connection "model.my_dbt_project.foo"
18:28:49  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.9.0b2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table db.dbt_jyeo.foo
         as
        (select 1 id
        );
18:28:50  SQL status: SUCCESS 1 in 1.164 seconds
18:28:50  Using snowflake connection "model.my_dbt_project.foo"
18:28:50  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.9.0b2", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
update db.dbt_jyeo.foo set id = 2
18:28:51  SQL status: SUCCESS 1 in 0.654 seconds
18:28:51  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'fe2baad2-5e85-444a-95b0-12ca8317333a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x12a019e10>]}
18:28:51  1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 1.89s]

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11.9
- dbt: dbt-core==1.9.0-b2 / dbt-snowflake==1.9.0b1

Which database adapter are you using with dbt?

snowflake

Additional Context

If you tried to use ref() in post-hook - you'll get a ref() is undefined error. But if using ref() in post_hook - it's all good.

Also tried on older dbt versions (1.7) - same inconsistency there too.

dbeatty10 commented 2 weeks ago

Thanks for reaching out about this @jeremyyeo !

In general, hyphens have been for dbt_project.yml and underscores for other YAML files. But this delineation isn't easy for us humans to follow, and it's easy to flip-flop them (but hard to debug when it doesn't work as expected). This was surely the inspiration for https://github.com/dbt-labs/dbt-core/issues/9235 to broadly solve this.

The particular issue you raised looks similar to https://github.com/dbt-labs/dbt-core/pull/6435 -- but precisely the other way around. Maybe the solution could be as simple as extending the code added in https://github.com/dbt-labs/dbt-core/pull/10603 to include pre-hook and post-hook?

aranke commented 1 week ago

A/C is tests for this 2x3 matrix from technical refinement:

Also separately check that hooks serializes/deserialize properly in manifest.json node objects