dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-3063] [Feature] state:modified doesn't detect changes to macros passed as variables #8526

Open juls858 opened 1 year ago

juls858 commented 1 year ago

Is this a new bug in dbt-core?

Current Behavior

```sql
{# this is a model #}
{%- set configs = [
    config_macro,
    ] -%}
{{- caller_macro(configs) -}}
```

`config_macro` is passed as a variable to be called by a downstream macro.

If `config_macro` is changed, the change is not detected by `state:modified`.

If `caller_macro` is changed, the change IS detected by `state:modified`.

Expected Behavior

state:modified should detect changes to all upstream macros.

Steps To Reproduce

```shell
dbt ls --select state:modified
```

Relevant log output

No response

Environment

- OS: macOS 13.5 (22G74)
- Python: 3.10.12
- dbt: 1.6.1

Which database adapter are you using with dbt?

snowflake

Additional Context

https://github.com/FlipsideCrypto/livequery-models/blob/f15f5fc17535f295c772ff7b4af5725da23a659a/models/deploy/marketplace/quicknode/quicknode_utils__quicknode_utils.sql

dbeatty10 commented 1 year ago

Thanks for reaching out and providing the link to the relevant code in the "Additional Context" section @juls858 !

See below for a reproducible example ("reprex") of the thing you are reporting.

Basically, it looks like the dbt parser can recognize macros that are called, but not those that are set and passed as variables. Because the parsing phase doesn't record that `my_model_1` depends on `calver`, `my_model_1` isn't included by the `state:modified` selector. See the `depends_on` assignments below for a comparison.
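To make the mechanism concrete, here is a minimal Python sketch of how a selector like `state:modified` could map changed macros to nodes using only the recorded `depends_on.macros` lists. It is a simplified model, not dbt-core's actual implementation: the helper names `reaches_changed` and `nodes_affected_by_macros` are hypothetical, and the dictionaries are trimmed-down stand-ins for the manifest, filled with the `depends_on.macros` values from the manifest excerpt in this comment.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reaches_changed(
    start: Iterable[str], macro_deps: Dict[str, List[str]], changed: Set[str]
) -> bool:
    """Breadth-first search from `start` through recorded macro -> macro edges."""
    seen: Set[str] = set()
    queue = deque(start)
    while queue:
        macro_id = queue.popleft()
        if macro_id in seen:
            continue  # already visited; also protects against cycles
        seen.add(macro_id)
        if macro_id in changed:
            return True
        queue.extend(macro_deps.get(macro_id, []))
    return False


def nodes_affected_by_macros(
    node_deps: Dict[str, List[str]],
    macro_deps: Dict[str, List[str]],
    changed: Set[str],
) -> Set[str]:
    """Nodes whose recorded depends_on.macros reach a changed macro."""
    return {
        node_id
        for node_id, macros in node_deps.items()
        if reaches_changed(macros, macro_deps, changed)
    }


# depends_on.macros as recorded in this issue's manifest excerpt
node_deps = {
    "model.my_project.my_model_1": ["macro.my_project.add_sql_comment_1"],
    "model.my_project.my_model_2": [
        "macro.my_project.calver",
        "macro.my_project.add_sql_comment_2",
    ],
}
# Assumption (hypothetical): none of the reprex macros record calls to other macros.
macro_deps: Dict[str, List[str]] = {}
changed = {"macro.my_project.calver", "macro.my_project.semver"}

print(sorted(nodes_affected_by_macros(node_deps, macro_deps, changed)))
# ['model.my_project.my_model_2']
```

Because `my_model_1` only records `add_sql_comment_1`, nothing connects it to the changed `calver` macro. If the parser had recorded `macro.my_project.calver` for `my_model_1`, the same function would select both models.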

Without having confirmed, I'm assuming that parsing un-called macros was intentionally out of scope, so I'm going to recategorize this as a feature request.

### Reprex

`macros/add_sql_comment.sql`

```sql
{% macro add_sql_comment_1(get_version) %}
-- version: {{ get_version() }})
{%- endmacro %}

{% macro add_sql_comment_2(version) %}
-- version: {{ version }})
{%- endmacro %}

{% macro calver() %}
    {{ return("2023.10.01") }}
{% endmacro %}

{% macro semver() %}
    {{ return("1.2.3") }}
{% endmacro %}
```

`models/my_model_1.sql`

```sql
{{ add_sql_comment_1(calver) }}
select 1 as id
```

`models/my_model_2.sql`

```sql
{{ add_sql_comment_2(calver()) }}
select 2 as id
```

`dbt_project.yml`

```yaml
name: "my_project"
version: "1.0.0"
config-version: 2
profile: "sandcastle"

clean-targets:
  - target
  - dbt_packages
  - logs

models:
  my_project:
    +materialized: table
```

Run everything initially and see that both models are selected:

```shell
dbt clean
dbt run
cp target/manifest.json .
```

After making a small change to each of the `calver` and `semver` macros (within `macros/add_sql_comment.sql`), run everything again and see that only the `my_model_2` model was re-run:

```shell
dbt run -s state:modified --state .
cp target/manifest.json .
```
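The reprex compares the saved `manifest.json` against a freshly parsed one. A rough Python sketch for listing which macros differ between the two files is below. It assumes each entry under the manifest's top-level `macros` key carries a `macro_sql` field, and it treats any difference in that text (or an added or removed macro) as a modification. That rule approximates what `state:modified` checks and is not copied from dbt-core; `changed_macros` is a hypothetical helper name.

```python
import json
import sys
from typing import Any, Dict, Set


def changed_macros(old: Dict[str, Any], new: Dict[str, Any]) -> Set[str]:
    """Macro unique_ids that were added, removed, or whose macro_sql differs."""
    old_macros = old.get("macros", {})
    new_macros = new.get("macros", {})
    # Symmetric difference of the key views: macros present in only one manifest.
    changed = set(old_macros.keys() ^ new_macros.keys())
    for uid in old_macros.keys() & new_macros.keys():
        if old_macros[uid].get("macro_sql") != new_macros[uid].get("macro_sql"):
            changed.add(uid)
    return changed


def load(path: str) -> Dict[str, Any]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (before the second `cp`): python changed_macros.py manifest.json target/manifest.json
    for uid in sorted(changed_macros(load(sys.argv[1]), load(sys.argv[2]))):
        print(uid)
```

Run it after the second `dbt run` but before copying the new manifest over the old one. In the reprex it should report `calver` and `semver` as changed, which shows that the macro change itself is seen. The gap is in which nodes are linked to it.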
### `depends_on` assignments

The key portions of `target/manifest.json` that show the difference in the `depends_on` assignments that come from parsing:

```json
"nodes": {
  "model.my_project.my_model_1": {
    "original_file_path": "models/my_model_1.sql",
    "language": "sql",
    "depends_on": {
      "macros": [
        "macro.my_project.add_sql_comment_1"
      ],
      "nodes": []
    }
  },
  "model.my_project.my_model_2": {
    "original_file_path": "models/my_model_2.sql",
    "language": "sql",
    "depends_on": {
      "macros": [
        "macro.my_project.calver",
        "macro.my_project.add_sql_comment_2"
      ],
      "nodes": []
    }
  }
}
```
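To check the same thing in another project, a small Python sketch can read `target/manifest.json` and report which nodes record a given macro in `depends_on.macros`. It reads only the `nodes` and `depends_on` keys shown in the excerpt above. The helper name `dependents_of_macro` and the script's command-line shape are hypothetical.

```python
import json
import sys
from typing import Any, Dict, List


def dependents_of_macro(manifest: Dict[str, Any], macro_id: str) -> List[str]:
    """Unique_ids of nodes whose parsed depends_on.macros includes macro_id."""
    return sorted(
        node_id
        for node_id, node in manifest.get("nodes", {}).items()
        if macro_id in node.get("depends_on", {}).get("macros", [])
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python dependents.py target/manifest.json macro.my_project.calver
    with open(sys.argv[1], encoding="utf-8") as fh:
        manifest = json.load(fh)
    for node_id in dependents_of_macro(manifest, sys.argv[2]):
        print(node_id)
```

Based on the excerpt above, querying `macro.my_project.calver` would list only `model.my_project.my_model_2`, even though `my_model_1` also ends up calling `calver` through `add_sql_comment_1`.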