dbt-labs / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
https://dbt-athena.github.io
Apache License 2.0
228 stars 100 forks source link

Dynamic `config.external_location` setting does not play well with `state:modified` #145

Open rumbin opened 1 year ago

rumbin commented 1 year ago

Summary

The external_location config setting is respected during state comparison when running dbt with --state ... -s state:modified option. This is troublesome in CI scenarios where we compare a development branch to the manifest.json of the production target. As a consequence, dbt considers all models modified which make use of the external_location setting in a target-specific manner.

Expected behavior

The model should not be considered as being changed, if only the dynamically set external_location differs.

We use many other target-specific settings, like, e.g. custom schemas or databases and all of that play nicely with state:modified. So, the observed behavior is unexpected in my eyes.

How to reproduce

Then this model will be shown as modified.

Use case

For our PII data we make use of the external_location setting in order to ensure that this data is written to a different S3 bucket, so we can make use of a bucket-specific access control policy.

A typical model config then looks as follows:

{{
    config(
        materialized='table'
        , external_location=get_pii_external_location_path()
        , tags=['daily', 'pii']
    )
}}

And the corresponding get_pii_external_location_path() is defined as:

{%- macro get_pii_external_location_path() -%}
    {#
        This macro is used for overriding the `external_location` config setting
        of PII models.
        Each model in the PII directory/schema must make use of it, in order
        to ensure that the model's data will get written to the PII S3 bucket.
    -#}

    {{ 's3://aws-athena-tables-data-pii/dbt/' ~ target.name ~ '/' ~ env_var('DBT_SCHEMA_NAME', this.schema) ~ '/' ~ this.name ~ '/' ~ run_started_at.isoformat() ~ '/' ~ invocation_id }}

{%- endmacro -%}

So we always use the same designated PII bucket but tweak the paths depending on the target.name and an environment variable DBT_SCHEMA_NAME, so concurrent CI runs don't clash.

If we now compile the project for --target prod and then compare what would be run in --target dev, we get all the PII models detected as changed:

dbt ls --state ./target -s state:modified --target dev
Jrmyy commented 1 year ago

πŸ‘‹πŸ» Hello

Thanks for addressing this issue ! πŸ™‚

According to dbt docs, the state modified takes into account all the types of state change and therefore the config change. Unfortunately, as stated by dbt in the caveats of the state:modified models selector, it does not play well with variables. This would have to be checked with dbt-core but I assume the same caveat applies to target (since this is behind the scene a variable).

Therefore it seems to be the expected behavior