dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-2695] [Bug] Discrepancy in dbt build vs dbt run for generate_schema_name macro #7863

Closed · thebbennett closed this issue 1 year ago

thebbennett commented 1 year ago

Is this a new bug in dbt-core?

Current Behavior

My team runs dbt Core.

We changed our CI job from

dbt seed
dbt test
dbt run {{stuff}}

to

dbt build {{stuff}}

We also have two macros, generate_schema_name and generate_alias_name, that override dbt's default schema and alias name generation so that all of the views/tables produced by the CI run are created in a single schema.

These macros work when we use dbt run, but they fail when we use dbt build. The error we get is "permission denied for database dev", and the logs show dbt trying to create a new schema for our CI run, which tells me that our macro is not being applied as intended.

Expected Behavior

There should be no difference between dbt run and dbt build when we run our CI job. Our generate_schema_name macro should work as expected and tell the CI job to place all views/tables in the schema we set.

Steps To Reproduce

  1. Use dbt 1.4.1.
  2. Create a macro file called generate_relation_names and copy the following Jinja into it. Set YOUR_SCHEMA_HERE to a schema you have permission to create in.
{% macro generate_schema_name(custom_schema_name, node) -%}
    {#- /*
    Limit users to only one schema (their target schema) by throwing an error
    if they use the `schema` config
    */ -#}
    {% if custom_schema_name and node.resource_type=='model' %}
        {% set error_msg -%}
Cannot use custom schemas in this dbt project.
Remove the schema config from {{ node.resource_type }} "{{ node.name }}" to continue
        {% endset %}
        {{ exceptions.raise_compiler_error(error_msg) }}
    {% endif %}

    {#- /*
    Manually hardcode this since dbt only has permission to create in one schema for CI runs
    */ -#}
    {% set ci_schema='YOUR_SCHEMA_HERE' %}

    {%- if target.name == 'ci' %}

        {{ ci_schema }}

    {%- else -%}

        {{ target.schema }}

    {%- endif -%}

{%- endmacro %}

{% macro generate_alias_name(custom_alias_name, node) -%}

    {%- if target.name == 'ci' %}

        {{ target.schema }}_{{ custom_alias_name or node.name | trim }}

    {%- else -%}

        {{ custom_alias_name or node.name | trim }}

    {%- endif -%}

{%- endmacro %}
  3. Set up a CI job in dbt Cloud and set the target name to 'ci'.
  4. Trigger the CI run.
  5. Observe whether the CI run produced tables in the expected schema (one way to check is sketched below).
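A quick way to sanity-check what the override resolves to before touching the warehouse (assuming a ci target is defined in your local profiles.yml, and with my_model standing in for any model in the project) is to compile against that target and look at the schema recorded for each node:

dbt compile --select my_model --target ci

Every node in target/manifest.json carries a schema field showing where dbt intends to build it, so the override can be verified without creating any relations.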

Relevant log output

2023-06-13 15:47:04.192577 (ThreadPoolExecutor-0_0): 15:47:04  On list_dev: /* {"app": "dbt", "dbt_version": "1.4.6", "profile_name": "user", "target_name": "ci", "connection_name": "list_dev"} */

    select distinct nspname from pg_namespace

2023-06-13 15:47:04.193454 (ThreadPoolExecutor-0_1): 15:47:04  On list_dev: /* {"app": "dbt", "dbt_version": "1.4.6", "profile_name": "user", "target_name": "ci", "connection_name": "list_dev"} */

    select distinct nspname from pg_namespace

2023-06-13 15:47:04.194115 (ThreadPoolExecutor-0_0): 15:47:04  Opening a new connection, currently in state init
2023-06-13 15:47:04.195191 (ThreadPoolExecutor-0_1): 15:47:04  Opening a new connection, currently in state init
2023-06-13 15:47:04.195856 (ThreadPoolExecutor-0_0): 15:47:04  Redshift adapter: Connecting to Redshift using 'database' credentials
2023-06-13 15:47:04.196780 (ThreadPoolExecutor-0_1): 15:47:04  Redshift adapter: Connecting to Redshift using 'database' credentials
2023-06-13 15:47:04.396454 (ThreadPoolExecutor-0_1): 15:47:04  SQL status: SELECT in 0 seconds
2023-06-13 15:47:04.410095 (ThreadPoolExecutor-0_1): 15:47:04  On list_dev: Close
2023-06-13 15:47:04.430031 (ThreadPoolExecutor-0_0): 15:47:04  SQL status: SELECT in 0 seconds
2023-06-13 15:47:04.442494 (ThreadPoolExecutor-0_0): 15:47:04  On list_dev: Close
2023-06-13 15:47:04.444618 (ThreadPoolExecutor-0_1): 15:47:04  Acquiring new redshift connection 'create_dev_dbt_cloud_pr_82935_116'
2023-06-13 15:47:04.445567 (ThreadPoolExecutor-0_1): 15:47:04  Acquiring new redshift connection 'create_dev_dbt_cloud_pr_82935_116'
2023-06-13 15:47:04.446217 (ThreadPoolExecutor-0_1): 15:47:04  Creating schema "ReferenceKeyMsg(database='dev', schema='dbt_cloud_pr_82935_116', identifier=None)"
2023-06-13 15:47:04.455013 (ThreadPoolExecutor-0_1): 15:47:04  Using redshift connection "create_dev_dbt_cloud_pr_82935_116"
2023-06-13 15:47:04.455669 (ThreadPoolExecutor-0_1): 15:47:04  On create_dev_dbt_cloud_pr_82935_116: BEGIN
2023-06-13 15:47:04.456242 (ThreadPoolExecutor-0_1): 15:47:04  Opening a new connection, currently in state closed
2023-06-13 15:47:04.456792 (ThreadPoolExecutor-0_1): 15:47:04  Redshift adapter: Connecting to Redshift using 'database' credentials
2023-06-13 15:47:04.520288 (ThreadPoolExecutor-0_1): 15:47:04  SQL status: BEGIN in 0 seconds
2023-06-13 15:47:04.520964 (ThreadPoolExecutor-0_1): 15:47:04  Using redshift connection "create_dev_dbt_cloud_pr_82935_116"
2023-06-13 15:47:04.521516 (ThreadPoolExecutor-0_1): 15:47:04  On create_dev_dbt_cloud_pr_82935_116: /* {"app": "dbt", "dbt_version": "1.4.6", "profile_name": "user", "target_name": "ci", "connection_name": "create_dev_dbt_cloud_pr_82935_116"} */
create schema if not exists "dbt_cloud_pr_82935_116"
2023-06-13 15:47:06.366750 (ThreadPoolExecutor-0_1): 15:47:06  Postgres adapter: Postgres error: permission denied for database dev

2023-06-13 15:47:06.367584 (ThreadPoolExecutor-0_1): 15:47:06  On create_dev_dbt_cloud_pr_82935_116: ROLLBACK
2023-06-13 15:47:06.369775 (ThreadPoolExecutor-0_1): 15:47:06  Postgres adapter: Error running SQL: macro create_schema
2023-06-13 15:47:06.370370 (ThreadPoolExecutor-0_1): 15:47:06  Postgres adapter: Rolling back transaction.
2023-06-13 15:47:06.370975 (ThreadPoolExecutor-0_1): 15:47:06  On create_dev_dbt_cloud_pr_82935_116: Close
2023-06-13 15:47:06.372992 (MainThread): 15:47:06  Connection 'master' was properly closed.
2023-06-13 15:47:06.373584 (MainThread): 15:47:06  Connection 'list_dev' was properly closed.
2023-06-13 15:47:06.374000 (MainThread): 15:47:06  Connection 'create_dev_dbt_cloud_pr_82935_116' was properly closed.
2023-06-13 15:47:06.374535 (MainThread): 15:47:06
2023-06-13 15:47:06.374535 (MainThread): 15:47:06  Finished running  in 0 hours 0 minutes and 2.22 seconds (2.22s).
2023-06-13 15:47:06.375238 (MainThread): 15:47:06  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f1acb024c10>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f1b33558490>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f1b33558b50>]}
2023-06-13 15:47:06.375844 (MainThread): 15:47:06  Flushing usage events
2023-06-13 15:47:06.403559 (MainThread): 15:47:06  Encountered an error:
Database Error
  permission denied for database dev

Environment

- OS: dbt Cloud
- Python:
- dbt: 1.4.1

Which database adapter are you using with dbt?

redshift

Additional Context

See this Slack message

dbeatty10 commented 1 year ago

Thanks for reaching out @thebbennett !

I don't know all the ins-and-outs here, but these two links look like they have some relevant information:

It looks like dbt Cloud creates special schemas (like dbt_cloud_pr_82935_116) when you have CI configured in dbt Cloud. But I don't know why it gets created when you use dbt build and not when you use dbt run :shrug:

Could you reach out to dbt Cloud Support using the speech bubble (💬) in the dbt Cloud interface or at email address support@getdbt.com?

I hate redirecting you, but I don't know how to effectively troubleshoot this one since it appears to be related to dbt Cloud rather than dbt Core. In the meantime, I'm going to close this in favor of you reaching out to dbt Cloud Support. If there ends up being a change we can make in dbt Core, then we can either re-open this issue or create a new one.

patkearns10 commented 1 year ago

This has been resolved on a call with @thebbennett! dbt build runs snapshots, which their previous run and test logic was leaving out. The snapshot SQL also had specific schema logic written into the model, which did not enforce their preferred CI schema. The in-model schema logic takes priority over their generate_schema_name macro, so it was attempting to create the dbt_cloud_pr schema and failing due to permissions.
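For anyone else who lands here: the failure mode looks roughly like the (hypothetical, names made up) snapshot below. Because the schema is set in the snapshot's own config block, it wins over the project-level generate_schema_name override, and in a dbt Cloud CI run target.schema resolves to the dbt_cloud_pr_* schema that the CI credentials are not allowed to create:

{% snapshot orders_snapshot %}

{#- hypothetical sketch: a schema configured here bypasses the generate_schema_name override -#}
{{
    config(
      target_database='dev',
      target_schema=target.schema,
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select * from {{ ref('stg_orders') }}

{% endsnapshot %}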

nsoria1 commented 9 months ago

This has been resolved on a call with @thebbennett! dbt build runs snapshots, which their previous run and test logic was leaving out. The snapshot SQL also had specific schema logic written into the model, which did not enforce their preferred CI schema. The in-model schema logic takes priority over their generate_schema_name macro, so it was attempting to create the dbt_cloud_pr schema and failing due to permissions.

How was it solved? Is it possible to override the schema creation and use the one from the dev or prod profile?

patkearns10 commented 9 months ago

How was it solved?

Hi @nsoria1, I would say this was more about aligning expectations.

Is it possible to override the schema creation and use the one from the dev or prod profile?

You might consider something like this:

{% snapshot your_snapshot %}

{{
    config(
      target_database='analytics',
      target_schema=target.schema,
      ...
    )
}}

...

{% endsnapshot %}

Whatever you do, just make sure that you are not overwriting your production schema in development runs.
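If the snapshot should follow the same CI routing as the generate_schema_name macro earlier in this thread, one option (an untested sketch that reuses the YOUR_SCHEMA_HERE placeholder from the reproduction steps; the snapshot name, ref, and column names are placeholders too) is to branch on target.name directly inside the snapshot config:

{% snapshot your_snapshot %}

{{
    config(
      target_database=target.database,
      target_schema=('YOUR_SCHEMA_HERE' if target.name == 'ci' else target.schema),
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select * from {{ ref('stg_orders') }}

{% endsnapshot %}

The inline if/else is ordinary Jinja, so it is evaluated against whatever target the job runs with, which keeps dev and prod runs on their normal schemas.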