dbt-labs / dbt-adapters

Apache License 2.0
26 stars 38 forks source link

[CT-2819] [Bug] default__alter_relation_add_remove_columns macro does not use quoting with case sensitive Snowflake relation #250

Open crystalro0 opened 1 year ago

crystalro0 commented 1 year ago

Is this a new bug in dbt-core?

Current Behavior

While working with DBT incremental config: on_schema_change='append_new_columns' The append new columns flag is not able to capture the correct case-sensitive column name and add it to the incremental table causing the run to fail.

Expected Behavior

Be able to apply to add the new columns to an incremental table with the 'append_new_columns' config.

Steps To Reproduce

  1. Environment should have an existing incremental materialization while on_schema_change='append_new_columns'.
  2. Add a case-sensitive column name like "Schema_Test" in the source table and rerun the incremental table. The error will be reproduced.

Relevant log output

invalid identifier '"Schema_Test"'

Environment

- OS:
- Python:
- dbt: 1.5

Which database adapter are you using with dbt?

snowflake

Additional Context

suggested the following as a workaround:

     {% for column in add_columns %}
        add column "{{ column.name }}" {{ column.data_type }}{{ ',' if not loop.last }}
      {% endfor %}{{ ',' if remove_columns | length > 0 }}

     {% for column in remove_columns %}
       drop column "{{ column.name }}" {{ ',' if not loop.last }}
     {% endfor %}

saved in project macro folder.

dbeatty10 commented 1 year ago

Thanks for reporting this @crystalro0 🤩

Adding an override to alter_relation_add_remove_columns like you mentioned worked for me as well.

### Reprex `models/src_artists.sql` ```sql {{ config( materialized='table', ) }} {% if var("version", 0) == 0 %} select {{ dbt.current_timestamp() }} as inserted_at, 'taylor' as name {% else %} -- add a non-zero version to the end of the command to get a different version: -- --vars "{'version': 1}" select {{ dbt.current_timestamp() }} as inserted_at, 'taylor' as Name, 'eras' as "Tour" {% endif %} ``` `models/dim_artists.sql` ```sql {{ config( materialized='incremental', on_schema_change='append_new_columns', ) }} select * from {{ ref("src_artists") }} ``` Add a case-sensitive column to an incremental model and look at the results afterwards ```shell dbt run --full-refresh dbt show --inline "select * from {{ ref('dim_artists') }}" dbt run --vars "{'version': 1}" dbt show --inline "select * from {{ ref('dim_artists') }}" ```
### Root cause Ultimately, I believe the root cause is that the current implementation does not match the quoting being applied in this trace of code (listed in reverse order): - https://github.com/dbt-labs/dbt-core/blob/5310d3715cdaa109eea03995731425a85cd6cd33/core/dbt/include/global_project/macros/materializations/models/incremental/column_helpers.sql#L11 - https://github.com/dbt-labs/dbt-core/blob/5310d3715cdaa109eea03995731425a85cd6cd33/core/dbt/include/global_project/macros/materializations/models/incremental/strategies.sql#L71 - https://github.com/dbt-labs/dbt-core/blob/5310d3715cdaa109eea03995731425a85cd6cd33/core/dbt/include/global_project/macros/materializations/models/incremental/strategies.sql#L10

Solution

If we update the implementation of default__alter_relation_add_remove_columns, I'd suggest using adapter.quote like elsewhere in that file to achieve the same effect:

            {% for column in add_columns %}
                add column {{ adapter.quote(column.name) }} {{ column.data_type }}{{ ',' if not loop.last }}
            {% endfor %}{{ ',' if remove_columns | length > 0 }}

            {% for column in remove_columns %}
            drop column {{ adapter.quote(column.name) }} {{ ',' if not loop.last }}
            {% endfor %}

Acceptance criteria

alexnikitchuk commented 11 months ago

we face the same issue with postgres adapter. Any chance it will be fixed?

colin-rogers-dbt commented 4 months ago

let's also validate against dbt-postgres

jeremyyeo commented 2 months ago

Same deal on BQ even if you set the quote config on the newly added column:

-- models/stg_customers.sql
{{
    config(
        materialized='incremental',
        unique_key='id',
        on_schema_change='sync_all_columns'
    )
}}

select 1 id, 'alice' as first_name
# models/schema.yml
models:
  - name: stg_customers
    columns:
      - name: id
      - name: first_name

Do a first build without issues.

$ dbt --debug build --full-refresh
00:17:25  On model.my_dbt_project.stg_customers: /* {"app": "dbt", "dbt_version": "1.8.5", "profile_name": "all", "target_name": "bq", "node_id": "model.my_dbt_project.stg_customers"} */

    create or replace table `cse-sandbox-319708`.`dbt_jyeo`.`stg_customers`

    OPTIONS()
    as (

select 1 id, 'alice' as first_name
    );

00:17:26  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:49d34f81-2697-4f4e-9928-d6bbf353f337&page=queryresults
00:17:29  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '1d114b72-c90f-42be-9dfc-173e7831f4c2', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x12bae6b50>]}
00:17:29  1 of 1 OK created sql incremental model dbt_jyeo.stg_customers ................. [CREATE TABLE (1.0 rows, 0 processed) in 7.09s]

Add a new column but using a reserved SQL keyword:

-- models/stg_customers.sql
{{
    config(
        materialized='incremental',
        unique_key='id',
        on_schema_change='sync_all_columns'
    )
}}

select 1 id, 'alice' as first_name, 'premium' as `group`
# models/schema.yml
models:
  - name: stg_customers
    columns:
      - name: id
      - name: first_name
      - name: group
        quote: true
$ dbt --debug build
00:18:50  On model.my_dbt_project.stg_customers: /* {"app": "dbt", "dbt_version": "1.8.5", "profile_name": "all", "target_name": "bq", "node_id": "model.my_dbt_project.stg_customers"} */

    create or replace table `cse-sandbox-319708`.`dbt_jyeo`.`stg_customers__dbt_tmp`

    OPTIONS(
      expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)
    )
    as (

select 1 id, 'alice' as first_name, 'premium' as `group`
    );

00:18:53  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:5fdf3b2d-82e2-46f6-8ae7-ece16515cb4a&page=queryresults
00:18:59  
    In `cse-sandbox-319708`.`dbt_jyeo`.`stg_customers`:
        Schema changed: True
        Source columns not in target: [<BigQueryColumn group (STRING, NULLABLE)>]
        Target columns not in source: []
        New column types: []

00:18:59  On model.my_dbt_project.stg_customers: /* {"app": "dbt", "dbt_version": "1.8.5", "profile_name": "all", "target_name": "bq", "node_id": "model.my_dbt_project.stg_customers"} */

    alter table `cse-sandbox-319708`.`dbt_jyeo`.`stg_customers`
               add column group STRING

00:19:00  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:28b881d0-a878-452b-9ba1-3f9bbc404e67&page=queryresults
00:19:00  BigQuery adapter: Retry attempt 1 of 1 after error: BadRequest('Syntax error: Unexpected keyword GROUP at [6:27]; reason: invalidQuery, location: query, message: Syntax error: Unexpected keyword GROUP at [6:27]')
00:19:01  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:2c470b5b-5562-4550-ad66-9e473e1725ee&page=queryresults
00:19:01  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:2c470b5b-5562-4550-ad66-9e473e1725ee&page=queryresults
00:19:01  Database Error in model stg_customers (models/stg_customers.sql)
  Syntax error: Unexpected keyword GROUP at [6:27]
00:19:01  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '9c085ee1-adb6-4667-9365-7e77f2c75256', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x149853710>]}
00:19:01  1 of 1 ERROR creating sql incremental model dbt_jyeo.stg_customers ............. [ERROR in 11.46s]

^ In this scenario, the user has even gone the extra mile to set the quote config on the column - but similar not respected when being added to the existing table.


This issue should be renamed - something like:

Across all adapters, dbt does not quote column names when being added to an existing incremental model via on_schema_change