dbt-labs / dbt-redshift

dbt-redshift contains all of the code enabling dbt to work with Amazon Redshift
https://getdbt.com
Apache License 2.0
101 stars 59 forks source link

Default test name causes compilation error with `--store-failures` flag #925

Open ClauPet opened 1 month ago

ClauPet commented 1 month ago

Is this a new bug in dbt-core?

Current Behavior

I have the following dbt model:

test.sql

 column_name | val
-------------+-----
 A           | 0.4
 B           | 0.4

test.yml

models:
  - name: test
    columns:
      - name: column_name
        tests:
          - accepted_values:
              values: ['A', 'B', 'C'] 

After creating the model, I run the test doing dbt test --select test --store_failures. An empty table named accepted_values_test_column_name__a__b__c is created in the dbt_test__audit schema of my Redshift database. The second time I run the exact same command I get a compilation error saying that there is not table accepted_values_test_column_name__A__B__C.

Redshift by default uses case insensitive identifiers, e.g. table names. I know I could get around this by adding a custom lower case test names. However, this is not desirable with the amount of accepted values tests I have.

Expected Behavior

I would expect dbt to know of the case insensitivity of Redshift identifiers and not try to crate a table accepted_values_test_column_name__A__B__C.

Steps To Reproduce

  1. Create an accepted values test with capital test values
  2. run dbt test --store-failures twice

Relevant log output

Compilation Error in test accepted_values_test_column_name__A__B__C (models/master_data/test.yml)
  When searching for a relation, dbt found an approximate match. Instead of guessing
  which relation to use, dbt will move on. Please delete "reddw"."dbt_test__audit"."accepted_values_test_column_name__a__b__c", or rename it to be less ambiguous.
  Searched for: "reddw"."dbt_test__audit"."accepted_values_test_column_name__A__B__C"
  Found: "reddw"."dbt_test__audit"."accepted_values_test_column_name__a__b__c"

Environment

- Python:3.10.2
- dbt-core:1.7.11
- dbt-redshift: 1.7.5

Which database adapter are you using with dbt?

redshift

Additional Context

No response

dbeatty10 commented 1 month ago

Thanks for reporting this @ClauPet !

I can't replicate this because we set enable_case_sensitive_identifier to true in our cluster’s parameter group via the Amazon Redshift console.

But I absolutely believe you are experiencing this because I've seen the unfortunate consequences of Redshift's default behavior of ignoring quoted identifiers first-hand before. See below for example code that I think would reproduce this when enable_case_sensitive_identifier=false (default).

The most direct way to overcome this to to set enable_case_sensitive_identifier to true, but I'm assuming that isn't an option for you.

In the meantime, I'm going to transfer this to the dbt-redshift repo because I don't think this affects any other dbt adapters.

Reprex

Create these files:

models/test.sql

select 'A' as column_name union all
select 'B' as column_name

models/test.yml

models:
  - name: test
    columns:
      - name: column_name
        tests:
          - accepted_values:
              values: ['A', 'B', 'C'] 

Run these commands:

dbt run --select test
dbt test --select test --store-failures
dbt test --select test --store-failures