dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.69k stars 1.61k forks source link

Models with upper case or mixed case names fail in consecutive runs #1557

Closed botchniaque closed 4 years ago

botchniaque commented 5 years ago

Issue

Issue description

When running (dbt run) the model with an uppercase name (or name containing upper case characters), the first run succeeds, but the consecutive run fails.

Results

I expected my models to either be executed successfully also when re-running them, or if using upper case model names is not supported, then fail on the first run.

What happened is the initial run succeeded, but the consecutive runs failed with an error When searching for a relation, dbt found an approximate match Instead of guessing which relation to use, dbt will move on.

System information

The output of dbt --version:

installed version: 0.13.1
   latest version: 0.13.1

Up to date!

The operating system you're running on:

macos 10.14.5, but it also failed in out kubernetes cluster running linux image

The python version you're using (probably the output of python --version) Python 2.7.10

Steps to reproduce

I created 2 files:

both with same contents:

SELECT 1

First run is successful:

dbt run -m UPPER_CASE_TEST mixed_CASE_test
Running with dbt=0.13.1
Found 350 models, 103 tests, 0 archives, 2 analyses, 218 macros, 8 operations, 2 seed files, 0 sources

07:34:39 | Concurrency: 1 threads (target='dev')
07:34:39 |
07:34:39 | 1 of 2 START view model sbochniak_base_ga.UPPER_CASE_TEST............ [RUN]
07:34:42 | 1 of 2 OK created view model sbochniak_base_ga.UPPER_CASE_TEST....... [CREATE VIEW in 2.61s]
07:34:42 | 2 of 2 START view model sbochniak_base_ga.mixed_CASE_test............ [RUN]
07:34:45 | 2 of 2 OK created view model sbochniak_base_ga.mixed_CASE_test....... [CREATE VIEW in 2.29s]
07:34:50 |
07:34:50 | Finished running 2 view models in 17.26s.

Completed successfully

Done. PASS=2 ERROR=0 SKIP=0 TOTAL=2

Consecutive run fails:

dbt run -m UPPER_CASE_TEST mixed_CASE_test
Running with dbt=0.13.1
Found 350 models, 103 tests, 0 archives, 2 analyses, 218 macros, 8 operations, 2 seed files, 0 sources

07:40:36 | Concurrency: 1 threads (target='dev')
07:40:36 |
07:40:36 | 1 of 2 START view model sbochniak_base_ga.UPPER_CASE_TEST............ [RUN]
07:40:37 | 1 of 2 ERROR creating view model sbochniak_base_ga.UPPER_CASE_TEST... [ERROR in 0.04s]
07:40:37 | 2 of 2 START view model sbochniak_base_ga.mixed_CASE_test............ [RUN]
07:40:37 | 2 of 2 ERROR creating view model sbochniak_base_ga.mixed_CASE_test... [ERROR in 0.02s]
07:40:40 |
07:40:40 | Finished running 2 view models in 9.79s.

Completed with 2 errors:

Compilation Error in model UPPER_CASE_TEST (models/base/ga/customer_funnels/UPPER_CASE_TEST.sql)
  When searching for a relation, dbt found an approximate match. Instead of guessing
  which relation to use, dbt will move on. Please delete "datalake"."sbochniak_base_ga"."upper_case_test", or rename it to be less ambiguous.
  Searched for: "datalake"."sbochniak_base_ga"."UPPER_CASE_TEST"
  Found: "datalake"."sbochniak_base_ga"."upper_case_test"

  > in macro materialization_view_default (macros/materializations/view/view.sql)
  > called by model UPPER_CASE_TEST (models/base/ga/customer_funnels/UPPER_CASE_TEST.sql)

Compilation Error in model mixed_CASE_test (models/base/ga/customer_funnels/mixed_CASE_test.sql)
  When searching for a relation, dbt found an approximate match. Instead of guessing
  which relation to use, dbt will move on. Please delete "datalake"."sbochniak_base_ga"."mixed_case_test", or rename it to be less ambiguous.
  Searched for: "datalake"."sbochniak_base_ga"."mixed_CASE_test"
  Found: "datalake"."sbochniak_base_ga"."mixed_case_test"

  > in macro materialization_view_default (macros/materializations/view/view.sql)
  > called by model mixed_CASE_test (models/base/ga/customer_funnels/mixed_CASE_test.sql)

Done. PASS=0 ERROR=2 SKIP=0 TOTAL=2
drewbanin commented 5 years ago

Thanks for the report @botchniaque! This sounds similar to https://github.com/fishtown-analytics/dbt/issues/1555 -- we'll take a look!

drewbanin commented 5 years ago

@botchniaque can you tell me:

  1. which database you're using and
  2. what the value of quoting: is in your dbt_project.yml file?

Any chance you changed your quoting config between invocations of dbt?

botchniaque commented 5 years ago

Will update next week. I am traveling currently.

On Wed, 19 Jun 2019, 19:24 Drew Banin, notifications@github.com wrote:

@botchniaque https://github.com/botchniaque can you tell me:

  1. which database you're using and
  2. what the value of quoting: is in your dbt_project.yml file?

Any chance you changed your quoting config between invocations of dbt?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fishtown-analytics/dbt/issues/1557?email_source=notifications&email_token=AAGXH2B6QOQ6ENDIXQ3THITP3JTVXA5CNFSM4HZGDDPKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYCSMDQ#issuecomment-503653902, or mute the thread https://github.com/notifications/unsubscribe-auth/AAGXH2FHJASWO4Y5FEZLYFLP3JTVXANCNFSM4HZGDDPA .

botchniaque commented 5 years ago

To answer your questions:

Also, I tried to reproduce the problem in a freshly created project, also targeting Redshift, and the problem also occurs. Steps to reproduce:

$ dbt init uppercase-bug
$ cd uppercase-bug
$ echo SELECT 1 > models/UPPERCASE.sql
$ dbt run -m UPPERCASE # this run is successful
$ dbt run -m UPPERCASE # this run fails with the below error:

Running with dbt=0.13.1
Found 2 models, 0 tests, 0 archives, 0 analyses, 113 macros, 0 operations, 0 seed files, 0 sources

22:16:16 | Concurrency: 1 threads (target='dev')
22:16:16 |
22:16:16 | 1 of 1 START view model sbochniak.UPPERCASE.......................... [RUN]
22:16:16 | 1 of 1 ERROR creating view model sbochniak.UPPERCASE................. [ERROR in 0.02s]
22:16:16 |
22:16:16 | Finished running 1 view models in 4.73s.

Completed with 1 errors:

Compilation Error in model UPPERCASE (models/UPPERCASE.sql)
  When searching for a relation, dbt found an approximate match. Instead of guessing
  which relation to use, dbt will move on. Please delete "datalake"."sbochniak"."uppercase", or rename it to be less ambiguous.
  Searched for: "datalake"."sbochniak"."UPPERCASE"
  Found: "datalake"."sbochniak"."uppercase"

  > in macro materialization_view_default (macros/materializations/view/view.sql)
  > called by model UPPERCASE (models/UPPERCASE.sql)

Done. PASS=0 ERROR=1 SKIP=0 TOTAL=1
drewbanin commented 5 years ago

Thanks for the additional info @botchniaque. I was just able to reproduce this - it turns out that this particular bug only occurs on Redshift!

We'll check this out for sure, thanks again

drewbanin commented 5 years ago

Ok, quick followup. Redshift doesn't allow you to create upper-cased table names. This was news to us! Check this out:

create table public."MY_TABLE" (
  id int
);

select *
from information_schema.tables
where table_schema = 'public'
and table_name ilike 'my_table';

table_schema | table_name
---------------------------------
public       | my_table

Databases that are sane and good will preserve the casing of identifiers when they are wrapped in quotes. Conversely, Redshift discards the quotes and creates the relation with a lowercased identifier. This messes with dbt's understanding of the tables in your database, resulting in the error you're seeing here.

There are basically two ways to proceed:

  1. Prevent users from creating models with upper-cased aliases (returning a better error than what you're seeing here)
  2. Make dbt's duplicate relation logic smarter for Redshift, avoiding this particular error

Given the nature of how Redshift builds tables, you might be best off just lower-casing your model names to avoid this problem altogether. Or, if you want to keep your model names upper-cased, you can tell dbt to unquote your identifiers, which should fix this cache issue:

# dbt_project.yml
quoting:
  database: false
  schema: false
  identifier: false

Let me know what you think about all that!

botchniaque commented 5 years ago

This is not a big problem for us - we're using lowercase model names anyways. Just wanted to let you know about the weird behavior.

drewbanin commented 4 years ago

closing this - low incidence + low impact bug with an easy and sane fix

vergenzt commented 2 years ago

FYI, just found this issue and figured I'd share a relevant config: https://docs.aws.amazon.com/redshift/latest/dg/r_enable_case_sensitive_identifier.html

TL;DR:

In my situation, I'm trying to turn on enable_case_sensitive_identifer because some JSON-formatted staging data I've ingested and am trying to parse contains mixed case nested keys: https://docs.aws.amazon.com/redshift/latest/dg/super-configurations.html#upper-mixed-case

Random tip in case someone else encounters this: I was having trouble getting my model to work even though I was setting enable_case_sensitive_identifier TO true in a pre hook. It turned out that because I was materializing as a view, the user still had to set the config in their session or the model wouldn't work correctly for them.

My solution ended up being to materialize this model as a table. (I'm not using mixed case in any output identifiers; just need to parse them from input JSON blobs.)