dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
10.01k stars 1.63k forks source link

[Bug] Data tests do not quote columns names when mixed case #10477

Closed urkle closed 4 months ago

urkle commented 4 months ago

Is this a new bug in dbt-core?

Current Behavior

We have models that use mixed-case column names (and table names), however when defining column tests the generated SQL does not quote the column names thus the queries fail on postgres (and other DBs that are case sensitive)

select
    myColumn as unique_field,
    count(*) as n_records

from "my_db"."public"."myModel"
where myColumn is not null
group by myColumn
having count(*) > 1

Expected Behavior

DB Columns to be quoted so case is handled correctly.

select
    "myColumn" as unique_field,
    count(*) as n_records

from "my_db"."public"."myModel"
where "myColumn" is not null
group by "myColumn"
having count(*) > 1

Steps To Reproduce

Model definition yml

models:
  -name: myModel
    columns:
      - name: myColumn
         data_tests:
           - unique

Relevant log output

No response

Environment

- OS: macOS 12.5
- Python: python 3.11.9
- dbt: 1.8.4

Which database adapter are you using with dbt?

postgres

Additional Context

No response

urkle commented 4 months ago

I implemented a local fix by overriding the test macros to wrap column_name as adapter.quote(column_name)

urkle commented 4 months ago

final macro changes that work (though needs a new quote parameter added to the main test definition)

{% macro default__test_unique(model, column_name, quote = True) %}
{%- set column_quoted = adapter.quote(column_name) if quote else column_name -%}
select
    {{ column_quoted }} as unique_field,
    count(*) as n_records

from {{ model }}
where {{ column_quoted }} is not null
group by {{ column_quoted }}
having count(*) > 1

{% endmacro %}

This allows this to work

models:
  -name: myModel
    data_tests:
       - unique:
            quote: False
            column_name: |
               "myColumn" || ':' || "myOtherColumn"
    columns:
      - name: myColumn
         data_tests:
           - unique
jeremyyeo commented 4 months ago

Did you try the quote property on the column https://docs.getdbt.com/reference/resource-properties/quote?

dbeatty10 commented 4 months ago

This properly quoted for me when I used the quote property that @jeremyyeo mentioned:

models/my_model.sql

select null as "Id"

models/_models.yml

models:
  - name: my_model
    columns:
      - name: Id
        quote: true
        data_tests:
          - not_null

Here's the SQL in target/compiled/my_project/models/_models.yml/not_null_my_model__Id_.sql

select "Id"
from "db"."my_schema"."my_model"
where "Id" is null
urkle commented 4 months ago

@jeremyyeo i hunted all through the docs and source and could only find quote_columns (for seeds!).

Is there a way to turn that quote setting on for ALL columns in a model? (or even all models in a project?)

And I'll try that right now to see if it works.

urkle commented 4 months ago

@jeremyyeo that does indeed work and gets the columns quoted. It would be nice to be able to set it for the whole project.

(e.g. a new key here? https://docs.getdbt.com/reference/project-configs/quoting ) and/or some per-model setting

dbeatty10 commented 4 months ago

@jeremyyeo that does indeed work and gets the columns quoted.

🎉

It would be nice to be able to set it for the whole project. (e.g. a new key here? https://docs.getdbt.com/reference/project-configs/quoting ) and/or some per-model setting

Yeah! 🤩 We're imagining adding the following new config to dbt_project.yml (as described in https://github.com/dbt-labs/dbt-core/issues/2986):

quote:
  database: true|false    # or `project` on dbt-bigquery
  schema: true|false      # or `dataset` on dbt-bigquery
  identifier: true|false
  columns: true|false

Since the quote property allows for quoting columns in data tests and https://github.com/dbt-labs/dbt-core/issues/2986 proposes a way to apply similar quoting project-wide, I'm going to close this issue.

But please feel free to reach out if you have any other feedback or ideas.