calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros
https://calogica.github.io/dbt-expectations/
Apache License 2.0
989 stars 120 forks source link

[BUG] Invalid SQL compiled when extracting relation columns from duckdb adapter #288

Open qmg-tmay opened 8 months ago

qmg-tmay commented 8 months ago

Is this a new bug in dbt-expectations?

Current Behavior

Macros relying on relation column names build invalid SQL when using DuckDB adapter.

For example, using dbt_expectations.expect_table_columns_to_match_set causes the following error:

Runtime Error in test dbt_expectations_source_expect_table_columns_to_match_set_local_input_id__name__property_type (models/sources.yml)
  Parser Error: syntax error at or near ")"
  LINE 10:     ),
               ^

If I run the tests with the debug flag, I can see the compiled SQL looks like this:

select
      count(*) as failures,
      count(*) != 0 as should_warn,
      count(*) != 0 as should_error
    from (
      with relation_columns as (

    ),
    input_columns as (

        select cast('ID' as TEXT) as input_column
        union all

        select cast('NAME' as TEXT) as input_column
        union all

        select cast('PROPERTY_TYPE' as TEXT) as input_column

    )
    select *
    from
        relation_columns r
        full outer join
        input_columns i on r.relation_column = i.input_column
    where
        -- catch any column in input list that is not in the list of table columns
        -- or any table column that is not in the input list
        r.relation_column is null or
        i.input_column is null

    ) dbt_internal_test

The relation_columns CTE contains nothing leading to a syntax error.

I have experienced the same issue when using the dbt_expectations.expect_column_to_exist test.

Expected Behavior

The macro should compile to valid SQL.

Steps To Reproduce

My dbt_project.yml is:

name: 'validation_spike'
version: '1.0.0'
config-version: 2

profile: 'validation_spike'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:
  - "target"
  - "dbt_packages"

models:
  validation_spike:
    # Config indicated by + and applies to all files under models/example/
    +materialized: view

vars:
  'dbt_date:time_zone': "Europe/London"

My profiles.yml file is:

validation_spike:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: ":memory:"

My models/sources.yml file is:

version: 2

sources:
  - name: local
    meta:
      external_location: "~/<path>/validation_spike/data/{name}.csv"
    tables:
      - name: input
        description: Valid input data
        tests:
          - dbt_expectations.expect_table_columns_to_match_set:
              column_list: ["id", "name", "property_type"]
        columns:
          - name: id
            description: Customer id
            tests:
              - unique
              - not_null
          - name: name
            description: Customer name
            tests:
              - not_null
          - name: property_type
            description: Property type
            tests:
              - accepted_values:
                  name: invalid_property_type
                  values: ["bungalow", "house", "flat/maisonette"]

My dependencies.yml file is:

packages:
  - package: calogica/dbt_expectations
    version: 0.10.0

My test file is (input.csv):

id,name,property_type
1,Sam,bungalow
2,Josh,flat/maisonette
3,Toby,house

Relevant log output

Runtime Error in test dbt_expectations_source_expect_table_columns_to_match_set_local_input_id__name__property_type (models/sources.yml)
  Parser Error: syntax error at or near ")"
  LINE 10:     ),
               ^

Environment

- OS: MacOs Monterey 12.7
- Python: 3.11.0
- dbt: 1.6.6 (dbt-duckdb 1.6.1)
- dbt-expectations: 0.10.0

Which database adapter are you using with dbt?

dbt-duckdb

Note: dbt-expectations currently does not support database adapters other than the ones listed below.

Additional Context