dbt-labs / dbt-postgres

Apache License 2.0
34 stars 14 forks source link

[Feature] dbt unit tests with postgres hstore #150

Open vincentclaes opened 2 months ago

vincentclaes commented 2 months ago

Is this a new bug in dbt-core?

Current Behavior

Whenever my source table has a postgres hstore column type, the unittests will fail with a message like

#  An error occurred during execution of unit test 'unzipped_runs_incremental_model'. There may be an error in the unit test definition: check the data types.
#   Database Error
#    syntax error at or near "USER"
#    LINE 54: ...amp with time zone) as completed_at, cast(null as USER-DEFIN...
#                                  

It has either USER or USER-DEFINED in it and i figured out it always points to an hstore column.

Expected Behavior

That my tests would run without an issue. Definitely if you do not specify the column in your tests. Or is there a way to mock the input table so we can omit the hstore column?

Steps To Reproduce

  1. Create postgres db with at least 1 hstore column
  2. Write a test (does not has to include the hstore column)
  3. Run the test

Relevant log output

#  An error occurred during execution of unit test 'unzipped_runs_incremental_model'. There may be an error in the unit test definition: check the data types.
#   Database Error
#    syntax error at or near "USER"
#    LINE 54: ...amp with time zone) as completed_at, cast(null as USER-DEFIN...
#

Environment

- OS: ubuntu 20.4
- Python: 3.20
- dbt: 1.8.6

Which database adapter are you using with dbt?

postgres

Additional Context

dbt unit test:

# I keep getting the same error
dbt-labs/dbt-core#15:21:59    Runtime Error in unit_test unzipped_runs_incremental_model (models/tests/tableau/preprocessed/unzipped_runs.yml)
#  An error occurred during execution of unit test 'unzipped_runs_incremental_model'. There may be an error in the unit test definition: check the data types.
##   Database Error
##    syntax error at or near "USER"
##    LINE 54: ...amp with time zone) as completed_at, cast(null as USER-DEFIN...
##                                                                  ^
dbt-labs/dbt-core#15:21:59
dbt-labs/dbt-core#15:21:59  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
# I am doing something wrong or there is a bug in dbt unit tests
# Commenting the tests for now and move on...

unit_tests:
  - name: unzipped_runs_incremental_model
    model: unzipped_runs
    overrides:
      macros:
        is_incremental: false
    given:
      - input: source("magic_star", "runs") # upstream dependent model of the current model
        rows:
          - {id: 1}
          - {id: 2}

      - input: this # contents of current model
        rows:
          - { pipeline_run_id: 1}
    expect:
      rows:
          - { pipeline_run_id: 2}

dbt model

{{
    macro_incremental_config(
        schema="preprocessed",
        unique_fields=["pipeline_run_id"],
        index_fields=["test_card_id"],
    )
}}

with
    runs as (
        select *
        from {{ source("magic_star", "runs") }}
        {% if is_incremental() %}
            where id > (select max(pipeline_run_id) from {{ this }})
        {% endif %}
    )
select
    id as "pipeline_run_id",
    study_id as "study_id",
    assay_id as "assay_id",
    test_card_id,
    processor_version,
    decorator_host as "decorator_host",
    started_at as "started_at",
    date(started_at) as "start_date",
    started_at::time as start_time,
    completed_at as "completed_at",
    date(completed_at) as "completed_date",
    completed_at::time as completed_time,
    metadata -> 's3-input-path' as "s3_input",
    metadata -> 's3-output-directory' as "s3_output",
    metadata -> 'consumables/detection-mix-id' as "detection_mix_id",
    metadata -> 'consumables/sample-id' as "sample_id",
    metadata -> 'created-by/reader-id' as "reader_id",
    metadata -> 'created-by/reader-type' as "reader_type",
    protocol_name as "protocol_name",
    protocol_version as "protocol_version"
from runs

it even fails with the same message if I simplify to

select id from {{ source("magic_star", "runs") }}
amychen1776 commented 2 months ago

Hi @vincentclaes , thank you for opening this up! It seems like at the moment, we only support data types that:

  1. comes from the information_schema.columns query (or equivalent), and
  2. cast(null as DATA_TYPE_NAME) works

So we are going to do two things related to this issue: