lack of database dialect difference support in test_expect_column_values_to_be_in_type_list

alison985 commented 2 years ago

Hi, this is esoteric, but I ran scheduling software for 2 years and time zones will forever be my nemesis.

From what I can tell, the dbt_expectations.expect_column_values_to_be_in_type_list test will cause loud and/or silent errors/warns in some circumstances. This issue write-up is limited to talking about timestamps. Yes, I admit I'm discussing a rarer use case.

The dbt_expectations.test_expect_column_values_to_be_in_type_list macro is used like this:

- name: issued_at
  tests:
    - dbt_expectations.expect_column_values_to_be_in_type_list:
        column_type_list: [timestamp]

It then uses the dbt Jinja function adapter.get_columns_in_relation(model) which refers to the dbt Class Column which will return the column.dtype which is the data type directly as specified and written in that database dialect. The dbt Class Column and the dbt Jinja adapter.get_columns_in_relation will return as expected and "correctly".

However, the dbt-expectations test skips database awareness when using the column_type_list value(s).

Postgres uses timestamp as timestamp without timezone by default, but can take a timestamp with time zone explicitly. postgres v 14 Time Zones **
BigQuery seems to treat timestamp as both timestamp with and without timezone without having to specify the difference. Also, dbt has BigQuery overrides for Column, so let's assume that's covered.
Snowflake supports timestamp, timestamp_ltz, timestamp_ntz, timestamp_tz. Notice how many times it refers to the "session" scope. :cry:
Redshift takes timestamp and timestamptz.**

This all equates to making the column_type_list parameter in (at least) this dbt-expectations test only work for one database dialect at a time. If I want to run the same dbt model file, with this test, on multiple database dialects(e.g. DRYness, source and target testing), especially when testing for "the right" specificity I will have problems. For instance, writing the test for Redshift with timestamptz would fail when run on Postgres.

** While not directly relevant to this issue write-up, please note:

dbt-expectations/macros/utils/postgres__type_timestamp is explicitly timestamp without time zone only.
dbt-expectations/macros/utils/snowflake__type_timestamp is explicitly timestamp_ntz.
Redshift specific type_timestamp differences do not exist.

clausherther commented 2 years ago

Hi @alison985! Sorry, late response! This is indeed a giant pain, as is really anything date/time related :/ How often do you need to run the same dbt-expectations code against different platforms? I ask b/c I've only run into this running our own integration tests against BQ, SF and postgres, and sort of accepted this pain for now. I think for the most part we've relied on dbt_utils.datatypes() in the past, which now is part of dbt-core. However, timestamp is still an unsolved issue in dbt-core. (see https://github.com/dbt-labs/dbt-core/issues/5521). So, for now I'm leaning towards waiting until cross-db data types are more stable in dbt-core until we invest in addressing this downstream here. WDYT?

clausherther commented 2 years ago

@alison985 I totally forgot that in our own integration testing (i.e. where we test all the tests against various adapters), we use this config:

column_type_list: [date, "{{ dbt_expectations.type_timestamp() }}"]

We only overwrite type_timestamp in dbt_expectations for now until dbt-core has this figured out, so in the future, you might just be able to do:

column_type_list: ["{{ type_timestamp() }}"]

clausherther commented 1 year ago

FYI opened this issue on dbt-core https://github.com/dbt-labs/dbt-core/issues/6453

calogica / dbt-expectations

lack of database dialect difference support in test_expect_column_values_to_be_in_type_list #192