calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros
https://calogica.github.io/dbt-expectations/
Apache License 2.0
1.06k stars 128 forks source link

[BUG] DST shift issue for expect_row_values_to_have_data_for_every_n_datepart #236

Open ryanmeekins opened 1 year ago

ryanmeekins commented 1 year ago

Is this a new bug in dbt-expectations?

Current Behavior

The expect_row_values_to_have_data_for_every_n_datepart test when using date_part set to hour has test failures during the DST spring forward shifts, such as 2022-03-13T02:00:00. This isn't an actual hour for the America/New_York time zone. This hour is skipped as the time zone shifts, going from 2022-03-13T01:00:00-05:00 to 2022-03-13T03:00:00-04:00.

I suspect this is because the UTC offset isn't being used for timestamp_tz date columns.

Expected Behavior

This test shouldn't fail on DST shifts, as long as each hour when converted to UTC or a single offset is there.

Steps To Reproduce

  1. Create a new model, test_model as:
    select
    dateadd(hour, seq4(), '2022-01-01 05:00:00 +0000'::timestamp_tz) as timestamp_utc,
    convert_timezone('America/New_York', timestamp_utc) as timestamp_et
    from
    table
    (generator(rowcount => 8760))
  2. Add a test to this model using:
    - name: test_model
    tests:
      - dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
          date_col: timestamp_et
          date_part: hour
  3. Run and test the new model using dbt build -s test_model. The test should fail with 1 result. After inspecting, the row should be 2022-03-13T02:00:00.

Relevant log output

Environment

- OS: Windows 10
- Python: 3.9.13
- dbt: 1.3.1
- dbt-expectations: 0.8.2

Which database adapter are you using with dbt?

Snowflake: 1.3.0

Note: dbt-expectations currently does not support database adapters other than the ones listed below.

Additional Context

clausherther commented 1 year ago

Thanks for opening this issue. I have a feeling this belongs over in dbt-date since that's where the date math is actually being done, but try to I'll take a look next week. These sorts of things are typically hard to track down. In the meantime, you could use row_condition to filter out the offending hour.