calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros
https://calogica.github.io/dbt-expectations/
Apache License 2.0
1.01k stars 123 forks source link

[Feature Request] Support for large deviation tests for partitioned bigquery tables #237

Open praveen-prashant opened 1 year ago

praveen-prashant commented 1 year ago

Is your feature request related to a problem? Please describe. We use weekly partitioned bigquery tables for our mart layer. We use dbt to create and update them. We often need to ensure that table row counts do not vary too much week by week, but there is no inbuilt test currently in dbt-expectations that allows to check this

Describe the solution you'd like A new test that checks table row count deviations of partitions over a threshold. Something like below that throws a warning if counts deviate by more than 20% for a table my_mart_table when compared to the directly preceding partitions

version: 2

models:
  - name: my_mart_table
    tests:
      - not_large_deviation_count:
          severity: warn
          threshold: 0.2
          partition_column: partition_date          

Describe alternatives you've considered We have implemented a custom test like below:

{% macro test_not_large_deviation_count(model, threshold, partition_column='partition_date',  row_condition='1=1') %}

WITH counts AS (
  SELECT
        {{ partition_column }} AS partition_date,
        COUNT(*) AS observation_count,
  FROM {{ model }}
  WHERE {{ row_condition }}
  GROUP BY 1
),

compare AS (
  SELECT
      partition_date,
      observation_count AS current_value,
      LEAD (observation_count) OVER (ORDER BY partition_date DESC) AS preceding_value
  FROM counts
)

SELECT
      *
FROM compare
WHERE ABS(SAFE_DIVIDE(current_value - preceding_value, preceding_value)) > {{ threshold }}

{% endmacro %}

Additional context It would be nice to have other column level comparison based tests to check for partition-on-partition deviations in values like max, min, avg etc for columns in a table just like row counts