Although all values are distinct in that model column (i.e. all column values are "the most common"), any value in value_set different from 'b' (the first result naturally ordered) will fail.
Relevant log output
with value_counts as (
select
col_string_b as value_field,
count(*) as value_count
from dbt_expectations_integration_tests.data_test
group by col_string_b
),
value_counts_ranked as (
select
*,
row_number() over(order by value_count desc) as value_count_rank
from
value_counts
),
value_count_top_n as (
select
value_field
from
value_counts_ranked
where
value_count_rank = 1
),
set_values as (
select
'ab' as value_field
),
unique_set_values as (
select distinct value_field
from
set_values
),
validation_errors as (
-- values from the model that are not in the set
select
value_field
from
value_count_top_n
where
value_field not in (select value_field from unique_set_values)
)
select *
from validation_errors
Note: dbt-expectations currently does not support database adapters other than the ones listed below.
Postgres
Snowflake
BigQuery
Additional Context
Great expectations has an additional ties_okay argument for partial matches. We could also add support for that.
So following tests would succeed
# Expect error if not all most common values are in the set
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
value_set: ['b']
top_n: 1
config:
error_if: "=0"
warn_if: "<3"
# Expect success if not all most common values are in the set but ties_okay is set
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
value_set: ['b']
top_n: 1
ties_okay: true
# Expect error if none of the most common values are in the set and ties_okay is set
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
value_set: ['invalid_value']
top_n: 1
ties_okay: true
config:
error_if: "=0"
warn_if: "<4"
# Expect success if not all most common values are in the set but ties_okay is set
# and the set contains extra values
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
value_set: ['b', 'invalid_value']
top_n: 1
ties_okay: true
# Expect success if not all most common values are in the set but ties_okay is set
# and value is not first one of the column naturally ordered
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
value_set: ['ab']
top_n: 1
ties_okay: true
# Expect success if all most common values are in the set
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
value_set: ['b', 'ab', 'abc', 'abcd']
top_n: 1
# Expect success if all most common values are in the set
# and the set contains extra values
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
value_set: ['b', 'ab', 'abc', 'abcd', 'invalid_value']
top_n: 1
# Expect error if none of the most common values are in the set
# and the set contains extra values
- dbt_expectations.expect_column_most_common_value_to_be_in_set:
value_set: ['invalid_value1', 'invalid_value2', 'invalid_value3', 'invalid_value4', 'invalid_value5']
top_n: 1
config:
error_if: "=0"
warn_if: "<4"
Is this a new bug in dbt-expectations?
Current Behavior
When most common values are non-unique, running expect_column_most_common_value_to_be_in_set only works for 1 of the most common values.
Expected Behavior
expect_column_most_common_value_to_be_in_set should match all most common values
Steps To Reproduce
in dbt-expectations/integration_tests/models/schema_tests/schema.yml, add following column test for model data_test column col_string_b.:
Although all values are distinct in that model column (i.e. all column values are "the most common"), any value in value_set different from 'b' (the first result naturally ordered) will fail.
Relevant log output
Environment
Which database adapter are you using with dbt?
Postgres
Note: dbt-expectations currently does not support database adapters other than the ones listed below.
Additional Context
Great expectations has an additional ties_okay argument for partial matches. We could also add support for that.
So following tests would succeed