group_by_columns does not allow you to group by higher aggregates. For example, if I have a timestamp column, I might want to group by date_trunc('week', timestamp) to monitor that values are expected each week.
select
count(*) as failures,
count(*) != 0 as should_warn,
count(*) != 0 as should_error
from (
with validation as (
select
date_trunc('week' ,timestamp),
sum(case when visit_reasons is null then 0 else 1 end) / cast(count(*) as numeric) as not_null_proportion
from <my_table>
group by date_trunc('week',timestamp)
),
validation_errors as (
select
date_trunc('week' ,timestamp),
not_null_proportion
from validation
where not_null_proportion < 0.95 or not_null_proportion > 1
)
select
*
from validation_errors
) dbt_internal_test
which, as you can see, in the validation_errors cte does not work. That said, if we just aliased the column in the validation cte and referenced it in validation_errors, we'd be all set. My suggested change in macros/generic_tests/not_null_proportion.sql:
with validation as (
select
{% for c in select_gb_cols %}
{{select_gb_cols}} as group_col_{{loop.counter}}{% if not loop.last %},{%endif%}
{% endear %}
sum(case when {{ column_name }} is null then 0 else 1 end) / cast(count(*) as {{ dbt.type_numeric() }}) as not_null_proportion
from {{ model }}
{{groupby_gb_cols}}
),
validation_errors as (
select
{% for c in select_gb_cols %}
group_col_{{loop.counter}}{% if not loop.last %},{%endif%}
{% endear %}
not_null_proportion
from validation
where not_null_proportion < {{ at_least }} or not_null_proportion > {{ at_most }}
Describe alternatives you've considered
I could make my own custom test to do this
Additional context
Don't think database will impact anything here
Who will this benefit?
anybody trying to group by time
Are you interested in contributing this feature?
I hope the above is enough to get started
Let us know if you want to contribute the feature, and whether would need a hand getting started
My team and I would need this feature to assess data quality on a day-by-day basis for data that we receive every hour thanks to a DATE_TRUNC('day', received_date) !
Describe the feature
group_by_columns does not allow you to group by higher aggregates. For example, if I have a timestamp column, I might want to group by date_trunc('week', timestamp) to monitor that values are expected each week.
Right now, if my yml is set to:
The code compiles to:
which, as you can see, in the validation_errors cte does not work. That said, if we just aliased the column in the validation cte and referenced it in validation_errors, we'd be all set. My suggested change in
macros/generic_tests/not_null_proportion.sql
:Describe alternatives you've considered
I could make my own custom test to do this
Additional context
Don't think database will impact anything here
Who will this benefit?
anybody trying to group by time
Are you interested in contributing this feature?
I hope the above is enough to get started
Let us know if you want to contribute the feature, and whether would need a hand getting started