Closed gofford closed 2 years ago
Hi @gofford! Thanks for raising this issue, those use cases makes sense to me! Would you be interested in taking a stab at a PR? Happy to help! I could see passing in a list of exception dates? A more complex but more robust solution would be to pass in a table of exception dates as model ref, but I think the list approach might cover a lot of ground to start. What do you think?
Sounds good @clausherther. I can definitely see the utility of an external ref
table for exclusions, particularly in the case where there are a large number of dates to be excluded. In the interim I think a good starting point would be to emulate the row_condition
logic to allow for flexible exclusion.
Locally, I have amended the macro to allow for tests of the form:
tests:
- dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
date_col: date_day
date_part: day
exclusion_condition: date_day not in ('2019-12-25', '2020-12-25', '2021-12-25')
# or equivalently
tests:
- dbt_expectations.expect_row_values_to_have_data_for_every_n_datepart:
date_col: date_day
date_part: day
exclusion_condition: not(date_part(month, date_day) = 12 and date_part(day, date_day) = 25)
exclusion_condition
(default: None) is then applied conditionally as part of a WHERE
clause on the final
CTE.
What do you think of this? This is working fine for my immediate (admittedly limited) use-case, and can certainly be improved moving forward. If you're happy with this as a starting point I can set up a PR.
@gofford I like the approach! Makes sense to allow users to configure this for their use case and platform. In theory, I think this could cover my example of an exclusion table, e.g.
exclusion_condition: date_day not in (select date_day from dates where is_holiday = true)
or even
exclusion_condition: date_day not in (select date_day from ref('dates') where is_holiday = true)
(this last one may need some refinement and testing)
Closed via #141
expect_row_values_to_have_data_for_every_n_datepart
is tremendously useful for validating whether every day has data.However, in some situations, there may be days where there is a data gap and I'm completely OK with it. For example:
The test currently assumes full continuity and will always fail in these scenarios. It would be useful to have a provision to set 'accepted gaps' whereby an error won't be raised if a provided date (or matching date) is missing.
Implementing this could be a relatively simple argument that passes to a conditional
where
clause in thefinal
query object, similar to howrow_condition
is applied tomodel_data