calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros
https://calogica.github.io/dbt-expectations/
Apache License 2.0

[Feature Request] Allow for percentage of rows to be null #238

Open rlh1994 opened 1 year ago

rlh1994 commented 1 year ago

Is your feature request related to a problem? Please describe. We have a column that should usually not be null, but we allow some tolerance because of the way the data is sourced. Currently there is no test that allows a proportion of the records in a column to be null; it's all or nothing.

Describe the solution you'd like A test (or an option in an existing test) that calculates the proportion of (not) null records and compares it against a specified tolerance.

Describe alternatives you've considered Creating a custom test or not testing at all.

danhphan commented 1 year ago

Hi @clausherther I'm happy to work on this feature.

clausherther commented 1 year ago

@danhphan that'd be amazing, thanks! 👏 Let me know if I can help with anything. I think we already have a couple of tests that implemented some sort of tolerance level.

danhphan commented 1 year ago

Yes, let me look into the code base and its tests in more detail. Thank you!

marcellovictorino commented 1 year ago

This is an amazing feature! Any updates?

sambloom92 commented 1 year ago

@rlh1994 you can set tolerances for any test in terms of the absolute number of failing records:

    - not_null:
        config:
          error_if: ">1000"
          warn_if: ">500"

But it would be a nice enhancement if you could specify it as a proportion rather than an absolute number...

emishas commented 4 months ago

dbt-utils has this feature https://github.com/dbt-labs/dbt-utils/tree/1.1.1/#not_null_proportion-source

              - dbt_utils.not_null_proportion:
                  at_least: 0.99
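
For context, here is a minimal sketch of where that snippet would sit in a model's YAML file. The model name `my_model` and column name `my_column` are placeholders, not taken from this thread, and the sketch assumes the dbt-utils package is already installed in the project. The optional `at_most` argument is included only to show the upper bound; as far as I know it defaults to 1.0 and can be left out.

```yaml
version: 2

models:
  - name: my_model            # placeholder model name (assumption)
    columns:
      - name: my_column       # placeholder column name (assumption)
        tests:
          # Passes when at least 99% of rows have a non-null value.
          - dbt_utils.not_null_proportion:
              at_least: 0.99
              at_most: 1.0    # optional upper bound on the non-null proportion
```

This covers the proportion-based tolerance asked for in the original request, whereas `warn_if` / `error_if` on the built-in `not_null` test compare against an absolute count of failing rows.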