calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros
https://calogica.github.io/dbt-expectations/
Apache License 2.0

[Feature Request] `expect_table_row_count_to_equal` should return the actual row count as well as the expression #321

Open siljamardla opened 3 weeks ago

siljamardla commented 3 weeks ago

I want to monitor some row counts on my table. I'm planning to set up row count tests with several different conditions and to enable storing failures. I actually expect some of these to fail; the aim is to monitor how badly they fail. The idea is to visualise the outcome in a calendar heat map, so I can see at a glance when the issues are big.

I could use the row count test, configured like this:

      - dbt_expectations.expect_table_row_count_to_equal:
          value: 0
          group_by: [created_date]
          row_condition: some_column='failed'
          config:
            severity: warn
            store_failures: true

However, the stored outcome is only a true/false value.

Sample compiled code:

    with grouped_expression as (
        select
            created_date as col_1,
            count(*) = 0 as expression
        from
            schema_name.table_name
        where
            some_column = 'failed'
        group by created_date
    ),
    validation_errors as (
        select *
        from
            grouped_expression
        where
            not(expression = true)
    )
    select *
    from validation_errors

I would like the compiled code to include an additional column for the actual row count. Sample compiled code:

    with grouped_expression as (
        select
            created_date as col_1,
            count(*) as row_count,
            count(*) = 0 as expression
        from
            schema_name.table_name
        where
            some_column = 'failed'
        group by created_date
    ),
    validation_errors as (
        select *
        from
            grouped_expression
        where
            not(expression = true)
    )
    select *
    from validation_errors

This would return the actual row count for each failing group alongside the expression.

It's not a breaking change as the logic of the test remains the same. We're just making the stored output more informative.
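To illustrate the "not a breaking change" point, here is a minimal sketch that runs both versions of the compiled query against an in-memory SQLite database and checks that they flag the same groups. SQLite is only a stand-in for a real warehouse, the sample rows are made up, and the table is created without a schema prefix; none of this is taken from the dbt-expectations code base.

```python
import sqlite3

# Hypothetical sample data standing in for schema_name.table_name.
rows = [
    ("2024-01-01", "failed"),
    ("2024-01-01", "failed"),
    ("2024-01-01", "ok"),
    ("2024-01-02", "ok"),
    ("2024-01-03", "failed"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table table_name (created_date text, some_column text)")
conn.executemany("insert into table_name values (?, ?)", rows)

# Same shape as the compiled test query; {extra_column} is the only difference.
TEMPLATE = """
with grouped_expression as (
    select
        created_date as col_1,
        {extra_column}
        count(*) = 0 as expression
    from table_name
    where some_column = 'failed'
    group by created_date
),
validation_errors as (
    select *
    from grouped_expression
    where not (expression = 1)  -- SQLite represents booleans as 0/1
)
select *
from validation_errors
order by col_1
"""

current = conn.execute(TEMPLATE.format(extra_column="")).fetchall()
proposed = conn.execute(
    TEMPLATE.format(extra_column="count(*) as row_count,")
).fetchall()

# The failing groups are identical; the proposed query only adds a column.
current_groups = [r[0] for r in current]
proposed_groups = [r[0] for r in proposed]
row_counts = {r[0]: r[1] for r in proposed}

print(current_groups == proposed_groups)
print(row_counts)
```

With this sample data, both queries flag the same two dates, and only the proposed version carries the per-date counts that a heat map would need.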

I'm happy to contribute, but I could not immediately figure out where to make this change. The SQL is compiled through several layers of macros that I'm not (yet) familiar with.

siljamardla commented 2 weeks ago

I did figure out where to make the change. I've made a specific change to this test (#323), which also duplicates 2 generic macros to achieve my goal. Clearly this is not the way to solve this in the long term. I'm happy to work together to make it a cleaner and more uniform update; I just don't want to go down a rabbit hole without any confirmation from the maintainers' side :)