calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros
https://calogica.github.io/dbt-expectations/
Apache License 2.0
1.06k stars 130 forks source link

Undesirable behavior w/ --store-failures #142

Open timothyjang123 opened 2 years ago

timothyjang123 commented 2 years ago

There may be some configuration that takes care of this already, but I haven't been able to find a way to modify the way dbt-expectations outputs failures to match the way failures get outputted for generic/default dbt tests.

Eg. In snowflake, the outputted table is named as below and doesn't tell us much as to what it's storing: dbt_expectations_source_expect_1a455e48615c27030f935e6977283170

When I'm not using a dbt-expectations test the stored failure looks like below (using a custom generic test as well): unique_dim_facility_facility_key

I believe there was some work done regarding this on the dbt labs side: https://github.com/dbt-labs/dbt-core/issues/517

Any insight as to why test results are being outputted differently when we use this package?

clausherther commented 2 years ago

Hi @timothyjang123, thanks for raising this issue! I have not worked with store-failures yet, so I have no experience with how dbt-expectations behaves there. I'll try to find some time to look into it, but if you have any insights or suggestions, I'd love to hear (or accept a PR!).

timothyjang123 commented 2 years ago

@clausherther thanks for tagging this! I don't have much expertise here either but have been meaning to dig into the core logic anyways - will post an update when available!

rarup1 commented 2 years ago

Agree with @timothyjang123 - ideally one would be able to store the outputs of the exceptions in the warehouse to then present the stakeholders with a list of IDs that are violating the validation rules from within metabase.

clausherther commented 2 years ago

I think at this point the limitation is in how dbt names the instances of test runs. I'll try to carve out some time next weekend to see if shorter names etc change this behavior when storing, but not sure how to proceed here. Any ideas welcome!

clausherther commented 2 years ago

A quick bit testing shows that this is also the case for the built-in tests if the test or table or column name is longer than some number of characters and the test accepts complex parameters. For example, for the built-in accepted_values I get this name for a table named data_test:

accepted_values_data_test_date_col__1__2__3

  - name: data_test
    columns:
      - name: date_col
        tests:
          - accepted_values:
              values: ["1", "2", "3"]

But for a table named data_test_really_long_name_for_testing, I get
accepted_values_data_test_real_40864182ace9cc3db0406d232fc083d0

  - name: data_test_really_long_name_for_testing
    columns:
      - name: date_col
        tests:
          - accepted_values:
              values: ["1", "2", "3"]

I'm not sure what exactly the cutoff is or what else might trigger parameters to get hashed into the name.

clausherther commented 2 years ago

Looks like dbt 1.1.0 might help here.

You're going to be able to override the generated test name with your own (unique) instance name, e.g.

  - name: data_test
    tests:
        - dbt_expectations.expect_compound_columns_to_be_unique:
            name: my_unique_key
            column_list: ["date_col", "col_string_b"]

which will then generate a test like this:

00:57:14  57 of 57 START test my_unique_key .............................................. [RUN]
00:57:14  57 of 57 PASS my_unique_key .................................................... [PASS in 0.03s]

(Tested on 1.1.0-rc1)

Have not yet tested how this gets stored, but this is a good sign!

ismailsimsek commented 2 years ago

best setup it to store all the test results in single table. and compact data in to a json field. to get uniform result set.