Open timothyjang123 opened 2 years ago
Hi @timothyjang123, thanks for raising this issue! I have not worked with store-failures yet, so I have no experience with how dbt-expectations behaves there. I'll try to find some time to look into it, but if you have any insights or suggestions, I'd love to hear (or accept a PR!).
@clausherther thanks for tagging this! I don't have much expertise here either but have been meaning to dig into the core logic anyways - will post an update when available!
Agree with @timothyjang123 - ideally one would be able to store the outputs of the exceptions in the warehouse to then present the stakeholders with a list of IDs that are violating the validation rules from within metabase.
I think at this point the limitation is in how dbt names the instances of test runs. I'll try to carve out some time next weekend to see if shorter names etc change this behavior when storing, but not sure how to proceed here. Any ideas welcome!
A quick bit testing shows that this is also the case for the built-in tests if the test or table or column name is longer than some number of characters and the test accepts complex parameters.
For example, for the built-in accepted_values
I get this name for a table named data_test
:
accepted_values_data_test_date_col__1__2__3
- name: data_test
columns:
- name: date_col
tests:
- accepted_values:
values: ["1", "2", "3"]
But for a table named data_test_really_long_name_for_testing
, I get
accepted_values_data_test_real_40864182ace9cc3db0406d232fc083d0
- name: data_test_really_long_name_for_testing
columns:
- name: date_col
tests:
- accepted_values:
values: ["1", "2", "3"]
I'm not sure what exactly the cutoff is or what else might trigger parameters to get hashed into the name.
Looks like dbt 1.1.0
might help here.
You're going to be able to override the generated test name with your own (unique) instance name, e.g.
- name: data_test
tests:
- dbt_expectations.expect_compound_columns_to_be_unique:
name: my_unique_key
column_list: ["date_col", "col_string_b"]
which will then generate a test like this:
00:57:14 57 of 57 START test my_unique_key .............................................. [RUN]
00:57:14 57 of 57 PASS my_unique_key .................................................... [PASS in 0.03s]
(Tested on 1.1.0-rc1
)
Have not yet tested how this gets stored, but this is a good sign!
best setup it to store all the test results in single table. and compact data in to a json field. to get uniform result set.
There may be some configuration that takes care of this already, but I haven't been able to find a way to modify the way dbt-expectations outputs failures to match the way failures get outputted for generic/default dbt tests.
Eg. In snowflake, the outputted table is named as below and doesn't tell us much as to what it's storing:
dbt_expectations_source_expect_1a455e48615c27030f935e6977283170
When I'm not using a dbt-expectations test the stored failure looks like below (using a custom generic test as well):
unique_dim_facility_facility_key
I believe there was some work done regarding this on the dbt labs side: https://github.com/dbt-labs/dbt-core/issues/517
Any insight as to why test results are being outputted differently when we use this package?