calogica / dbt-expectations

Port(ish) of Great Expectations to dbt test macros
https://calogica.github.io/dbt-expectations/
Apache License 2.0
1.04k stars 126 forks source link

Schema-level tests #197

Open wasilaq opened 1 year ago

wasilaq commented 1 year ago

I originally posted this as a discussion in dbt-utils but it was suggested I move it here!

dbt tests currently tend to be at the model or column level. This discussion poses the question of whether it would be useful to introduce testing at the schema level as well (e.g. test that all models in the same schema have a particular characteristic). I think it would be interesting to add certain schema tests to this package but wanted to get others' thoughts first before adding things.

The way to do this is by querying the information schema in Snowflake. Below is an example of a test I created for a project to check the marts schema for views:

-- There should be no views in the marts schemas.
-- tests/check_for_marts.sql

select * from {{ target.database }}.information_schema.tables
where table_schema = '{{ (target.schema | upper) + '_MARTS' }}'
and table_type != 'BASE TABLE'

Benefits

The main benefit here is performance and ease of use. Instead of applying a test to each individual model, only one test needs to be configured. In the above example, only one query is run against the data warehouse.

Considerations

The above example isn't customizable (i.e. what's the schema name?). The code can be copy-pasted and adjusted accordingly, but it's not in a friendly format for package inclusion. If this type of test is something other folks are interested in, I'd love to hear ideas on how we can make schema tests more dynamic.

samLozier commented 1 year ago

I've been thinking about a similar concept for the last few months. I've wanted to check (ideally without actually executing dbt to do it) what dbt's output will be. my strategy has been to try to intercept the compiled sql just before it gets sent over the wire to snowflake and then parse that sql with another tool like sqlparse, then check for allowed schemas/materializations. This would allow for testing what the prod config is going to try to run without risking anything in prod.

My approach seems harder to implement, so I really like your strategy, but would personally want to change it slightly. Maybe a test interface along the lines of:

- check_that_materialization_db_in_allowed_list
- check_that_materialization_schema_in_allowed_list
- check_that_materialization_is_table