This dbt package helps you to capture and report on the results of:

- `dbt source freshness` (sources.json and manifest.json)
- `dbt test` (run_results.json and manifest.json)

We love contributions! Currently we don't have a roadmap for this package, so feel free to help where you can. Here are some ideas where we would love your contribution:

If you have any questions, you can contact us at info@divergentinsights.com.au
As per the high-level architecture diagram, these are the different functionalities that this package provides:

- (Optional) Creation of the Snowflake resources to store and make available dbt logging information
- Loading of dbt logging information onto an internal stage
- Copying of dbt logging information into a Snowflake table
- Creation and population of simple dbt models to report on dbt source freshness and dbt tests
- Bonus: a ready-to-go Power BI dashboard built on top of the dbt models created by the package to showcase all features
Optionally, set any relevant variables in your `dbt_project.yml`:

```yaml
vars:
  dbt_dataquality:
    dbt_dataquality_database: my_database # optional, default is target.database
    dbt_dataquality_schema: my_schema # optional, default is target.schema
    dbt_dataquality_table: my_table # optional, default is 'stg_dbt_dataquality'
    dbt_dataquality_stage: my_internal_stage # or my_external_stage; optional, default is 'dbt_dataquality'
    dbt_dataquality_target_path: my_dbt_target_directory # optional, default is 'target'
```
Important: when using an external stage you need to set the parameter `load_from_internal_stage` to `False` on the `load_log_*` macros. See below for more details.
Use the macro `create_resources` to create the backend resources required by the package:

```shell
dbt run-operation create_resources
```

This will create the schema, table and stage required by the package.

If you are in a complex environment with stringent permissions, you can run the macro in "dry mode", which outputs the SQL that the macro would execute. Once you have the SQL, you can copy, paste and manually run the parts of the query that make sense:

```shell
dbt run-operation create_resources --args '{dry_run: True}'
```

Also, keep in mind that the `create_resources` macro creates an internal stage by default. If you want to load log files via an external stage, you can disable the creation of the internal stage:

```shell
dbt run-operation create_resources --args '{internal_stage: False}'
```
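For orientation, a dry run produces Snowflake DDL roughly along these lines. This is an illustrative sketch only: the object names follow the package's documented defaults, and the landing table's column list is an assumption (the macro itself defines the exact columns):

```sql
-- Illustrative sketch only; resource names follow the documented defaults
-- and are configurable via the dbt_project.yml vars shown above.
CREATE SCHEMA IF NOT EXISTS my_database.my_schema;

-- Landing table for the raw dbt JSON artifacts. A single VARIANT column
-- is an assumption here; the real macro defines the actual columns.
CREATE TABLE IF NOT EXISTS my_database.my_schema.stg_dbt_dataquality (
    data VARIANT
);

-- Internal stage used to upload the dbt log files before copying them in.
CREATE STAGE IF NOT EXISTS my_database.my_schema.dbt_dataquality;
```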
Optionally, do a regular run of `dbt source freshness` or `dbt test` on your local project to generate some log files:

```shell
dbt source freshness
# or
dbt test
```
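The files these commands drop under the target directory are plain JSON. As a quick sanity check before loading them, you could summarize test outcomes with a short script (a sketch, not part of the package; the field names follow dbt's run results artifact):

```python
import json
from pathlib import Path


def summarize_run_results(path):
    """Count dbt test results by status (pass, fail, error, ...).

    Assumes the dbt run results artifact layout: a top-level
    "results" list whose entries carry a "status" field.
    """
    data = json.loads(Path(path).read_text())
    summary = {}
    for result in data.get("results", []):
        status = result.get("status", "unknown")
        summary[status] = summary.get(status, 0) + 1
    return summary


# Example: inspect the artifact produced by `dbt test`
# (the package's default target path is 'target')
# print(summarize_run_results("target/run_results.json"))
```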
Use the load macros provided by the dbt_dataquality package to load the dbt logging information that's required:

- `load_log_sources` to load the sources.json and manifest.json files
- `load_log_tests` to load the run_results.json and manifest.json files

Note that the `load_log_sources` and `load_log_tests` macros automatically upload the relevant log and manifest files: `load_log_sources` loads sources.json and manifest.json, and `load_log_tests` loads run_results.json and manifest.json.
To load data from an external stage, you must:

1. On the `create_resources` macro, set the parameter `internal_stage` to `False`:

   ```shell
   dbt run-operation create_resources --args '{internal_stage: False}'
   ```

2. Point the `dbt_dataquality_stage` variable in your `dbt_project.yml` to your external stage (as described at the beginning of the Usage section):

   ```yaml
   dbt_dataquality_stage: my_external_stage
   ```

3. On the `load_log_sources` and `load_log_tests` macros, set the parameter `load_from_internal_stage` to `False`:

   ```shell
   dbt run-operation load_log_sources --args '{load_from_internal_stage: False}'
   ```
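If you don't already have an external stage, one can be created in Snowflake along these lines. This is a hypothetical example, not something the package runs for you: the stage name matches the variable above, while the bucket URL and storage integration are placeholders you must replace:

```sql
-- Hypothetical example: an external stage pointing at cloud storage.
-- The URL and storage integration below are placeholders.
CREATE STAGE IF NOT EXISTS my_external_stage
  URL = 's3://my-bucket/dbt-logs/'
  STORAGE_INTEGRATION = my_storage_integration;
```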
Finally, run the models provided by the package:

- `dbt run --select dbt_dataquality.sources` to load source freshness logs
- `dbt run --select dbt_dataquality.tests` to load test logs

This package supports capturing and reporting on Data Quality Attributes. This is a very popular feature!
To use this functionality, just follow these simple steps:

Add tests to your models following the standard dbt testing process. Tip: you may want to use some tests from the awesome dbt package dbt-expectations.
Tag any tests that you want to report on with your preferred data quality attributes
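For example, tagging could look like this in a hypothetical `schema.yml` (the model and column names are placeholders; the `dq:*` tag names follow the convention described in this section):

```yaml
models:
  - name: my_model # placeholder model name
    columns:
      - name: customer_id # placeholder column name
        tests:
          - not_null:
              tags: ['dq:completeness']
          - unique:
              tags: ['dq:consistency']
```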
To keep things simple at Divergent Insights we use the ISO/IEC 25012:2008 standard to report on data quality (refer to the image below)
You can read more about ISO 25012 here; however, here's a summary of the key Data Quality Attributes defined by the standard:
Please note that the tag names are entirely up to you, for example `dq:accuracy` or `dq:timeliness`. At Divergent Insights we use the tags `dq:accuracy`, `dq:completeness`, `dq:consistency` and `dq:timeliness` (we don't use credibility for obvious reasons).

Here are all the steps put together:
```shell
dbt run-operation create_resources
dbt source freshness
dbt run-operation load_log_sources
dbt run --select dbt_dataquality.sources
dbt test
dbt run-operation load_log_tests
dbt run --select dbt_dataquality.tests

# Optionally: the dbt_dataquality package uses incremental models, so don't forget to use the option `--full-refresh` to rebuild them
# For example
dbt run --full-refresh --select dbt_dataquality.sources
dbt run --full-refresh --select dbt_dataquality.tests
```
All the content of this repository is licensed under the Apache License 2.0
This is a permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.