flexanalytics / dbt_observability

Apache License 2.0
1 stars 0 forks source link

dbt Observability Package

This dbt package builds a data mart of tables that track dbt projects, run/build history, and a history of table and columns.

Based on dbt_artifacts, this project differs in a few key ways:

Also similar to dbt-data-reliability, dbt-observability differs in a few key ways:

The package currently supports BigQuery, Databricks, Spark, Snowflake, Redshift, Postgres, Oracle, MySQL, MariaDB and MS SQL Server adapters.

Quickstart

  1. Add this package to your packages.yml:

    packages:
    - package: flexanalytics/dbt_observability
    version: 2.5.0
  2. Run dbt deps to install the package

  3. Add an on-run-end hook to your dbt_project.yml: on-run-end: "{{ dbt_observability.upload_results(results) }}" (We recommend using the "dbt_observability:environments": ["prod"] variable in your project to control which environments this package runs in [currently defaults to ["prod"]])

  4. If you are using selectors, be sure to include the dbt_observability models in your dbt invocation step, for example: dbt build --select some_model dbt_observability

  5. Run your project!

:construction_worker: Always run the dbt_observability models in every dbt invocation which uses the upload_results macro. This ensures that the source models always have the correct fields in case of an update.

Configuration

The following configuration can be used to specify where the raw (sources) data is uploaded, and where the dbt models are created:


vars:
...
  "dbt_observability:tracking_enabled": true # optional, create observability base tables - default is true
  "dbt_observability:environments": ["prod"] # optional, default is ["prod"]
  "dbt_observability:path": "models/marts/" # optional, which paths should observability monitor. must be in the form of "dbt_observability:path": "path/subpath/" - default is `None`, will run on all paths in the project
  "dbt_observability:materialization": ["table","incremental"] # optional, which model materialization should observability run on. must be array of "table", "view", "incremental", "ephemeral" - default is ["table","incremental"]
  "dbt_observability:track_source_rowcounts": false # optional, track source rowcounts - default is false [depending on your dbms, this can be slow and resource intensive as it may require a full table scan if the dbms does not store rowcounts in information_schema.tables]
  "dbt_observability:rowcount_diff_threshold_pct": .05 # optional, set threshold for no_anomaly test (see below)
...

models:
  ...
  dbt_observability:
    +database: your_destination_database # optional, default is your target database
    +schema: your_destination_schema # optional, default is `observability`
    staging:
      +database: your_destination_database # optional, default is your target database
      +schema: your_destination_schema # optional, default is `observability`
    sources:
      +database: your_sources_database # optional, default is your target database
      +schema: your sources_database # optional, default is `observability`

Note that model materializations and on_schema_change configs are defined in this package's dbt_project.yml, so do not set them globally in your dbt_project.yml (see docs on configuring packages):

Configurations made in your dbt_project.yml file will override any configurations in a package (either in the dbt_project.yml file of the package, or in config blocks).

Environment Variables

If the project is running in dbt Cloud, the following five columns (https://docs.getdbt.com/docs/dbt-cloud/using-dbt-cloud/cloud-environment-variables#special-environment-variables) will be automatically populated in the invocations model:

To capture other environment variables in the invocations model in the env_vars column, add them to the env_vars variable in your dbt_project.yml. Note that environment variables with secrets (DBT_ENV_SECRET_) can't be logged.

vars:
  env_vars: [
    'ENV_VAR_1',
    'ENV_VAR_2',
    '...'
  ]

dbt Variables

To capture dbt variables in the invocations model in the dbt_vars column, add them to the dbt_vars variable in your dbt_project.yml.

vars:
  dbt_vars: [
    'var_1',
    'var_2',
    '...'
  ]

dbt Tests

Projects can use the no_anomaly test on source tables in their sources.yml. The test will check if the difference between the current source table rowcount and the average of all previous observed rowcounts is significant enough to raise an error/warning. Set the dbt_observability:rowcount_diff_threshold_pct variable (see above) to override the default threshold of .05 (i.e. a 5% difference in rowcount).

sources:
  - name: sourcename
    tables:
      - name: some_source_table
        tests:
          - dbt_observability.no_anomaly:
              config:
                severity: error

Acknowledgements

This package is based on dbt_artifacts. Thanks to Brooklyn Data Co. for all the hard work put into the initial versions of this project. Thank you to Tails.com for initial development and maintenance of this package. On 2021/12/20, the repository was transferred from the Tails.com GitHub organization to Brooklyn Data Co. The macros in the early versions package were adapted from code shared by Kevin Chan and Jonathan Talmi of Snaptravel.

Thank you for sharing your work with the community!