datahub-project / datahub

The Metadata Platform for your Data and AI Stack
https://datahubproject.io
Apache License 2.0

Feature: Ingest DBT Contract Information as a DataHub Data Contract #11927

Open matthew-coudert-cko opened 3 days ago

matthew-coudert-cko commented 3 days ago

We (Checkout.com) have been using DataHub's data contract feature more heavily over the past few months, and have implemented a custom mapping from DBT contracts to DataHub data contracts. We propose implementing this as part of the native DBT Core ingestion, with the following functionality:

  1. DBT Contracts prevent breaking changes (column removals or column type changes), so they are equivalent to a schema contract in DataHub.
  2. DBT Tests carrying a configurable tag (default: contract) have their assertions added to the data contract.
  3. Optionally, DBT constraints that are enforced by the target data platform (e.g. not_null in Snowflake) could also be added to the contract as assertions that always pass.

Example DBT Yaml:

- name: dbt_contract_test_view
  description: This view is used to test the data contract checks for the dbt models.
  config:
    contract:
      enforced: true # this adds a schema contract to the DataHub data contract.
  columns:
    - name: urn
      data_type: text
      description: The urn of the object.
      data_tests:
        - unique:
            tags: ['contract'] # this test is included in the data contract
        - not_null # this test is not
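
A minimal sketch of how the mapping could work, assuming the parsed DBT manifest is available as a dictionary of nodes. The field names below follow the general shape of DBT's manifest.json but should be checked against the DBT version in use. `ContractDraft`, `build_contract_drafts`, `contract_tag`, and `include_constraints` are hypothetical names used for illustration only; they are not part of DataHub's API, and this is not our actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ContractDraft:
    """Hypothetical, simplified stand-in for a DataHub data contract."""

    model_id: str
    has_schema_contract: bool = False
    test_assertions: List[str] = field(default_factory=list)
    constraint_assertions: List[str] = field(default_factory=list)


def build_contract_drafts(
    nodes: Dict[str, Dict[str, Any]],
    contract_tag: str = "contract",      # hypothetical option: tag that marks contract tests
    include_constraints: bool = False,   # hypothetical feature flag for point 3
) -> Dict[str, ContractDraft]:
    """Group schema contracts, tagged tests, and constraints per model."""
    drafts: Dict[str, ContractDraft] = {}

    def draft_for(model_id: str) -> ContractDraft:
        return drafts.setdefault(model_id, ContractDraft(model_id))

    for node_id, node in nodes.items():
        resource_type = node.get("resource_type")
        if resource_type == "model":
            contract = node.get("config", {}).get("contract", {})
            if not contract.get("enforced", False):
                continue
            # Point 1: an enforced DBT contract becomes a schema contract.
            draft = draft_for(node_id)
            draft.has_schema_contract = True
            if include_constraints:
                # Point 3: platform-enforced constraints, treated as always passing.
                for col_name, col in node.get("columns", {}).items():
                    for constraint in col.get("constraints", []):
                        draft.constraint_assertions.append(
                            f"{col_name}:{constraint['type']}"
                        )
        elif resource_type == "test" and contract_tag in node.get("tags", []):
            # Point 2: tagged tests are attached to the models they depend on.
            for parent_id in node.get("depends_on", {}).get("nodes", []):
                if nodes.get(parent_id, {}).get("resource_type") == "model":
                    draft_for(parent_id).test_assertions.append(node_id)
    return drafts


# Illustrative input mirroring the YAML above (node ids are made up).
example_nodes = {
    "model.demo.dbt_contract_test_view": {
        "resource_type": "model",
        "config": {"contract": {"enforced": True}},
        "columns": {"urn": {"constraints": [{"type": "not_null"}]}},
    },
    "test.demo.unique_urn": {
        "resource_type": "test",
        "tags": ["contract"],
        "depends_on": {"nodes": ["model.demo.dbt_contract_test_view"]},
    },
    "test.demo.not_null_urn": {
        "resource_type": "test",
        "tags": [],
        "depends_on": {"nodes": ["model.demo.dbt_contract_test_view"]},
    },
}

drafts = build_contract_drafts(example_nodes, include_constraints=True)
```

With this input, only the tagged `unique` test is attached to the model's draft; the untagged `not_null` test is left out, matching the comments in the YAML.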

We're happy to contribute this if there's appetite, and to hide it behind a feature flag in the DBT config as well.

jjoyce0510 commented 3 hours ago

This is really cool.

I am sure others would be quite interested in this! As long as we can place this behind reasonably well-named feature flags with the appropriate early-stage / incubating labeling, I think things should be fine!

To clarify, for dbt contracts you are minting net-new schema assertions in DataHub, is that right? And for other DBT tests, you are simply linking them to the contract for the assets?