Closed simonharrer closed 6 months ago
What is the status on DBT export? @simonharrer. My team use DBT+Snowflake are looking into a few alternatives for data contract implementations like this cli, paypal etc
@adaminsta I've started with https://github.com/datacontract/cli/pull/38 and want to release a first version of the dbt export on Friday afternoon.
Would love your thoughts and ideas on the dbt export. Any expectations already in your mind?
We want to build a dbt model (in Snowflake) on top of the contract. From the datacontract we are creating a dbt source, dbt schema (with an owner, contract, etc), and the staging model.
Can elaborate more later.
We currently have the following export behaviour implemented in the draft PR:
datacontract.yaml
dataContractSpecification: 0.9.2
id: orders-unit-test
info:
title: Orders Unit Test
version: 1.0.0
models:
orders:
fields:
order_id:
type: varchar
unique: true
required: true
order_total:
type: bigint
required: true
datacontract export --format dbt datacontract.yaml
version: 2
models:
- name: orders
config:
materialized: table
contract:
enforced: true
columns:
- name: order_id
data_type: text
constraints:
- type: not_null
- type: unique
- name: order_total
data_type: integer
constraints:
- type: not_null
Just merged a first implementation of the dbt export. It currently has only a mapping for snowflake but already supports all fields in the model. Does this already help you? What is missing for you @adaminsta ?
Our idea is to use this tool to automatically create the resources and tables/models needed with CI/CD, based on the contract.
Getting the dbt schema is very useful. To make it even more powerful: 1) Generate a dbt source with the model name as table, based on the snowflake database and schema e.g:
models:
- name: orders
config:
materialized: table
contract:
enforced: true
columns:
- name: order_id
data_type: text
constraints:
- type: not_null
- type: unique
[...]
servers:
endpoint:
type: snowflake
account: xx0000
database: MY_DB
schema: MY_SCHEMA
provides
sources:
- name: contract_name
database: my_db
schema: my_schema
tables:
- name: orders
2) Generate a staging model In our automation process we'd also like to push a PR to our DBT repo in order to remove the bottle of the data engineer. Therefore, it would be useful to also generate a staging model that selects all columns from the source.
{{ config(
materialized="table"
) }}
select
order_id
[...]
from {{ source('contract_name', 'orders')}}
Thanks for sharing your ideas. I think your feature ideas should be included in the tool. Your examples help to drive this! Thank you.
Also, this is not related to dbt format but we are consuming this from a pub-sub topic and putting it into snowflake, but there is no pubsub option for servers. Need something like this: source: type: pubsub project: gcp_project_name name: my_topic
Hello everyone, stumbled upon this issue as I have the same need as @adaminsta as far as I understand.
Would it make sense that the generated source includes more info from the contract, such as the column names and metadata ? That would greatly reduce the documentation work and have the information propagated to dbt generated docs as well.
I'd rather have it generate too much (source yaml, staging model yaml and sql selecting all the fields as @adaminsta suggested) and remove what I do not actually need than having to add what I'm missing.
@simonharrer I've got some more issues when trying to test and export to DBT. Maybe we could have a quick call? Sent you an email as well from instabox email
I did not receive an email. Did you send one to simon.harrer@innoq.com ?
Easiest would be if you would join the data contract slack: https://datacontract.com/slack
There, we could easily set up a call.
I've just added the dbt-sources export in a very early implementation: https://github.com/datacontract/cli/commit/ac19d27f2b7f31cb2ab9971338bb25918c2665fb
How to use:
datacontract export --format dbt-sources
and this would result in:
version: 2
sources:
- name: orders-unit-test
description: The orders data contract
database: my-database
schema: my-schema
tables:
- name: orders
description: The orders model
columns:
- name: order_id
tests:
- dbt_expectations.dbt_expectations.expect_column_values_to_be_of_type:
column_type: VARCHAR
- not_null
- unique
- dbt_expectations.expect_column_value_lengths_to_be_between:
min_value: 8
max_value: 10
- dbt_expectations.expect_column_values_to_match_regex:
regex: ^B[0-9]+$
meta:
classification: sensitive
pii: true
tags:
- order_id
- name: order_total
description: The order_total field
tests:
- dbt_expectations.dbt_expectations.expect_column_values_to_be_of_type:
column_type: NUMBER
- not_null
- dbt_expectations.expect_column_values_to_be_between:
min_value: 0
max_value: 1000000
- name: order_status
tests:
- dbt_expectations.dbt_expectations.expect_column_values_to_be_of_type:
column_type: TEXT
- not_null
- accepted_values:
values:
- 'pending'
- 'shipped'
- 'delivered'
Please have a look. Feedback is appreciated! :-)
And we've added support for dbt-staging-sql export: https://github.com/datacontract/cli/commit/7cea40e43b0231acd95a55cbb8269ed2a0d7f4b7
How to use:
datacontract export --format dbt-staging-sql
And this would result in
select
order_id,
order_total,
order_status
from {{ source('orders-unit-test', 'orders') }}
@adaminsta and @pluttgens would love your feedback on the two features that we just implemented.
We now have the three dbt export features: for dbt models, dbt sources, and dbt staging sql. Create a new issue when you want to improve on the export features, or add a new dbt export feature.
This issue is about exporting a data contract to the dbt format.