Open simonharrer opened 3 months ago
I already do something like this for import, creating a datacontract.yaml given a dbt project, but was using the "schema" field instead of the "models" field, with a custom schema type. (Slightly off-topic, but schema was much more widely understandable than models in our workshops. Just some feedback I can provide on its deprecation in the specification.)
However, our code is/was quite specific to the format of the dbt projects we allowed. To do it properly, one would want to parse and use the manifest.json file from a dbt project. It is the most straightforward way of working with dbt projects generically.
You would go into the dbt nodes in the manifest, and for every node with a resource_type of model, import the columns, data_types if given, descriptions if given, etc. The only difficulty is mapping the data_types to the supported ones in the datacontract spec. Hence why a physical-model-specific schema might make more sense for the import. As a first step though, the model in models could just not provide the data_type, or provide the dbt one if it matches.
(For parsing the manifest, Dagster-dbt does this as well, and the code is Apache-2 Licensed, if you are looking for inspiration). The import is something I can contribute on, if the implementation sounds ok.
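To make the manifest approach concrete, here is a minimal Python sketch. It relies on manifest.json having a top-level "nodes" mapping whose entries carry "resource_type", "name", "description" and a "columns" mapping. The function names, the SUPPORTED_TYPES set (an illustrative subset, not the spec's full list) and the output shape (model name, then "fields") are my assumptions, not an agreed implementation.

```python
import json

# Assumption: illustrative subset of type names accepted by the data contract spec.
SUPPORTED_TYPES = {
    "string", "text", "varchar", "int", "integer", "long", "bigint",
    "float", "double", "decimal", "numeric", "boolean", "date", "timestamp",
}


def models_from_manifest(manifest: dict) -> dict:
    """Build a 'models' mapping from an already-parsed dbt manifest."""
    models = {}
    for node in manifest.get("nodes", {}).values():
        # Skip tests, seeds, snapshots, etc.
        if node.get("resource_type") != "model":
            continue
        fields = {}
        for col_name, col in node.get("columns", {}).items():
            field = {}
            if col.get("description"):
                field["description"] = col["description"]
            # First step from the discussion: only pass the dbt type through
            # when it already matches a supported name, otherwise omit it.
            data_type = (col.get("data_type") or "").lower()
            if data_type in SUPPORTED_TYPES:
                field["type"] = data_type
            fields[col_name] = field
        model = {"fields": fields}
        if node.get("description"):
            model["description"] = node["description"]
        models[node["name"]] = model
    return models


def import_dbt_manifest(path: str) -> dict:
    """Read manifest.json from disk and convert it."""
    with open(path, encoding="utf-8") as f:
        return models_from_manifest(json.load(f))
```

Columns whose dbt type has no direct match simply end up without a type, which keeps the first version free of any warehouse-specific mapping.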
Much easier, of course, is to be pointed to a dbt schema.yaml file and use that for importing the models. Anything not defined in that yaml file would be missed. Then again, maybe that's ok.
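A rough sketch of the schema.yaml variant, for comparison. It assumes the file has already been loaded into a dict (for example with PyYAML's yaml.safe_load) and follows the usual dbt layout of a "models" list whose entries have a "columns" list. The function name and the output shape are hypothetical.

```python
def models_from_schema_yaml(schema: dict) -> dict:
    """Build a 'models' mapping from a parsed dbt schema.yaml document."""
    models = {}
    for model in schema.get("models") or []:
        fields = {}
        for col in model.get("columns") or []:
            field = {}
            if col.get("description"):
                field["description"] = col["description"]
            if col.get("data_type"):
                # Copied verbatim; no mapping to the spec's types is attempted here.
                field["type"] = col["data_type"]
            fields[col["name"]] = field
        entry = {"fields": fields}
        if model.get("description"):
            entry["description"] = model["description"]
        models[model["name"]] = entry
    return models
```

Only what is written in the yaml file is picked up, so undocumented columns are missed, as noted above.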
I think the latter is fine, as I presume most people with more than a few dbt models split them into one model per file; otherwise it gets quite unwieldy very quickly. Either that, or parse them all but allow an input to specify which models you want to include in the data contract, as it could be that you want two or three for a specific contract.
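The "parse them all, then pick" option could look roughly like this. The function name and the include parameter are hypothetical, and "models" stands for whatever mapping of model name to definition the import produces.

```python
def select_models(models: dict, include=None) -> dict:
    """Keep only the named models; keep everything when include is None."""
    if include is None:
        return dict(models)
    missing = [name for name in include if name not in models]
    if missing:
        # Fail loudly rather than silently producing an incomplete contract.
        raise KeyError(f"models not found in dbt project: {missing}")
    return {name: models[name] for name in include}
```

Raising on unknown names is a design choice in this sketch; a warning would also work.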
I think the latter is fine, as I presume most people with more than a few dbt models split them into one model per file; otherwise it gets quite unwieldy very quickly.
This does not match my experience with larger dbt projects. But one or several models can logically coexist and be part of a data contract, so it is fine anyway? (It's reasonable to ask/expect not to mix models from different data products/contracts.)
I'm looking into this right now
Awesome! I assigned you the issue. :-)
@torbenkeller any progress here?
I've been working with dbt, maybe I can help.
@jochenchrist I was working on other things over the last few weeks, sorry. But I will continue on this.
@teoria sounds good, if you want we can pair program to get this ready
@teoria you can contact me on the datacontract slack server
Out of #103 came the idea of having an import of dbt models to a datacontract.yaml