NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

DBT ZTL: set up dbt project with sources #910

Closed fvankrieken closed 6 days ago

fvankrieken commented 1 week ago

todos

background/description

As a first step toward getting ztl ready for dbt, we need to set up and configure the dbt project in products/zoningtaxlots. Luckily, products/green_fast_track offers a great template for what's needed.

Project, profiles, and packages yml files should be created. For now this is mostly boilerplate. Ideally we'd share some of this configuration across products, but duplicating it keeps things simple while we get ztl running with dbt.
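For orientation, a minimal dbt_project.yml might look roughly like the sketch below. The project and profile names are assumptions for illustration only; the actual values should be copied from the products/green_fast_track setup, and the profile name has to match an entry in profiles.yml.

```yaml
# dbt_project.yml: minimal sketch, all values are assumptions
name: zoningtaxlots        # hypothetical project name
version: "1.0.0"
config-version: 2

# Must match a top-level key in profiles.yml (hypothetical name)
profile: zoningtaxlots

# Folder where _sources.yml and, later, models will live
model-paths: ["models"]
```

If packages.yml lists any packages, `dbt deps` needs to be run once before `dbt build` so they get installed.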

Additionally, a models folder should be added, for now containing just _sources.yml. This file tells dbt which resources (tables, etc.) will be available when dbt runs, i.e. the state of the environment before dbt is invoked. For us, this means whatever tables we've loaded using recipe.yml and dcpy.lifecycle.builds.plan/load.

This will take a little more work. To start, it can just be set up with the names of the sources. Once the load command has been run, you should be able to run dbt build and have it at least verify that the sources exist. At that point, if all else is working, expectations of the source data can be added. The easiest way to do this is to run the load command on a clean database and then inspect the tables in DBeaver. Note that we don't need to declare every column in each source dataset as required/expected, just the ones used in the build. Those are a little harder to find and will take some digging through the sql files.
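As a rough illustration of the two stages described above (names only first, then column expectations), a _sources.yml could be sketched as follows. The source name, schema, table names, and column name are all hypothetical placeholders; the real ones come from recipe.yml and from the columns the sql files actually use.

```yaml
# models/_sources.yml: sketch only, every name below is a placeholder
version: 2

sources:
  - name: recipe_sources          # hypothetical source name
    schema: public                # assumption: schema the load command writes to
    tables:
      # Stage 1: just declare that the table is expected to exist
      - name: example_dataset_a   # hypothetical table name

      # Stage 2: add expectations only for columns the build uses
      - name: example_dataset_b   # hypothetical table name
        columns:
          - name: example_column  # hypothetical column name
            tests:                # newer dbt versions also accept data_tests
              - not_null
```

A test on a source column makes dbt query that table during `dbt build`, so a table that wasn't loaded shows up as an error rather than passing silently.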