NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

enforce a contract on `green_fast_track_bbls` #738

Closed damonmcc closed 5 months ago

damonmcc commented 5 months ago

yay! We paused on using dbt to enforce a contract on important tables because it failed to handle `data_type: geometry`.

Just like when we create a seed file with a geometry column here, we have to use a PostGIS column type like `geometry(Geometry, 2263)` to declare the type of a geometry column in dbt YAML files.
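
For illustration, a contract using the PostGIS type might look roughly like the sketch below. Only the model name `green_fast_track_bbls` and the `geometry(Geometry, 2263)` type come from this PR; the column names `bbl` and `geom` and the `text` type are assumptions, not the model's actual schema.

```yaml
version: 2

models:
  - name: green_fast_track_bbls
    config:
      contract:
        enforced: true  # turns on dbt's preflight column name and type check
    columns:
      # hypothetical column names, for illustration only
      - name: bbl
        data_type: text
      - name: geom
        # plain "geometry" failed for us; the full PostGIS type works
        data_type: geometry(Geometry, 2263)
```

With a contract enforced, every column the model returns has to be listed with a `data_type`, so a missing or extra column fails the build.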

from the dbt docs here:

When building a model with a defined contract, dbt will do two things differently:

  • dbt will run a "preflight" check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. This check is agnostic to the order of columns specified in your model (SQL) or YAML spec.
  • dbt will include the column names, data types, and constraints in the DDL statements it submits to the data platform, which will be enforced while building or updating the model's table.

screenshots of data contract results

Logs from local runs against db-green-fast-track.dm_gft_shadows which has all child models of green_fast_track_bbls already built.

when columns are missing

Screenshot 2024-04-04 at 12 17 28 PM

when passing

Screenshot 2024-04-04 at 12 19 38 PM

damonmcc commented 5 months ago

oops, I shouldn't have added the Shadows columns here. gonna drop that change so that builds on main don't fail before https://github.com/NYCPlanning/data-engineering/pull/727 is done