NYCPlanning / db-zap-opendata

Workflow for creating the subset of ZAP data that's on open data

db-zap-opendata

Workflows for:

ZAP Open Data

This repo archives and processes the data in DCP's customer relationship management (CRM) system used for ZAP projects.

Two versions of the CRM data are archived to Digital Ocean S3 file storage, Google Cloud Storage and BigQuery:

Run a ZAP Open Data export script

  1. Open repo in the defined dev container

  2. Run a ZAP Pull

    python -m src.runner <name of the entity>

    e.g.

    python -m src.runner dcp_projects
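To pull several entities in one go, the command above can be wrapped in a small loop. This is a minimal sketch and not part of the repo: `build_runner_command` and `pull_entities` are hypothetical helper names, and `dcp_projects` is the only entity name confirmed by this README.

```python
from __future__ import annotations

import subprocess
import sys


def build_runner_command(entity: str) -> list[str]:
    """Return the argv for one ZAP pull, mirroring `python -m src.runner <entity>`."""
    if not entity or not entity.strip():
        raise ValueError("entity name must be a non-empty string")
    # sys.executable keeps the pull on the same interpreter as this script.
    return [sys.executable, "-m", "src.runner", entity.strip()]


def pull_entities(entities: list[str]) -> None:
    """Run one pull per entity, stopping at the first failure."""
    for entity in entities:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_runner_command(entity), check=True)
```

Calling `pull_entities(["dcp_projects"])` from the repo root inside the dev container would run the same pull as the example above.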

MapZAP

This repo is used to build MapZAP, a dataset of ZAP project records with spatial data.

A ZAP project has a geometry assigned to it from either of two sources:

Data sources

Note: All source data is in BigQuery as a result of the ZAP Open Data export workflows in this repo.

Build process (locally only)

  1. Clone the repo and create .devcontainer/.env

  2. Open the repo in the defined dev container in VS Code

  3. Run the following dbt commands to build a BigQuery dataset named dbt_mapzap_dev_YOUR_DBT_USER

    dbt debug
    dbt deps
    dbt seed --full-refresh
    dbt run
    dbt test

    Note: Use of dbt requires a personal keyfile at .dbt/bigquery-dbt-dev-keyfile.json. This can be generated and shared by a Google Cloud Platform admin.

  4. Review outputs in BigQuery and export as CSV to Google Cloud Storage
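The dbt commands in step 3 can also be run as a single fail-fast sequence. The sketch below is illustrative only: `run_build` is a hypothetical helper, and it assumes the keyfile path from the note is relative to the repo root, which this README does not state explicitly.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# The five commands from step 3, in the order given there.
DBT_STEPS = [
    ["dbt", "debug"],
    ["dbt", "deps"],
    ["dbt", "seed", "--full-refresh"],
    ["dbt", "run"],
    ["dbt", "test"],
]

# Keyfile path from the note in step 3 (assumed relative to the repo root).
KEYFILE = Path(".dbt/bigquery-dbt-dev-keyfile.json")


def run_build(repo_root: Path = Path("."), dry_run: bool = False) -> list[str]:
    """Run each dbt step in order; return the command lines run (or planned)."""
    if not dry_run and not (repo_root / KEYFILE).is_file():
        raise FileNotFoundError(f"missing dbt keyfile: {repo_root / KEYFILE}")
    executed = []
    for step in DBT_STEPS:
        if not dry_run:
            # check=True aborts the sequence as soon as one step fails.
            subprocess.run(step, cwd=repo_root, check=True)
        executed.append(" ".join(step))
    return executed
```

`run_build(dry_run=True)` only lists the planned commands, which is a cheap way to check the order before touching BigQuery.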

Dev

Note: set the environment variables in .env according to example.env.
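One way to confirm that .env covers everything in example.env is to compare the variable names in the two files. The following is a rough sketch under assumptions: both files are taken to use plain `KEY=VALUE` lines, and `env_keys` and `missing_keys` are hypothetical helpers that do not exist in this repo.

```python
from __future__ import annotations


def env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=VALUE lines, skipping blanks and # comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate an optional leading "export " as used in some env files.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)
    return keys


def missing_keys(example_text: str, actual_text: str) -> set[str]:
    """Return names defined in the example file but absent from the actual file."""
    return env_keys(example_text) - env_keys(actual_text)
```

Passing the contents of example.env and .env (for instance via `pathlib.Path(...).read_text()`) returns the names that still need to be set; an empty set means every variable in the example has a counterpart.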

Using dbt

Setup

dbt deps
dbt debug

Building tables

dbt seed --full-refresh
dbt run
dbt test

Building docs

dbt docs generate
dbt docs serve

Develop dbt

Run pre-commit checks for all model and config files:

pre-commit run --all-files

Note: This is configured by .pre-commit-config.yaml and will run dbt compile and dbt docs generate

Run a single model:

dbt run --select int_zap_project_bbls

Run a single model and its parent models:

dbt run --select +int_zap_project_bbls

Run a single model, its children, and the parents of those children:

dbt run --select @int_zap_project_bbls
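To make the difference between the selectors concrete, the sketch below resolves a bare model name, a `+` prefix, and an `@` prefix against a toy graph. It is a simplified model of dbt's graph operators, not dbt's own code. Apart from `int_zap_project_bbls`, every model name in it is hypothetical and does not describe this project's actual DAG.

```python
from __future__ import annotations

# Toy DAG: each model maps to its direct parents. Only int_zap_project_bbls
# is a real model name from this README; the others are hypothetical.
PARENTS = {
    "stg_projects": [],
    "stg_bbls": [],
    "stg_geometries": [],
    "int_zap_project_bbls": ["stg_projects", "stg_bbls"],
    "mapzap_final": ["int_zap_project_bbls", "stg_geometries"],
}


def ancestors(models: set[str]) -> set[str]:
    """All direct and indirect parents of the given models."""
    found: set[str] = set()
    stack = list(models)
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def descendants(models: set[str]) -> set[str]:
    """All direct and indirect children of the given models."""
    found: set[str] = set()
    stack = list(models)
    while stack:
        current = stack.pop()
        for child, parents in PARENTS.items():
            if current in parents and child not in found:
                found.add(child)
                stack.append(child)
    return found


def select(selector: str) -> set[str]:
    """Resolve 'model', '+model', or '@model' against the toy DAG."""
    if selector.startswith("+"):
        chosen = {selector[1:]}
        return chosen | ancestors(chosen)
    if selector.startswith("@"):
        chosen = {selector[1:]}
        with_children = chosen | descendants(chosen)
        return with_children | ancestors(with_children)
    return {selector}
```

In this toy graph, `select("+int_zap_project_bbls")` adds only the two upstream staging models, while `select("@int_zap_project_bbls")` also pulls in `mapzap_final` and its other parent `stg_geometries`, so every child can be built with all of its inputs present.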

Notes for in-progress MapZAP work