This repository includes ETL pipelines for all the datasets fed into the CEQR (City Environmental Quality Review) app. It is managed by NYC Planning's data engineering team.

1. Set the environment variables `RECIPE_ENGINE`, `CEQR_DATA`, `BUILD_ENGINE`, and `EDM_DATA` under the `/ceqr` directory. See `.env.example`.
2. Run `python3 -m venv base` to set up the virtual environment.
3. Run `source base/bin/activate` to activate the virtual environment. To deactivate once finished, type `deactivate`.
4. Run `pip3 install -e .` to install the packages required across multiple data schemas.
5. Run `ceqr run <schema_name>` at the root directory. For example, `ceqr run ceqr_school_buildings` builds `ceqr_school_buildings` from scratch.

The recipes are in the `/ceqr/recipes` directory as individual folders named after their datasets. Each folder contains:

- `build.py`: A Python script that transforms and integrates the source data into a target table.
- `config.json`: A configuration file specifying the input table names, the output table name, and the DDL (output table schemas).
  - `geo_rejects` tables
- `README.md`: Metadata about the ETL pipeline.
- `requirements.txt`: The dependencies that must be installed to run the Python script.
- `runner.sh`: A shell script that builds the dataset from scratch when executed. Alternatively, run `ceqr run <schema_name>` at the root directory.

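
As a rough illustration of how a `build.py` might consume its `config.json`, the sketch below loads a configuration and renders a `CREATE TABLE` statement from the DDL. The key names (`inputs`, `output_table`, `DDL`), the table names, and the column types are all assumptions made for this example, not the layout used by the actual recipes; check an existing recipe for the real structure.

```python
import json

# Hypothetical config layout; the real keys in a recipe's config.json may differ.
SAMPLE_CONFIG = """
{
  "inputs": ["example_source_table"],
  "output_table": "example_schema.example_table",
  "DDL": {"id": "text", "capacity": "integer"}
}
"""


def create_table_sql(config: dict) -> str:
    """Render a CREATE TABLE statement from the (assumed) DDL mapping."""
    columns = ",\n".join(
        f"    {name} {dtype}" for name, dtype in config["DDL"].items()
    )
    return f"CREATE TABLE {config['output_table']} (\n{columns}\n);"


config = json.loads(SAMPLE_CONFIG)
sql = create_table_sql(config)
print(sql)
```

Keeping the schema in the configuration rather than in the script means the output table definition can be reviewed without reading the transformation code.
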
├── ceqr
│ ├── recipes
│ │ ├── <schema_name_1>
│ │ │ ├── build.py
│ │ │ ├── config.json
│ │ │ ├── README.md
│ │ │ ├── requirements.txt
│ │ │ └── runner.sh
│ │ ├── <schema_name_2>
│ │ │ ├── build.py
│ │ │ ├── config.json
│ │ │ ├── README.md
│ │ │ ├── requirements.txt
│ │ │ └── runner.sh
...
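
The layout above can be checked mechanically. The following is a minimal sketch, not part of the repository, that reports which recipe folders are missing any of the five files shown in the tree. The function names are hypothetical, and the `ceqr/recipes` path assumes the script is run from the repository root.

```python
from __future__ import annotations

from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("build.py", "config.json", "README.md", "requirements.txt", "runner.sh")


def missing_files(recipe_dir: Path) -> list[str]:
    """Return the expected files that are absent from one recipe folder."""
    return [name for name in EXPECTED_FILES if not (recipe_dir / name).is_file()]


def check_recipes(recipes_root: Path) -> dict[str, list[str]]:
    """Map each incomplete recipe folder name to the files it is missing."""
    report = {}
    for recipe_dir in sorted(p for p in recipes_root.iterdir() if p.is_dir()):
        missing = missing_files(recipe_dir)
        if missing:
            report[recipe_dir.name] = missing
    return report


if __name__ == "__main__":
    root = Path("ceqr/recipes")  # assumption: run from the repository root
    if root.is_dir():
        for name, missing in check_recipes(root).items():
            print(f"{name}: missing {', '.join(missing)}")
```
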

To add a new recipe:

1. Create a new folder under the `/ceqr/recipes` directory.
2. Within this new folder, create `config.json`, `README.md`, `build.py`, `requirements.txt`, and `runner.sh` as described in the Repo directory structure.
3. When designing the output table schema, follow the CEQR data schema standards in addition to the requirements specified by the data users.
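
The folder and file creation steps above could be scripted. This is a hedged sketch rather than a tool shipped with the repository: `scaffold_recipe` is a hypothetical helper that creates the recipe folder with empty placeholder files, which still have to be filled in by hand.

```python
from pathlib import Path

# The five files each recipe folder is expected to contain.
RECIPE_FILES = ("build.py", "config.json", "README.md", "requirements.txt", "runner.sh")


def scaffold_recipe(recipes_root: Path, schema_name: str) -> Path:
    """Create <recipes_root>/<schema_name> containing empty placeholder files."""
    recipe_dir = recipes_root / schema_name
    # exist_ok=False raises FileExistsError instead of touching an existing recipe.
    recipe_dir.mkdir(parents=True, exist_ok=False)
    for name in RECIPE_FILES:
        (recipe_dir / name).touch()
    return recipe_dir
```

From the repository root, a call such as `scaffold_recipe(Path("ceqr/recipes"), "ceqr_new_dataset")` would create the skeleton, where `ceqr_new_dataset` is a made-up schema name.
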