NYCPlanning / data-loading-scripts

Scripts for loading data into capital planning databases

Data Loading Scripts

A node.js CLI tool for automated dataset loading into the DCP Capital Planning Database, making a full refresh of a dataset as simple as node loader install {datasetName}. Based loosely on docker4data.

Structure

The datasetName is important, and for now takes the form {agency}_{dataset}, e.g. dcp_mappluto. Each dataset gets a folder named after its datasetName in the datasets directory. When the CLI tool is executed, the datasetName is passed in as an argument, and the script looks for {datasetName}/data.json and runs accordingly.
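The lookup described above can be sketched as follows. configPathFor is a hypothetical helper for illustration, not the tool's actual API:

```javascript
// Sketch: resolving a datasetName to its config file.
// e.g. "dcp_mappluto" -> "datasets/dcp_mappluto/data.json"
function configPathFor(datasetName) {
  return ['datasets', datasetName, 'data.json'].join('/');
}

console.log(configPathFor('dcp_mappluto')); // datasets/dcp_mappluto/data.json
```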

data.json

data.json includes everything the script needs to download, process, and load the dataset. An example data.json for NYC borough boundaries looks like:

{
  "url": "http://www1.nyc.gov/assets/planning/download/zip/data-maps/open-data/nybb16b.zip",
  "load": "shp2pgsql",
  "loadFiles": [
    {
      "file": "nybb_16b/nybb.shp",
      "table": "dcp_boroughboundaries"
    }
  ],
  "shp2pgsql": [
    "-d",
    "-s 2263:4326"
  ]
}
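To make the fields concrete, here is a sketch of how a config like the one above could be turned into shp2pgsql invocations: the shp2pgsql array supplies flags (drop/recreate the table, reproject from EPSG:2263 to 4326), and each loadFiles entry pairs a shapefile with a target table. shp2pgsqlCommands is a hypothetical helper, not the loader's real internals:

```javascript
// Sketch: build shp2pgsql command strings from a data.json object.
function shp2pgsqlCommands(config) {
  const flags = (config.shp2pgsql || []).join(' ');
  return config.loadFiles.map(
    f => `shp2pgsql ${flags} ${f.file} ${f.table}`
  );
}

const config = {
  load: 'shp2pgsql',
  loadFiles: [{ file: 'nybb_16b/nybb.shp', table: 'dcp_boroughboundaries' }],
  shp2pgsql: ['-d', '-s 2263:4326']
};

console.log(shp2pgsqlCommands(config)[0]);
// shp2pgsql -d -s 2263:4326 nybb_16b/nybb.shp dcp_boroughboundaries
```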

after.sql

After the push step completes, the script checks for the existence of after.sql in the dataset directory. If it exists, the script executes it with psql. This can be used for post-processing in PostgreSQL, such as combining the five MapPLUTO tables into one and deleting the source tables.

How to Use

Commands

The workflow is divided into three parts for now: get, push, and after.
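The three-step split could be dispatched as below. The step names come from this README; the bodies are placeholders, not the tool's real implementation:

```javascript
// Sketch: dispatch table for the three workflow steps.
const steps = {
  get:   name => `downloading ${name}`,    // fetch the source file(s)
  push:  name => `loading ${name}`,        // load into PostgreSQL
  after: name => `post-processing ${name}` // run after.sql if present
};

function run(step, datasetName) {
  if (!steps[step]) throw new Error(`unknown step: ${step}`);
  return steps[step](datasetName);
}

console.log(run('get', 'dcp_mappluto')); // downloading dcp_mappluto
```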

TODO: