A node.js CLI tool for automated dataset loading into the DCP Capital Planning Database, making the full refresh of a dataset as simple as node loader install {datasetName}
Based loosely on docker4data
datasetName
is important, and for now takes the form of {agency}_{dataset}, e.g. dcp_mappluto
. Each dataset should have a folder with its datasetName in the datasets
directory. When executing the CLI tool, the datasetName is passed in as an argument, and the script searches for {datasetName}/data.json
and then runs accordingly.
data.json
includes everything the script needs to download, process, and load the dataset. An example data.json
for NYC borough boundaries looks like:
{
"url": "http://www1.nyc.gov/assets/planning/download/zip/data-maps/open-data/nybb16b.zip",
"load": "shp2pgsql",
"loadFiles": [
{
"file":"nybb_16b/nybb.shp",
"table": "dcp_boroughboundaries"
}
],
"shp2pgsql": [
"-d",
"-s 2263:4326"
]
}
url
- the download link, grabbed when using the Get
method.load
- which loading type to use. shp2pgsql
is the only one working right now and easily imports shapefiles. Future types will include 'psql' with specified SQL file stored in the dataset's directory, and ogr2ogr
, etc. We will add new types as needed.loadFiles
- an array of files to be loaded when using the Push
method. Most datasets will just have one, but some, like pluto, have a single file to download but many files to upload. shp2pgsql
- an array of arguments for the shp2pgsql command. Only used if load
is set to 'shp2pgsql'After Push
completes, the script will check for the existence of after.sql
in the dataset directory. If it exists, it will execute it with psql
. This can be used for post-processing in PostgreSQL, such as combining the 5 mappluto tables into one and deleting the source tables.
npm install
dbconfig.sample.js
and saving it as dbconfig.js
ftpauth.sample.js
and saving it as ftpauth.js
node loader {command} {datasetName}
in the terminal, like node loader install dcp_boroboundaries
The workflows are divided into 3 parts for now, get
, push
, and after
.
get
downloads the file to the temp directory and unzips if necessarypush
grabs the file(s) from the temp directory and loads it(them) into the databaseafter
always runs after push
and checks for after.sql
in the dataset directory. If it exists, it will execute the sqlinstall
does both get
and push
TODO: