civictechdc / dc-campaign-finance-watch

Displays data about DC Campaign Finance on a site
https://campaign-finance.codefordc.org

Create scraper for pulling new rows out of data.dc.gov #200

Open mkalish opened 7 years ago

mkalish commented 7 years ago

The dataset is too big to be proxied and will not be queryable. Currently, the data has been pushed through 1/1/2017, but a scraper needs to be written to do this regularly.

Tasks:

Upload idea using existing tools from esri to geojson to csv to data portal https://www.npmjs.com/package/esri-dump https://www.npmjs.com/package/json2csv https://www.npmjs.com/package/ckan

Get the last imported data in the data portal.

Check whether esri data exists beyond the last imported data.

If it does, attempt to get a dump from that start point to the end.

Since the data in ckan is currently csv, convert the dump to csv.

Upload the data to the data portal.

This script could then run to sync data portal info with esri.
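The tasks above could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: the `date` field name, the ISO date format, and the `upload` callable are all assumptions made for illustration, and the rows are plain dictionaries rather than real esri or ckan responses.

```python
from datetime import date


def latest_date(rows, date_field):
    """Return the most recent ISO date found in rows, or None if rows is empty."""
    dates = [date.fromisoformat(row[date_field]) for row in rows]
    return max(dates) if dates else None


def rows_newer_than(rows, date_field, cutoff):
    """Keep only rows dated strictly after cutoff (all rows if cutoff is None)."""
    if cutoff is None:
        return list(rows)
    return [r for r in rows if date.fromisoformat(r[date_field]) > cutoff]


def sync(portal_rows, esri_rows, upload, date_field="date"):
    """Upload esri rows newer than the portal's latest row; return how many were sent.

    `upload` is a hypothetical callable that pushes a list of rows to the portal.
    `date_field` is a placeholder; the real column name is not given in this thread.
    """
    cutoff = latest_date(portal_rows, date_field)
    new_rows = rows_newer_than(esri_rows, date_field, cutoff)
    if new_rows:
        upload(new_rows)
    return len(new_rows)
```

Comparing strictly after the cutoff would miss rows that share the cutoff date but were published later, so a real script might need a record id or a small overlap window instead.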

romoy commented 7 years ago

This seems to be a prerequisite, so researching this one instead.

romoy commented 7 years ago

http://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Service_WebMercator/MapServer/34 - Campaign Contributions

http://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Service_WebMercator/MapServer/35 - Campaign Expenditures

Used https://github.com/openaddresses/pyesridump to grab the latest data:

esri2geojson http://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Service_WebMercator/MapServer/34 ocf-contributions.geojson

esri2geojson http://maps2.dcgis.dc.gov/dcgis/rest/services/DCGIS_DATA/Public_Service_WebMercator/MapServer/35 ocf-expenditures.geojson

romoy commented 7 years ago

Added ocf-expenditures.geojson to http://data.codefordc.org/dataset/dc-campaign-expenditures-ocf

romoy commented 7 years ago

Attempted to add ocf-contributions, but the upload failed with a 413 (request too large) response; will try to split the file and upload the parts.
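Splitting the file could look something like the sketch below. It assumes the dump is a standard GeoJSON FeatureCollection and splits by feature count; the portal's actual size limit is not stated here, so `max_features` is a value that would have to be found by trial.

```python
def split_feature_collection(collection, max_features):
    """Split one GeoJSON FeatureCollection into several smaller ones.

    Each part holds at most `max_features` features, in the original order.
    """
    if max_features < 1:
        raise ValueError("max_features must be at least 1")
    features = collection["features"]
    return [
        {"type": "FeatureCollection", "features": features[i:i + max_features]}
        for i in range(0, len(features), max_features)
    ]
```

Each part can then be written out with `json.dump` and uploaded on its own. Since a 413 is about request size in bytes, a fixed feature count is only a rough proxy for staying under the limit.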

mkalish commented 7 years ago

Looking good. I would take a look at the datastore API, which can handle pushing a lot of rows more gracefully.
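For reference, a request to the CKAN datastore API might be built along these lines. This is a sketch under assumptions: it presumes the DataStore extension is enabled on the portal and that the resource already exists in the datastore (for example via `datastore_create`). The resource id and API key are placeholders, and the function only builds the request without sending it.

```python
import json
import urllib.request


def build_datastore_request(portal_url, resource_id, records, api_key):
    """Build (but do not send) a POST to CKAN's datastore_upsert action.

    method="insert" appends rows without needing a unique key on the resource.
    """
    payload = {
        "resource_id": resource_id,  # placeholder: id of the target resource
        "records": records,          # list of row dictionaries
        "method": "insert",
    }
    return urllib.request.Request(
        portal_url.rstrip("/") + "/api/3/action/datastore_upsert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(request)`. Calling this once per batch of a few thousand rows keeps each request small, which should avoid the size problem seen with the single file upload.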

romoy commented 7 years ago

http://maps2.dcgis.dc.gov/dcgis/sdk/rest/index.html#//02ss0000000r000000

romoy commented 7 years ago

Upload idea using existing tools from esri to geojson to csv to data portal https://www.npmjs.com/package/esri-dump https://www.npmjs.com/package/json2csv https://www.npmjs.com/package/ckan
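The geojson to csv step in this idea could also be done without an extra package. The sketch below is an illustration using only the Python standard library, not the behaviour of json2csv. It assumes each feature's attributes live under `properties` and it drops the geometry.

```python
import csv
import io


def geojson_to_csv(collection):
    """Convert a GeoJSON FeatureCollection's properties to CSV text.

    Columns are the union of all property names, in first-seen order.
    Geometry is ignored; missing values become empty cells.
    """
    rows = [f.get("properties") or {} for f in collection.get("features", [])]
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

If the coordinates are needed in the portal, they would have to be copied into the properties before conversion.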

romoy commented 7 years ago

http://www.convertcsv.com/geojson-to-csv.htm

mkalish commented 7 years ago

Does that have an API?

romoy commented 7 years ago

https://github.com/koopjs/koop-opendata

romoy commented 7 years ago

http://opendata.arcgis.com/datasets/DCGIS::campaign-financial-expenditures/geoservice

romoy commented 7 years ago

http://opendata.arcgis.com/datasets/DCGIS::campaign-financial-contributions/geoservice