civictechdc / opendatadc

GNU Affero General Public License v3.0
Data Scraper PoC #45

Open mkalish opened 7 years ago

mkalish commented 7 years ago

As an alternative to uploading data directly to the portal, data can be loaded via the CKAN API. Combining that with AWS Lambda, we can write scrapers that continuously push new data.
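A rough sketch of what such a Lambda function might look like, assuming the target resource already exists in the CKAN DataStore. The environment variable names (`CKAN_URL`, `CKAN_API_KEY`, `CKAN_RESOURCE_ID`) and the `scrape()` placeholder are made up for illustration; only the `datastore_upsert` action endpoint and the `lambda_handler(event, context)` signature come from CKAN and AWS Lambda themselves.

```python
import json
import os
import urllib.request


def build_upsert_request(ckan_url, api_key, resource_id, records):
    """Build (but do not send) a POST to CKAN's datastore_upsert action."""
    payload = {
        "resource_id": resource_id,
        "records": records,
        # "insert" appends rows; "upsert" would require a unique key on the resource.
        "method": "insert",
    }
    return urllib.request.Request(
        url=ckan_url.rstrip("/") + "/api/3/action/datastore_upsert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


def scrape():
    """Hypothetical placeholder: a real scraper would return a list of row dicts."""
    return []


def lambda_handler(event, context):
    """Lambda entry point: scrape, then push any rows to CKAN."""
    records = scrape()
    if not records:
        return {"pushed": 0}
    request = build_upsert_request(
        os.environ["CKAN_URL"],          # assumed variable name
        os.environ["CKAN_API_KEY"],      # assumed variable name
        os.environ["CKAN_RESOURCE_ID"],  # assumed variable name
        records,
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return {"pushed": len(records), "success": body.get("success")}
```

Keeping the request construction separate from the network call makes it possible to check the payload locally before deploying anything.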

Note: AWS Lambda currently supports Python, JavaScript (Node.js), and Java.

Tasks

mkalish commented 7 years ago

This would be a good first dataset to scrape. It's just HTML, and I have a personal interest in it being available.

http://dccouncil.us/calendar
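Since the page is plain HTML, the standard library may be enough for a first pass. Below is a minimal sketch that collects the text of every element carrying a given CSS class. The class name `event-title` is purely hypothetical; I have not checked the calendar's actual markup, so the selector would need to be adjusted after inspecting the page.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextParser(HTMLParser):
    """Collect the text content of every element with a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self._depth = 0      # > 0 while inside a matching element
        self._chunks = []    # text fragments of the current match
        self.texts = []      # finished results

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change nesting depth.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join fragments and collapse runs of whitespace.
            self.texts.append(" ".join(" ".join(self._chunks).split()))
            self._chunks = []

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_texts(html, css_class="event-title"):  # class name is an assumption
    parser = ClassTextParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.texts
```

Each extracted string could then become one field of a record pushed to the portal.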

I would be interested in grabbing the following fields: