This tool loads Virginia restaurant health inspection data from the HealthSpace website into a database. It is a complete rebuild of v1.0 of the scraper, made to account for changes in the HealthSpace website and to take advantage of new libraries.
The scraper is built for Python 3.4 and uses the Scrapy library. Addresses can be geocoded using the SmartyStreets API; to use SmartyStreets you will need to obtain a key.
To run:
Run pip install -r requirements.txt
to install the necessary dependencies.
Set the following environment variables, or use the defaults in scraper/settings.py:
MONGODB_SERVER
MONGODB_PORT
MONGODB_DB
MONGODB_COLLECTION
If you need MongoDB authentication, also set
MONGODB_USER
MONGODB_PWD
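As a rough illustration of how these variables might fit together, the sketch below reads them from the environment and builds a MongoDB connection URI. The function name `mongo_uri` and the fallback values (`localhost`, `27017`) are assumptions made for this example; the real defaults are whatever scraper/settings.py defines.

```python
import os
from urllib.parse import quote_plus


def mongo_uri(env=os.environ):
    """Build a MongoDB connection URI from the variables listed above.

    Hypothetical helper: the fallback values are placeholders, not the
    project's actual defaults from scraper/settings.py.
    """
    server = env.get("MONGODB_SERVER", "localhost")
    port = int(env.get("MONGODB_PORT", "27017"))
    user = env.get("MONGODB_USER")
    pwd = env.get("MONGODB_PWD")

    if user and pwd:
        # Percent-encode credentials so characters such as '@' or ':'
        # do not break the URI.
        auth = "{}:{}@".format(quote_plus(user), quote_plus(pwd))
    else:
        # No credentials set: connect without authentication.
        auth = ""

    return "mongodb://{}{}:{}".format(auth, server, port)
```

Passing a plain dictionary instead of `os.environ` makes it easy to check the result without touching the real environment, for example `mongo_uri({"MONGODB_SERVER": "db.example.com"})`.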
If you want to use the SmartyStreets geocoding integration, also set the following environment variables:
SS_ID
SS_TOKEN
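Because geocoding is optional, one simple way to decide whether to enable it is to check that both variables are present. The sketch below shows that check; the function name `smartystreets_credentials` and the choice to skip geocoding when either value is missing are assumptions for illustration, not a description of the scraper's actual behavior.

```python
import os


def smartystreets_credentials(env=os.environ):
    """Return (auth_id, auth_token), or None if geocoding should be skipped.

    Hypothetical helper: treats geocoding as disabled unless both
    SS_ID and SS_TOKEN are set to non-empty values.
    """
    auth_id = env.get("SS_ID")
    auth_token = env.get("SS_TOKEN")
    if auth_id and auth_token:
        return auth_id, auth_token
    return None
```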
Run the Python 3.x script scrapeHealthData.py. The scraper can be stopped using Ctrl/Cmd + C (press it only once) and can then be restarted from the point where it stopped. It saves its progress in the folder specified by the JOBDIR setting in scraper/settings.py.
[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)