Queens-Hacks / qcumber-scraper

Scrapes SOLUS and generates structured data

qcumber-scraper

This is the component of Qcumber that scrapes the data off SOLUS, parses it, and generates structured data that the site can then display.

Setup Guide

  1. Installing the Prerequisites

Python and Libraries

This project has been designed to work with Python versions 2.7.x and 3.3.x. You can try other versions, but no promises.

Python 3.3.x is recommended.

Git and a GitHub account

Pip and a Virtual Environment

Pip is used to install extra Python modules that aren't included by default. A virtual environment is an isolated Python environment. It allows for per-program environment configuration.

  2. Fork the Repository

  3. Clone it to your computer

  4. Create and Activate a Virtual Environment
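A minimal sketch of this step, assuming Python 3 and a Unix-like shell (the directory name `env` is arbitrary):

```shell
# Create a virtual environment in a directory named "env"
python3 -m venv env

# Activate it (bash/zsh; on Windows use env\Scripts\activate instead)
source env/bin/activate
```

Note that the `venv` module only ships with Python 3.3+; for Python 2.7 you would use the third-party `virtualenv` package (`virtualenv env`) instead.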

  5. Install Required Packages

Make sure you have activated your virtual environment (see above) before running this command!
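The install command itself is not shown above; for a typical pip-based project it would look something like the following (the `requirements.txt` filename is an assumption, not confirmed by this README):

```shell
# With the virtual environment active, pip installs into it, not system-wide
# (assuming the project's dependencies are listed in requirements.txt)
pip install -r requirements.txt
```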

Running a scrape

The standard maintenance periods are Tuesdays and Thursdays from 5 am to 7:30 am and Sundays from 5 am to 10 am. There doesn't seem to be any place this is documented, but if you access the site during maintenance times it will tell you. You will need to run scrapes around these maintenance times.

Better Logging

For better logging, and for easier debugging later, it is recommended to redirect the output to log files. Something like: `python main.py >logs/debug.log 2>logs/error.log`
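To see what that redirection does, here is a self-contained demonstration; the `python3 -c` one-liner stands in for `main.py`:

```shell
# Create the log directory first, or the redirections will fail
mkdir -p logs

# Stand-in for "python main.py": writes one line to stdout and one to stderr
python3 -c 'import sys; print("debug message"); print("error message", file=sys.stderr)' \
    >logs/debug.log 2>logs/error.log

# stdout ended up in debug.log, stderr in error.log
cat logs/debug.log   # debug message
cat logs/error.log   # error message
```

`>` captures only standard output; `2>` captures standard error, so the two streams land in separate files.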

To watch the logs as they happen, first open two other terminals, and run `tail -f logs/debug.log` in one and `tail -f logs/error.log` in the other. Then start the main scrape command as above.