cobalt-uoft / uoft-scrapers

Public web scraping scripts for the University of Toronto.
https://pypi.python.org/pypi/uoftscrapers
MIT License

Look into multi-threaded processes for speeding up CourseFinder scraper #11

Closed (qasim closed 8 years ago)

qasim commented 9 years ago

I don't know how much this can help, but multi-threading the CourseFinder scraper might speed things up a little. Current execution time averages 2-4 hours (6000+ requests during the Fall/Winter session). Maybe something like a pool of worker threads or processes could cut down on that time. No data needs to be exchanged between the web pages being requested, so they should be able to run on separate threads with no problem.

CourseFinder is unfortunately a time-consuming scrape, but it provides the most accurate and up-to-date data, coming from ROSI itself and updating quite frequently.
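
For illustration, here is a rough sketch of what the pooling idea could look like using `concurrent.futures.ThreadPoolExecutor` from the standard library. It is not the scraper's actual code: `COURSE_URL`, `fetch_course`, and `scrape_all` are hypothetical names, the URL is a placeholder, and the worker count of 8 is an arbitrary guess that would need tuning against what the server tolerates.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical placeholder; the real CourseFinder URL pattern is not shown here.
COURSE_URL = "https://example.invalid/courses/{course_id}"


def fetch_course(course_id):
    """Download one course page and return its HTML as text."""
    url = COURSE_URL.format(course_id=course_id)
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_all(course_ids, fetch=fetch_course, max_workers=8):
    """Fetch all course pages on a thread pool; return {course_id: html}.

    Each request is independent, so no data is shared between workers.
    """
    course_ids = list(course_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input IDs.
        pages = pool.map(fetch, course_ids)
        return dict(zip(course_ids, pages))
```

Since the work is almost entirely waiting on the network, threads should be enough here and a process pool is probably unnecessary. Passing a different `fetch` callable makes it possible to try the pooling logic without hitting the site.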