cobalt-uoft / uoft-scrapers

Public web scraping scripts for the University of Toronto.
https://pypi.python.org/pypi/uoftscrapers
MIT License
48 stars 14 forks

Scrapers for UofT room availability #13

Closed: qasim closed this issue 8 years ago

qasim commented 8 years ago

There are several services at UofT that provide room booking and computer lab availability. We should scrape these.

arkon commented 8 years ago

How are you planning to deal with data that updates this frequently?

qasim commented 8 years ago

I guess that's a database-generator issue. I think the cleanest method is to have a cron job for each type of data (courses, buildings, exams, etc.), with each cron job updating its data on its own interval. To get a rough idea, how often are you scraping right now for the CDF Labs app? @arkon
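The per-type interval idea could be sketched roughly as below. This is only an illustration, not how Cobalt's database generator works: the `INTERVALS` values, the `labs` entry, and the `due_scrapers` helper are all hypothetical names chosen for the example.

```python
from datetime import datetime, timedelta

# Hypothetical refresh intervals per data type; the values are illustrative only.
INTERVALS = {
    "courses": timedelta(days=1),
    "buildings": timedelta(days=7),
    "exams": timedelta(hours=6),
    "labs": timedelta(minutes=1),
}


def due_scrapers(last_run, now, intervals=INTERVALS):
    """Return the data types whose interval has elapsed since their last run."""
    due = []
    for name, interval in intervals.items():
        previous = last_run.get(name)
        # A data type that has never been scraped is always due.
        if previous is None or now - previous >= interval:
            due.append(name)
    return sorted(due)


# Example: courses were scraped 2 hours ago, labs 5 minutes ago.
now = datetime(2016, 4, 1, 12, 0)
last_run = {
    "courses": now - timedelta(hours=2),
    "labs": now - timedelta(minutes=5),
}
print(due_scrapers(last_run, now))
```

A single cron entry that runs every minute and calls something like `due_scrapers` would be one alternative to maintaining a separate crontab line per data type; either approach gives each type its own update frequency.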

arkon commented 8 years ago

@qasim The Android app itself scrapes the lab availability on demand, since that page is public. The printer info is scraped by a cron job that runs every minute on CDF.

qasim commented 8 years ago

@arkon So I guess the other option is: do we want Cobalt to support APIs that fetch data live instead of serving scraped data, or do we just scrape both services at high-frequency intervals?

I'm leaning towards the latter, since it would be faster for the person calling the API: they aren't waiting for us to load another page. I'm also not sure there are many downsides to a 1-minute cron job on those 2 services (other than the fact that the user gets data that is at most 1 minute behind).

arkon commented 8 years ago

@qasim I think the delay is fine. I also include a timestamp, so users know when the data was last updated.
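As a rough sketch of that pattern (a scheduled job stores the latest scrape, and readers get it back immediately along with its timestamp), assuming a hypothetical `Snapshot` class and a stand-in `fake_scrape` function that are not part of any existing scraper:

```python
import time


class Snapshot:
    """Hold the most recent scrape result together with its update time."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the example can be tested
        self._data = None
        self._updated_at = None

    def refresh(self, scrape):
        """Meant to be called by the scheduled job, e.g. once a minute."""
        self._data = scrape()
        self._updated_at = self._clock()

    def read(self):
        """Return the stored data at once, without triggering a new scrape."""
        if self._updated_at is None:
            age = None
        else:
            age = self._clock() - self._updated_at
        return {
            "data": self._data,
            "updated_at": self._updated_at,
            "age_seconds": age,
        }


def fake_scrape():
    # Stand-in for a real scraper; the field names here are made up.
    return {"lab": "example", "available": 12}


snapshot = Snapshot()
snapshot.refresh(fake_scrape)
print(snapshot.read())
```

The caller never waits on the upstream page, and the `updated_at` field tells them how stale the answer might be.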

qasim commented 8 years ago

I'm going to shut this one down as far as implementing it in Cobalt goes. We can still write scrapers for these services, but they won't be added to Cobalt. Rapidly updating data doesn't fit well in an open data setting, and constantly scraping it is taxing as well. I think this kind of data should be scraped by whoever wants to use it, on their own; let's stick with data that doesn't expire rapidly for Cobalt's APIs.

Will leave this open if anyone wants to add this to cobalt-uoft/uoft-scrapers.

qasim commented 8 years ago

If you're interested in this, please use https://github.com/arkon/cdf-scrapers.