cobalt-uoft / uoft-scrapers

Public web scraping scripts for the University of Toronto.
https://pypi.python.org/pypi/uoftscrapers
MIT License

UofT Drop-in sports schedules #85

Open: qasim opened this issue 7 years ago

qasim commented 7 years ago

The drop-in sports schedules at UofT St. George (SG) seem more structured now:

https://kpe.utoronto.ca/sports-and-rec

There are still some differences between sports, but all of them seem scrapable. We should take advantage of this.

kashav commented 7 years ago

Looks like they're loading the schedule as raw HTML after the page loads:

jQuery(function($){
  $('#dropinschedule').load('https://class-api.kpe.utoronto.ca:8443/times.php?id_list=6,85,181,182,90,342,675,677&dataonly=true&showcoedcol=true&sport=basketball');
});

I think we can parse the URLs out of these inline scripts and then scrape the HTML returned by each URL.
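A minimal sketch of that first approach, assuming each sport page embeds its schedule URL in a `.load('...')` call like the one above. `extract_schedule_urls` and `parse_schedule_url` are hypothetical helper names, and `page_html` is assumed to be the raw page source, fetched however the scraper already fetches pages. The regex only targets the quoted string inside `.load(...)`, so it would need adjusting if the markup changes.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Captures the quoted URL passed to jQuery's .load('...') in an inline script.
# Assumption: the page keeps using this exact pattern for every sport.
LOAD_URL_RE = re.compile(r"""\.load\(\s*['"]([^'"]+)['"]""")


def extract_schedule_urls(page_html):
    """Return every URL passed to .load(...) in the page source."""
    return LOAD_URL_RE.findall(page_html)


def parse_schedule_url(url):
    """Split a times.php URL into its location IDs and sport name."""
    query = parse_qs(urlsplit(url).query)
    # id_list is a single comma-separated value, e.g. "6,85,181".
    raw_ids = query.get("id_list", [""])[0]
    id_list = [int(i) for i in raw_ids.split(",") if i]
    sport = query.get("sport", [None])[0]
    return {"url": url, "id_list": id_list, "sport": sport}
```

Run against the snippet above, `extract_schedule_urls` returns the single `times.php` URL, and `parse_schedule_url` turns it into `id_list=[6, 85, 181, 182, 90, 342, 675, 677]` and `sport="basketball"`. Each URL can then be fetched and its HTML parsed.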

The only other approach (as far as I can tell) would be to build a list of all possible id_list values and all possible sport values and construct the URLs from those. The id_list values map to buildings/locations, but not to the same IDs used in the buildings dataset 🙃.
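A rough sketch of that alternative, assuming the endpoint accepts any combination of location IDs and sport names and that the extra flags (`dataonly`, `showcoedcol`) should match the URL seen on the page. `build_schedule_url`, `build_all_urls`, and any ID or sport lists passed to them are hypothetical; the real values would still have to be collected by hand or from the site.

```python
from urllib.parse import urlencode

# Endpoint taken from the inline script on the sports page.
BASE_URL = "https://class-api.kpe.utoronto.ca:8443/times.php"


def build_schedule_url(id_list, sport):
    """Build a times.php URL for the given location IDs and sport."""
    query = {
        "id_list": ",".join(str(i) for i in id_list),
        "dataonly": "true",
        "showcoedcol": "true",
        "sport": sport,
    }
    # safe="," keeps the commas literal, matching the URL the page itself uses.
    return BASE_URL + "?" + urlencode(query, safe=",")


def build_all_urls(location_ids, sports):
    """One URL per sport, each asking for every known location ID."""
    return [build_schedule_url(location_ids, sport) for sport in sports]
```

Requesting every ID per sport mirrors what the page does; querying one ID at a time would also work but multiplies the number of requests. Either way, locations that don't offer a given sport would presumably just return an empty schedule, though that is untested.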

Also, it looks like they only provide data for one week at a time? If so, I think we can't merge this dataset with athletics. The schema can probably stay the same, though (minus building_id).

qasim commented 7 years ago

Wow, can't say I'm surprised by the inconsistent building IDs 🙃

We could also limit athletics to just the current week, maybe. Then again, that might be trying too hard to accommodate this, and we should have a separate endpoint instead.