cobalt-uoft / uoft-scrapers

Public web scraping scripts for the University of Toronto.
https://pypi.python.org/pypi/uoftscrapers
MIT License

UTM athletics scraper #34

Closed: qasim closed this issue 8 years ago

qasim commented 8 years ago

https://m.utm.utoronto.ca/physed.php

Scrape everything that shows up on that page.

In the meantime, I will try to get in contact with UTSG athletics to see if they can make their internal drop-in calendar public (I know it exists! I've seen it!).

As for UTSC, anyone from there who knows anything about UTSC athletics is welcome to help out.

qasim commented 8 years ago

Proposed schema (for each type of athletics):

{
  "id": String,
  "name": String,
  "location": String,
  "building_id": String,
  "date": String,
  "start_time": String,
  "end_time": String
}
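
To make the proposed schema concrete, here is a minimal Python sketch that models one record as a `TypedDict` and checks whether a scraped dict matches it (exactly the listed keys, all string values). The example values (id format, activity name, date and time formats) are hypothetical; only building 332 for the Recreation, Athletic & Wellness Centre comes from this thread.

```python
from typing import TypedDict


class AthleticsSession(TypedDict):
    """One record in the proposed per-session schema (every field is a string)."""
    id: str
    name: str
    location: str
    building_id: str
    date: str
    start_time: str
    end_time: str


# The set of keys a valid record must have, taken from the TypedDict definition.
REQUIRED_KEYS = set(AthleticsSession.__annotations__)


def is_valid_session(record: dict) -> bool:
    """True if `record` has exactly the schema's keys and every value is a str."""
    return set(record) == REQUIRED_KEYS and all(
        isinstance(value, str) for value in record.values()
    )


# Hypothetical record: id format, name, date and times are made up for illustration.
example: AthleticsSession = {
    "id": "badminton-2016-05-02-1200",
    "name": "Badminton",
    "location": "Recreation, Athletic & Wellness Centre",
    "building_id": "332",
    "date": "2016-05-02",
    "start_time": "12:00",
    "end_time": "13:30",
}

print(is_valid_session(example))  # True
```

Keeping every field a string matches the schema as written; parsing dates and times into richer types would be a separate decision.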

kashav commented 8 years ago

Started working on this; it scrapes all events for the given month.

Here's the current schema:

{
  "date": String,
  "events": [{
    "title": String,
    "location": String,
    "start_time": String,
    "end_time": String,
  }]
}
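
A minimal sketch of how flat scraped events could be folded into this per-date shape. It assumes, hypothetically, that the scraper first produces one dict per event with a `date` key next to `title`, `location`, `start_time` and `end_time`, and that dates are ISO-like strings so that sorting them as text sorts them chronologically. The sample events and locations are invented.

```python
import json
from collections import defaultdict


def group_by_date(events: list[dict]) -> list[dict]:
    """Group flat event dicts into records shaped like {"date", "events": [...]}."""
    by_date: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        # Everything except the date goes into the nested event object.
        nested = {k: v for k, v in event.items() if k != "date"}
        by_date[event["date"]].append(nested)
    # String sort; assumes a "YYYY-MM-DD" date format (hypothetical).
    return [{"date": d, "events": by_date[d]} for d in sorted(by_date)]


# Hypothetical scraped events (titles, locations and times are made up).
scraped = [
    {"date": "2016-05-03", "title": "Badminton", "location": "Gym A",
     "start_time": "12:00", "end_time": "13:00"},
    {"date": "2016-05-02", "title": "Swimming", "location": "Pool",
     "start_time": "07:00", "end_time": "08:00"},
    {"date": "2016-05-03", "title": "Basketball", "location": "Gym B",
     "start_time": "18:00", "end_time": "20:00"},
]
print(json.dumps(group_by_date(scraped), indent=2))
```

Events within a day keep the order in which they were scraped.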

This one may work as well (using event titles as IDs):

{
  "id": String,
  "sessions": [{
    "location": String,
    "date": String,
    "start_time": String,
    "end_time": String
  }]
}
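
For comparison, a minimal sketch that converts per-date records into this title-keyed shape, using each event title as the `id`. The input format and sample values are hypothetical.

```python
from collections import defaultdict


def group_by_title(days: list[dict]) -> list[dict]:
    """Turn {"date", "events"} records into {"id", "sessions"} records keyed by title."""
    sessions: dict[str, list[dict]] = defaultdict(list)
    for day in days:
        for event in day["events"]:
            sessions[event["title"]].append({
                "location": event["location"],
                "date": day["date"],
                "start_time": event["start_time"],
                "end_time": event["end_time"],
            })
    # One record per distinct title, sorted by title for stable output.
    return [{"id": title, "sessions": s} for title, s in sorted(sessions.items())]


# Hypothetical per-date input (values are made up).
days = [
    {"date": "2016-05-02", "events": [
        {"title": "Swimming", "location": "Pool", "start_time": "07:00", "end_time": "08:00"}]},
    {"date": "2016-05-03", "events": [
        {"title": "Swimming", "location": "Pool", "start_time": "07:00", "end_time": "08:00"},
        {"title": "Badminton", "location": "Gym A", "start_time": "12:00", "end_time": "13:00"}]},
]
result = group_by_title(days)
print([r["id"] for r in result])
```

One trade-off: titles work as IDs only if each activity always appears under exactly one title. Two different activities that share a title would be merged into a single record.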

As far as I can tell, all athletic events happen in the same building, the Recreation, Athletic & Wellness Centre (building 332). Does it make sense to include a building_id key if it's the same for all of them?

qasim commented 8 years ago

@kshvmdn we can leave building_id in for now in anticipation of using this same schema for UTSG/UTSC data.

We can go with the former schema to match the structure of our shuttle data. LGTM!
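
A minimal sketch of one way to combine these two decisions: keep the per-date schema and attach `building_id` to each event. Putting it on each event rather than on the day is an assumption, chosen because UTSG/UTSC events may be spread across several buildings. The default "332" is the RAWC building number from this thread, and the helper name is hypothetical.

```python
import json

# RAWC building number at UTM, as noted in this thread.
UTM_RAWC_BUILDING_ID = "332"


def with_building_id(days: list[dict], building_id: str = UTM_RAWC_BUILDING_ID) -> list[dict]:
    """Return copies of per-date records with `building_id` added to every event.

    The input is not mutated; each event dict is copied before the key is added.
    """
    return [
        {"date": day["date"],
         "events": [{**event, "building_id": building_id} for event in day["events"]]}
        for day in days
    ]


# Hypothetical input (values are made up).
days = [{"date": "2016-05-02", "events": [
    {"title": "Swimming", "location": "Pool", "start_time": "07:00", "end_time": "08:00"}]}]
print(json.dumps(with_building_id(days)))
```

For UTSG or UTSC, the same function could be called with a different `building_id`, or the scraper could set it per event if sessions span several buildings.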

qasim commented 8 years ago

@kshvmdn I merged a few refactors (a small overhaul of the scraper layout); let me know if you're able to merge master cleanly.

kashav commented 8 years ago

@qasim No conflicts – I think it's good to go!

https://github.com/cobalt-uoft/uoft-scrapers/pull/51

qasim commented 8 years ago

UTSC scraper, awesome!

Let's hope UTSG gives us standardized info on their stuff soon.