cobalt-uoft / uoft-scrapers

Public web scraping scripts for the University of Toronto.
https://pypi.python.org/pypi/uoftscrapers
MIT License

Events scraper #36

Closed. qasim closed this issue 8 years ago

qasim commented 8 years ago

https://www.events.utoronto.ca/

qasim commented 8 years ago

Proposed schema:

{
  "id": String,
  "title": String,
  "date": {
    "start": String,
    "end": String
  },
  "url": String,
  "description": String,
  "campus": String,
  "address": String,
  "audience": [String]
}

g3wanghc commented 8 years ago

@qasim I'm down to work on it. :V

qasim commented 8 years ago

@g3wanghc Awesome. Here's a slightly tweaked schema that I had in mind, in keeping with the conventions of the other scrapers:

{
  "id": String,
  "title": String,
  "date": String,
  "start_time": String,
  "end_time": String,
  "url": String,
  "description": String,
  "campus": String,
  "address": String,
  "audience": [String]
}

date would be in ISO 8601 format, like 2016-04-15. start_time and end_time would also be ISO 8601, like 2016-04-15T12:00:00-04:00 (standardized to the Eastern time zone). url would just be the URL of the event posting, e.g. https://www.events.utoronto.ca/index.php?action=singleView&eventid=12052.
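For illustration, here is a minimal sketch of how the date, start_time and end_time strings could be produced with Python's standard library. The function name format_event_times and the fixed UTC-04:00 offset are assumptions made for this example (the offset simply matches the sample timestamp above); a real scraper would probably need a DST-aware Eastern zone.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: a fixed UTC-04:00 offset, matching the example timestamp above.
# Eastern time is UTC-05:00 in winter, so real code would need a DST-aware zone.
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))


def format_event_times(day, start, end):
    """Build the date, start_time and end_time fields as ISO 8601 strings.

    day is a datetime.date; start and end are naive datetime.time values.
    (Hypothetical helper, not part of the existing scrapers.)
    """
    start_dt = datetime.combine(day, start).replace(tzinfo=EASTERN_DAYLIGHT)
    end_dt = datetime.combine(day, end).replace(tzinfo=EASTERN_DAYLIGHT)
    return {
        "date": day.isoformat(),
        "start_time": start_dt.isoformat(),
        "end_time": end_dt.isoformat(),
    }


fields = format_event_times(date(2016, 4, 15), time(12, 0), time(13, 30))
print(fields["date"])        # 2016-04-15
print(fields["start_time"])  # 2016-04-15T12:00:00-04:00
```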

My thinking is that each scraper run should go through every page of the events listing (so every event currently listed), then open each event's link to grab the detailed information needed to complete the schema.
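A rough sketch of the link-collection half of that idea, using only the standard library. The names EventLinkParser and extract_event_links are made up for this example, and the assumption is that listing pages link to events with the action=singleView&eventid=... pattern shown above; fetching the pages and parsing each event's details are left out.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

BASE_URL = "https://www.events.utoronto.ca/"


class EventLinkParser(HTMLParser):
    """Collect links to single-event pages found in a listing page."""

    def __init__(self):
        super().__init__()
        self.event_urls = {}  # event id -> absolute URL of the event posting

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(BASE_URL, href)
        query = parse_qs(urlparse(url).query)
        # Assumption: event postings always use action=singleView&eventid=<id>.
        if query.get("action") == ["singleView"] and "eventid" in query:
            self.event_urls[query["eventid"][0]] = url


def extract_event_links(listing_html):
    """Return a dict mapping event ids to event URLs for one listing page."""
    parser = EventLinkParser()
    parser.feed(listing_html)
    return parser.event_urls
```

A full run would call extract_event_links on the HTML of each listing page, merge the resulting dicts, and then request each URL to fill in the remaining schema fields. Keying on the event id means an event that shows up on more than one page is only visited once.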

Let me know what you think!

g3wanghc commented 8 years ago

Sure, looks like fun. :+1:

g3wanghc commented 8 years ago

@qasim Do we care about Admission Price, Contact Info, Website and Event Sponsor?