cobalt-uoft / uoft-scrapers

Public web scraping scripts for the University of Toronto.
https://pypi.python.org/pypi/uoftscrapers
MIT License
48 stars, 14 forks

Add textbooks scraper #25

Closed: qasim closed this 8 years ago

qasim commented 8 years ago

This isn't done, but I'd like some eyes on it.

kashav commented 8 years ago

This looks really good so far.

It looks like we're gonna end up with very similar entries (differing only in section_id & section_code) if an instructor teaches the same course multiple times. Is this something we should consider revising?

Also, just to clarify, we aren't planning to keep data for sections that have no book requirements, right? It might be worth it, but the dataset would be filled with a lot of duplicate content from the Course Finder scrapers.

qasim commented 8 years ago

@kshvmdn I'm thinking the data will be book-first rather than section-first. So it'll be a book schema: each book has an ISBN, price, author, etc., along with whether any courses require it. Then people can query for required books by course and get the textbooks back.
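For illustration, a book-first record and a by-course query could look roughly like the sketch below. This is not the scraper's actual schema: the `Book` class, the `courses` field layout, the `required_books` helper, and all sample values (ISBNs, titles, course codes) are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """Hypothetical book-first record; field names are assumptions."""
    isbn: str
    title: str
    author: str
    price: float
    # Each entry names a course and whether the book is required for it.
    courses: list = field(default_factory=list)


def required_books(books, course_code):
    """Return the books that the given course lists as required."""
    return [
        book for book in books
        if any(c["code"] == course_code and c["required"] for c in book.courses)
    ]


# Made-up sample data, only to exercise the query.
books = [
    Book("9780000000001", "Example Calculus", "A. Author", 120.0,
         [{"code": "MAT137Y1", "required": True}]),
    Book("9780000000002", "Example Algebra", "B. Author", 80.0,
         [{"code": "MAT137Y1", "required": False},
          {"code": "MAT223H1", "required": True}]),
]

for book in required_books(books, "MAT137Y1"):
    print(book.isbn, book.title)
```

Storing the required flag per course, rather than once per book, leaves room for a book that is required in one course and only recommended in another.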

As for the duplicate entries, I definitely need to add something that merges identical books and keeps the list of sections / courses each one corresponds to inside the schema. I'll post a version 1 of what the schema could look like within the week.
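One possible way to do that merge, sketched under the assumption that each scraped entry is a dict carrying an `isbn` plus the `section_id` and `section_code` it was found under, and that the ISBN is a reliable key for "same book". The `merge_books` name, the `title` field, and the sample values are hypothetical.

```python
def merge_books(entries):
    """Collapse per-section entries into one record per ISBN.

    Each merged record keeps the list of sections it was seen in.
    """
    merged = {}
    for entry in entries:
        # Create the merged record the first time this ISBN appears.
        book = merged.setdefault(entry["isbn"], {
            "isbn": entry["isbn"],
            "title": entry["title"],
            "sections": [],
        })
        section = {
            "section_id": entry["section_id"],
            "section_code": entry["section_code"],
        }
        # Skip sections already recorded for this book.
        if section not in book["sections"]:
            book["sections"].append(section)
    return list(merged.values())


# Made-up entries: the same book listed under two sections.
entries = [
    {"isbn": "9780000000001", "title": "Example Calculus",
     "section_id": "1001", "section_code": "L0101"},
    {"isbn": "9780000000001", "title": "Example Calculus",
     "section_id": "1002", "section_code": "L0201"},
    {"isbn": "9780000000002", "title": "Example Algebra",
     "section_id": "2001", "section_code": "L0101"},
]

result = merge_books(entries)
```

Keying on ISBN is only one choice; if the source lists books without an ISBN, a different key (for example title plus author) would be needed.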

qasim commented 8 years ago

Alright, it now outputs textbooks. Now what's left is to merge unique textbooks and consolidate the courses/meeting sections.

qasim commented 8 years ago

There we go. Should be good for another look over.