calpoly-csai / cal-poly-knowledge-graph


Course Scraper #2

Open Waidhoferj opened 2 years ago

Waidhoferj commented 2 years ago

Create a new scraper in the scrapers folder that gets course information from the Cal Poly course catalog. See the Course object in models.py to understand the schema. Review CollegeScraper as an example of a web scraper template.

probably-neb commented 2 years ago

While looking into the sections issue today, I found this website, which has a table of all of the classes, what they are cross-listed as, their units, the GE area they fulfill, and the terms they are typically offered in. Some of this (most notably the terms they are typically offered in) is useful information that does not seem to be collected in #9 or #10, so I felt it might be helpful to share. As far as I can tell, the table can't be scraped using Beautiful Soup alone, because it is rendered by JavaScript that pulls from this CSV file. I would be happy to make a pull request after #9 is merged that adds the extra information from the CSV, if that would be helpful.

Waidhoferj commented 2 years ago

Thanks for finding that, @probably-neb. If you need to scrape a site rendered with JS, you can use a headless browser like this lib. Another option would be to pull in the CSV and parse that.
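
For the CSV option, here is a rough sketch of what the parsing could look like using only the standard library. The column names (`course`, `cross_listed`, `units`, `ge_area`, `terms`), the `CourseOffering` record, and the sample rows are all made up for illustration; the real file's headers would need to be checked, and the result would still have to be mapped onto the Course object in models.py.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class CourseOffering:
    """Hypothetical record for one CSV row (not the Course model in models.py)."""
    course: str
    cross_listed: str
    units: str
    ge_area: str
    terms: List[str]


def parse_course_csv(text: str) -> List[CourseOffering]:
    """Parse CSV text into CourseOffering records.

    Assumes a header row with the hypothetical column names used below.
    Missing or empty cells become empty strings (or an empty list for terms).
    """
    reader = csv.DictReader(io.StringIO(text))
    offerings = []
    for row in reader:
        # Assumed format: terms stored as a comma-separated list in one cell.
        raw_terms = row.get("terms") or ""
        terms = [t.strip() for t in raw_terms.split(",") if t.strip()]
        offerings.append(
            CourseOffering(
                course=(row.get("course") or "").strip(),
                cross_listed=(row.get("cross_listed") or "").strip(),
                units=(row.get("units") or "").strip(),
                ge_area=(row.get("ge_area") or "").strip(),
                terms=terms,
            )
        )
    return offerings


# Placeholder data standing in for the real CSV; these are not real courses.
SAMPLE_CSV = '''course,cross_listed,units,ge_area,terms
ABC 101,,4,,"F, W, SP"
ABC 320,XYZ 320,4,B4,F
'''

if __name__ == "__main__":
    for offering in parse_course_csv(SAMPLE_CSV):
        print(offering)
```

In a real scraper, the text passed to `parse_course_csv` would come from downloading the CSV (for example with `urllib.request.urlopen`) instead of from the inline sample.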