UTDNebula / nebula-api

The central API for Nebula Labs. Makes UTD data easily available through endpoints for both internal and public usage.
MIT License

Build a Coursebook Data Scraper #18

Closed: CharlieMahana closed this issue 3 years ago

CharlieMahana commented 3 years ago

Design and implement a web scraper that can extract course and section data from coursebook for uploading to our database.

There are no restrictions or requirements on how this is done, so long as it extracts the requisite data from Coursebook. Python 3 has many libraries that may be useful, such as urllib, requests, and Beautiful Soup, so it is probably a good place to start.
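As a possible starting point, here is a minimal sketch that uses only the standard library's `html.parser` instead of Beautiful Soup, which avoids a third-party dependency. It assumes, hypothetically, that Coursebook returns results as an HTML table with one `<tr>` per section and one `<td>` per field; the real markup is not described in this issue, and `TableRowParser` and `parse_sections` are illustrative names.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    Assumption: section data lives in plain HTML table rows.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace and trim the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built from <th> yield no <td> cells
                self.rows.append(self._row)
            self._row = None


def parse_sections(html: str) -> list[list[str]]:
    """Return one list of cell strings per table row that has <td> cells."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Mapping each cell list onto named fields (course prefix, section number, instructor, and so on) would come after confirming the actual column order on Coursebook.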

CharlieMahana commented 3 years ago

I created a very primitive web scraper that technically works, although inconsistently. This could be my fault, Coursebook's fault, or a combination of the two. Regardless, the scraper needs to be restructured to be both more efficient and more maintainable. Please improve and complete the web scraper while I focus on other sections of the project.

It can be found in commit c4c78f0043c27ed4eb1f2148a26df0947db5ebb9.

CharlieMahana commented 3 years ago

Some details on how the scraper works:

Finally, PTGSESSID is a cookie that is required to make requests. The scraper only works if you sign in with your UTD NetID and then copy the PTGSESSID cookie.
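A minimal sketch of how a copied PTGSESSID value might be attached to each request, using only `urllib` from the standard library. The base URL, the `PTGSESSID` environment variable, and the helper names are assumptions for illustration, not part of the original scraper; the cookie value itself still has to come from a browser session signed in with a UTD NetID.

```python
import os
import urllib.request

# Assumption: base URL for Coursebook; the scraper's real endpoints may differ.
COURSEBOOK_URL = "https://coursebook.utdallas.edu/"


def session_id_from_env(var: str = "PTGSESSID") -> str:
    """Read the copied cookie value from a (hypothetical) environment variable."""
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(
            f"Set {var} to the PTGSESSID cookie from a signed-in browser session"
        )
    return value


def build_request(url: str, session_id: str) -> urllib.request.Request:
    """Attach the PTGSESSID cookie, which Coursebook requires on requests."""
    return urllib.request.Request(url, headers={"Cookie": f"PTGSESSID={session_id}"})


def fetch(url: str, session_id: str, timeout: float = 30.0) -> str:
    """Send the request and decode the response body (requires network access)."""
    with urllib.request.urlopen(build_request(url, session_id), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Keeping the cookie out of the source file (here via an environment variable) means the scraper code can be committed without leaking a personal session ID.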

WillieCubed commented 3 years ago

@CharlieMahana can you go ahead and draft (tracking) PRs for the branches that mention this and the other issues?