We would like to develop a new web scraper that will (1) determine whether government websites have a terms of service/privacy policy page, and (2) evaluate the quality of that page.
Steps we need to take to achieve these goals (not necessarily in order):
[ ] Verify that the existing software infrastructure for developing/running scrapers is functional. (I've heard @kbalajisrinivas might be able to help with this.)
[ ] Get a sense of where government sites tend to keep their terms of service/privacy policy pages (a first sketch follows this list)
[ ] Define our metrics and evaluation system for a "good" terms of service page
[ ] Write a new Python class in scrapers/scrapers/ that builds upon base_scraper.py and contains methods for scraping webpages, finding their terms of service/privacy policy pages (if they exist), and analyzing their contents per the metrics from the previous step (see the class sketch below)
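To start on the location question, here is a minimal sketch, assuming the `requests` library is available. The `CANDIDATE_PATHS` list and the `find_policy_page` name are hypothetical illustrations, not anything in our codebase, and the path list should be replaced by whatever we actually observe on government sites:

```python
import requests

# Hypothetical starting set of paths where government sites often keep these
# pages; to be refined once we survey real sites.
CANDIDATE_PATHS = [
    "/privacy",
    "/privacy-policy",
    "/terms",
    "/terms-of-service",
    "/policies",
    "/website-policies",
]

def find_policy_page(base_url: str) -> str | None:
    """Return the first candidate URL that responds with HTTP 200, or None."""
    for path in CANDIDATE_PATHS:
        url = base_url.rstrip("/") + path
        try:
            resp = requests.get(url, timeout=10, allow_redirects=True)
        except requests.RequestException:
            continue
        if resp.status_code == 200:
            return url
    return None
```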
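And a rough shape for the class in the last item, again only a sketch: `BaseScraper`'s real interface lives in base_scraper.py and may differ, the class name `PolicyPageScraper` and the `fetch_text` helper are assumptions, and the keyword scoring is just a placeholder until we define real metrics:

```python
from scrapers.base_scraper import BaseScraper  # actual interface may differ

# Placeholder topics a "good" policy page might cover, pending the metrics
# we define in the evaluation step.
EXPECTED_TOPICS = ["cookies", "third party", "data retention", "contact"]

class PolicyPageScraper(BaseScraper):  # hypothetical name
    """Checks whether a government site has a terms of service/privacy
    policy page and scores its contents."""

    def scrape(self, site_url: str) -> dict:
        # find_policy_page is the sketch above; fetch_text is an assumed
        # BaseScraper helper that returns a page's visible text.
        policy_url = find_policy_page(site_url)
        if policy_url is None:
            return {"site": site_url, "has_policy": False}
        text = self.fetch_text(policy_url)
        return {
            "site": site_url,
            "has_policy": True,
            "policy_url": policy_url,
            "score": self.evaluate(text),
        }

    def evaluate(self, text: str) -> float:
        """Fraction of expected topics mentioned; a stand-in metric."""
        text = text.lower()
        return sum(t in text for t in EXPECTED_TOPICS) / len(EXPECTED_TOPICS)
```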
I invite anyone to add/modify this list!