We would like to develop a new web scraper that will (1) determine whether government websites have a terms of service/privacy policy page, and (2) evaluate the quality of that page.
Steps we need to take to achieve these goals (not necessarily in order):
[ ] Verify that the existing software infrastructure for developing/running scrapers is functional. (I've heard @kbalajisrinivas might be able to help with this.)
[ ] Get a sense of where government sites tend to keep their terms of service/privacy policy pages (a first sketch follows this list)
[ ] Define our metrics and evaluation system for a "good" terms of service page
[ ] Write a new Python class in scrapers/scrapers/ that builds upon base_scraper.py and contains methods for scraping webpages, finding their terms of service/privacy policy pages (if they exist), and analyzing their contents per the metrics from the previous step (see the class sketch below)
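To start on the location question, here is a minimal sketch, assuming the `requests` library is available. The `CANDIDATE_PATHS` list and the `find_policy_page` name are hypothetical illustrations, not anything in our codebase, and the path list should be replaced by whatever we actually observe on government sites:

```python
import requests

# Hypothetical starting set of paths where government sites often keep these
# pages; to be refined once we survey real sites.
CANDIDATE_PATHS = [
    "/privacy",
    "/privacy-policy",
    "/terms",
    "/terms-of-service",
    "/policies",
    "/website-policies",
]

def find_policy_page(base_url: str) -> str | None:
    """Return the first candidate URL that responds with HTTP 200, or None."""
    for path in CANDIDATE_PATHS:
        url = base_url.rstrip("/") + path
        try:
            resp = requests.get(url, timeout=10, allow_redirects=True)
        except requests.RequestException:
            continue
        if resp.status_code == 200:
            return url
    return None
```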
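And a rough shape for the class in the last item, again only a sketch: `BaseScraper`'s real interface lives in base_scraper.py and may differ, the class name `PolicyPageScraper` and the `fetch_text` helper are assumptions, and the keyword scoring is just a placeholder until we define real metrics:

```python
from scrapers.base_scraper import BaseScraper  # actual interface may differ

# Placeholder topics a "good" policy page might cover, pending the metrics
# we define in the evaluation step.
EXPECTED_TOPICS = ["cookies", "third party", "data retention", "contact"]

class PolicyPageScraper(BaseScraper):  # hypothetical name
    """Checks whether a government site has a terms of service/privacy
    policy page and scores its contents."""

    def scrape(self, site_url: str) -> dict:
        # find_policy_page is the sketch above; fetch_text is an assumed
        # BaseScraper helper that returns a page's visible text.
        policy_url = find_policy_page(site_url)
        if policy_url is None:
            return {"site": site_url, "has_policy": False}
        text = self.fetch_text(policy_url)
        return {
            "site": site_url,
            "has_policy": True,
            "policy_url": policy_url,
            "score": self.evaluate(text),
        }

    def evaluate(self, text: str) -> float:
        """Fraction of expected topics mentioned; a stand-in metric."""
        text = text.lower()
        return sum(t in text for t in EXPECTED_TOPICS) / len(EXPECTED_TOPICS)
```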
I invite anyone to add/modify this list!