codeforsanjose / city-agenda-scraper


Identify novusagenda.com subdomains in AHP Parser #9

Open krammy19 opened 3 years ago

krammy19 commented 3 years ago

This is borrowed from https://github.com/biglocalnews/civic-scraper/issues/55

A number of local governments in the Bay Area and in other parts of the country post their meeting minutes, agendas, etc. on subdomains of novusagenda.com (*.novusagenda.com). These websites typically look something like this or this and follow the web address convention PLACE.novusagenda.com/agendapublic, where PLACE is a custom name chosen by each agency.
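To make the address convention concrete, here is a minimal Python sketch of a check that pulls PLACE out of a URL. The function name `novusagenda_place` and the decision to skip `www` are assumptions for illustration, not part of the existing scraper.

```python
from urllib.parse import urlsplit

SUFFIX = ".novusagenda.com"


def novusagenda_place(url):
    """Return PLACE if url is on PLACE.novusagenda.com, otherwise None.

    Hypothetical helper; the name and behavior are illustrative only.
    """
    # urlsplit only populates hostname when a scheme or "//" is present,
    # so bare addresses like "place.novusagenda.com/agendapublic" get a "//".
    if "//" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    if not host.endswith(SUFFIX):
        return None
    place = host[: -len(SUFFIX)]
    # Reject empty labels, nested subdomains, and the generic "www" host
    # (treating "www" as not an agency site is an assumption).
    if not place or "." in place or place == "www":
        return None
    return place
```

For example, `novusagenda_place("https://example.novusagenda.com/agendapublic")` gives `"example"`, while a URL on any other domain gives `None`.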

Your task is to add a novusagenda function to the html-request scraper2 so that it also grabs as many *.novusagenda.com subdomains as possible. This will allow us to evaluate how many government agencies are using this website format, which, in turn, will help us decide which scrapers to build next.
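One possible starting point, sketched in Python: scan the HTML text the scraper has already fetched and collect every distinct PLACE label it mentions. The name `find_novusagenda_subdomains` and the regex are assumptions. How this would plug into the existing scraper is not shown here.

```python
import re

# One DNS label directly in front of ".novusagenda.com".
# The lookbehind/lookahead keep the match from starting or ending
# in the middle of a longer hostname.
NOVUS_HOST = re.compile(
    r"(?<![A-Za-z0-9.-])"
    r"([a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
    r"\.novusagenda\.com"
    r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",
    re.IGNORECASE,
)


def find_novusagenda_subdomains(html):
    """Return a sorted list of distinct PLACE labels found in html.

    Hypothetical helper; it works on raw text, so it also catches
    hostnames that appear outside of <a href="..."> attributes.
    """
    return sorted({m.group(1).lower() for m in NOVUS_HOST.finditer(html)})
```

Counting the length of the returned list across all scraped pages would give a rough estimate of how many agencies use this format.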