codeforsanjose / city-agenda-scraper


Identify PrimeGov subdomains in AHP Parser #8

Open · krammy19 opened this issue 3 years ago

krammy19 commented 3 years ago

This is borrowed from https://github.com/biglocalnews/civic-scraper/issues/54

A number of local governments in the Bay Area and in other parts of the country post their meeting minutes, agendas, and other records on websites hosted on subdomains of primegov.com (*.primegov.com). These websites typically follow the web address convention PLACE.primegov.com/public/portal, where PLACE is a custom field that differs from one agency to the next.
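The address convention can be expressed as two small helpers: one that builds a portal URL from a PLACE value, and one that pulls the PLACE back out of a URL. This is a minimal sketch, not code from this repository. The `https` scheme, the helper names `portal_url` and `primegov_place`, and the decision to treat `www` as the vendor's own site rather than an agency are all assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

PRIMEGOV_SUFFIX = ".primegov.com"

# Assumption: "www" is PrimeGov's own site, not an agency portal.
NON_AGENCY_LABELS = {"www"}


def portal_url(place: str) -> str:
    """Build a URL following the PLACE.primegov.com/public/portal convention (https assumed)."""
    return f"https://{place}{PRIMEGOV_SUFFIX}/public/portal"


def primegov_place(url: str) -> Optional[str]:
    """Return PLACE for a PLACE.primegov.com URL, or None if the host is not such a subdomain."""
    # urlparse(...).hostname is already lowercased, or None if there is no host.
    host = urlparse(url).hostname or ""
    if not host.endswith(PRIMEGOV_SUFFIX):
        return None
    place = host[: -len(PRIMEGOV_SUFFIX)]
    # Reject the bare domain, nested labels (a.b.primegov.com), and non-agency labels.
    if not place or "." in place or place in NON_AGENCY_LABELS:
        return None
    return place
```

For example, `primegov_place("https://ExamplePlace.primegov.com/public/portal")` gives `"exampleplace"` (a made-up place name), while a URL on any other domain gives `None`.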

Your task is to add a primegov function to the html-request scraper2 so that it also identifies *.primegov.com subdomains where possible. This will let us estimate how many government agencies use this website format, which in turn will help us decide which scrapers to build next.
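The core of such a primegov function could be a scan of a fetched page for any PLACE.primegov.com hostnames. The sketch below is hypothetical: it does not reflect the actual interface of scraper2 or the AHP Parser. It uses a regular expression over the raw HTML instead of a full HTML parser, on the assumption that PrimeGov links appear as plain hostnames in `href` attributes or in page text.

```python
import re
from typing import Set

# Matches PLACE.primegov.com where PLACE is one DNS label (letters, digits, hyphens).
# The lookbehind stops matches that start mid-label or mid-hostname (e.g. a.b.primegov.com).
PRIMEGOV_HOST = re.compile(r"(?<![\w.-])([a-z0-9-]+)\.primegov\.com\b", re.IGNORECASE)


def find_primegov_places(html: str) -> Set[str]:
    """Return the lowercased PLACE labels of every *.primegov.com host mentioned in html."""
    places = {m.group(1).lower() for m in PRIMEGOV_HOST.finditer(html)}
    # Assumption: www.primegov.com is the vendor's own site, not an agency.
    places.discard("www")
    return places
```

Returning a set deduplicates agencies that are linked several times on the same page, which fits the goal of counting how many agencies use the format.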

dineshkumar-23 commented 3 years ago

Hello, would it be fine to keep a list of all the places in a text file and check the response for each one? That is, substitute each place from the text file into the subdomain part of the URL.
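The approach described in this comment might look like the sketch below: read candidate places from a text file (one per line, an assumed format), build each portal URL, and keep the places whose request returns HTTP 200. The function names are hypothetical. The status check is passed in as a parameter so the logic can be tested without network access. Treating 200 as "this agency uses PrimeGov" is a heuristic: `urlopen` follows redirects, so a catch-all redirect to a 200 landing page would produce false positives.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url, or 0 if the request fails (DNS, timeout, etc.)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still carry a status code
        return err.code
    except OSError:  # URLError, socket timeouts, and other connection failures
        return 0


def read_places(path: str) -> List[str]:
    """Read one candidate PLACE per line, skipping blank lines (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip().lower() for line in f if line.strip()]


def probe_places(
    places: Iterable[str], get_status: Callable[[str], int] = fetch_status
) -> List[str]:
    """Return the places whose PLACE.primegov.com/public/portal URL answers with HTTP 200."""
    found = []
    for place in places:
        url = f"https://{place}.primegov.com/public/portal"
        if get_status(url) == 200:
            found.append(place)
    return found
```

A real run would be `probe_places(read_places("places.txt"))`, where `places.txt` is a hypothetical file name. Adding a short delay between requests would be a polite addition.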

krammy19 commented 3 years ago

It seems like what you're describing is more appropriate for this issue: https://github.com/codeforsanjose/city-agenda-scraper/issues/11