biglocalnews / civic-scraper

Tools for downloading agendas, minutes and other documents produced by local government
https://civic-scraper.readthedocs.io

Scrape Primegov sites #150

Closed by antidipyramid 2 years ago

antidipyramid commented 2 years ago

Overview

This PR adds the PrimeGovSite class to scrape PrimeGov sites. For each city, I've found multiple PrimeGov API endpoints that can be queried:

  1. (GET) https://[city].primegov.com/api/meeting/search?from=[m/d/y]&to=[m/d/y]
  2. (GET) https://[city].primegov.com/v2/PublicPortal/ListUpcomingMeetings
  3. (GET) https://[city].primegov.com/v2/PublicPortal/ListArchivedMeetings?year=[year]
  4. (POST) https://[city].primegov.com/api/search?
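As a rough illustration of how the GET endpoints above are addressed per city, here is a minimal sketch that builds the URLs for endpoints 2 and 3. The helper names and the `examplecity` subdomain are hypothetical and are not part of this PR; only the URL patterns come from the list above.

```python
# Hypothetical helpers (not part of PrimeGovSite): build per-city URLs
# for the upcoming and archived meeting endpoints listed above.
BASE_URL = "https://{city}.primegov.com"


def upcoming_meetings_url(city: str) -> str:
    """URL for endpoint 2 (ListUpcomingMeetings) for a city subdomain."""
    return BASE_URL.format(city=city) + "/v2/PublicPortal/ListUpcomingMeetings"


def archived_meetings_url(city: str, year: int) -> str:
    """URL for endpoint 3 (ListArchivedMeetings) for a city subdomain and year."""
    return (
        BASE_URL.format(city=city)
        + f"/v2/PublicPortal/ListArchivedMeetings?year={year}"
    )


if __name__ == "__main__":
    # "examplecity" is a placeholder subdomain, not a real deployment.
    print(upcoming_meetings_url("examplecity"))
    print(archived_meetings_url("examplecity", 2022))
```

Either URL could then be passed to an HTTP client such as `requests.get`; the shape of the returned JSON is not shown here because it is not documented in this description.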

For now, I've hardcoded PrimeGovSite.scrape() to query only the first endpoint, with a default search window of the past 30 days. If necessary, we could later add a way to choose which endpoint to query.
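The default window described above can be sketched roughly as follows. This is not the code in PrimeGovSite.scrape(); the function name, its parameters, and the unpadded month/day/four-digit-year formatting of `[m/d/y]` are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Optional


def meeting_search_url(city: str, end: Optional[date] = None, days: int = 30) -> str:
    """Build the endpoint 1 search URL for the `days` days ending at `end`.

    Hypothetical helper: `end` defaults to today, and dates are rendered
    as month/day/year without zero padding (an assumed reading of [m/d/y]).
    """
    end = end or date.today()
    start = end - timedelta(days=days)

    def fmt(d: date) -> str:
        return f"{d.month}/{d.day}/{d.year}"

    return (
        f"https://{city}.primegov.com/api/meeting/search"
        f"?from={fmt(start)}&to={fmt(end)}"
    )


if __name__ == "__main__":
    # "examplecity" is a placeholder subdomain, not a real deployment.
    print(meeting_search_url("examplecity", end=date(2022, 3, 31)))
```

Passing an explicit `end` date keeps the result reproducible, which is convenient in tests; omitting it gives the "past 30 days" behaviour relative to the current date.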

Contrary to my initial impression, each endpoint seems to return identically structured JSON data across cities. Each city can choose whether or not to use a particular endpoint in the implementation of their agenda search front-end.

Currently, tests/primegov_test.py scrapes meetings for the following cities:

Testing Instructions

Run:

    docker-compose run --rm scraper python tests/primegov_test.py