biglocalnews / civic-scraper

Tools for downloading agendas, minutes and other documents produced by local government
https://civic-scraper.readthedocs.io

Legistar zero events bug #158

Closed zstumgoren closed 1 year ago

zstumgoren commented 2 years ago

As part of the Legistar refactor, I noticed that python-legistar-scraper is returning zero "events" for the following sites in tests/legistar_site.py. I haven't pinpointed the precise nature of the issue, but it appears to be something in the underlying framework (as opposed to the civic-scraper legistar.site.Site implementation).

Below is a list of the sites where events are not being captured as expected:

antidipyramid commented 2 years ago

@zstumgoren

The sites you mentioned were returning zero events for one of two reasons:

  1. All but two have two tables on their calendar pages: one for upcoming meetings and a second, searchable table. The Legistar scraper assumed the page only had the second. We merged in a fix here.
  2. The remaining sites (Goodyear and Petersburg) seem to have stopped updating their sites at all. For both, the last event in their calendar is from 2021. The scraper starts with the current year and stops as soon as it hits a year with no events. Since these municipalities haven't entered any events for 2022, the scraper returns an empty list.
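The stop condition described in (2) can be sketched roughly as follows. This is a simplified illustration, not the scraper's actual code: a plain dict stands in for the real per-year Legistar page fetch, and all names here are hypothetical.

```python
from datetime import date

def scrape_events(years_with_events):
    """Illustrative sketch of the stop-on-empty-year behavior.

    `years_with_events` is a stand-in mapping of year -> list of events,
    in place of the real Legistar calendar fetch.
    """
    events = []
    year = date.today().year
    while True:
        batch = years_with_events.get(year, [])
        if not batch:
            # A year with no events halts the walk backwards through time.
            break
        events.extend(batch)
        year -= 1
    return events

# A stale site like Goodyear, whose last calendar entries are from 2021:
stale_site = {2021: ["City Council 2021-12-06"]}
print(scrape_events(stale_site))  # -> [] because the current year has no events
```

Because the loop starts at the current year and breaks on the first empty year, a site whose newest events predate the current year yields an empty list even though older events exist.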

Do we want to handle sites that haven't been updated?

zstumgoren commented 2 years ago

@antidipyramid Thanks for pinning down the details! Sounds good on the fix for (1). For (2), it would be great to handle the non-updating sites for the purposes of backfilling, although we'll need to flag and investigate those agencies to determine if they're using some new site to post agendas, minutes, etc. On (2), would the fix need to happen on our end in the civic-scraper framework?

antidipyramid commented 2 years ago

@zstumgoren No, if we want to account for non-updated sites like Goodyear, the fix needs to happen in the Legistar scraper.
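One hypothetical shape for such a fix in the Legistar scraper: instead of breaking on the first empty year, tolerate a bounded run of empty years before giving up, so stale sites like Goodyear can still be backfilled. Again a sketch with illustrative names, not the library's actual API; the lookback limit is an assumption.

```python
from datetime import date

def scrape_events_with_backfill(years_with_events, max_empty_years=5):
    """Hypothetical variant that keeps walking back through empty years,
    up to `max_empty_years` in a row, before stopping.

    `years_with_events` again stands in for the real per-year fetch.
    """
    events = []
    year = date.today().year
    empty_streak = 0
    while empty_streak < max_empty_years:
        batch = years_with_events.get(year, [])
        if batch:
            events.extend(batch)
            empty_streak = 0  # reset once we find a populated year
        else:
            empty_streak += 1
        year -= 1
    return events
```

The trade-off is extra requests against sites that are genuinely empty, which is why a configurable lookback window (rather than scanning back indefinitely) seems like the sensible default.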

fgregg commented 1 year ago

This has been fixed.