City-Bureau / city-scrapers-cle

City Scrapers project for Cleveland
https://cityscrapers.org/
MIT License

🕷️ Fix spider: Cuyahoga County Council #108

Closed SimmonsRitchie closed 6 months ago

SimmonsRitchie commented 6 months ago

What's this PR do?

Fixes our Cuyahoga County Council spider (a.k.a. cuya_county_council).

Why are we doing this?

Reported by site partners:

Scraper is missing meetings from the main calendar and is erroneously reporting scraped meetings as cancelled on Documenters.org, likely due to an incorrect URL connection.

The scraper has been rebuilt, which hopefully addresses these issues.

Steps to manually test

After installing the project using pipenv:

  1. Activate the virtual environment:

    pipenv shell
  2. Run the spider:

    scrapy crawl cuya_county_council -O test_output.csv
  3. Monitor stdout and confirm that the crawl completes without raising any errors. Pay attention to the final stats report that Scrapy prints.

  4. Inspect test_output.csv to ensure the data looks valid. I suggest opening a few of the URLs under the source column of test_output.csv and comparing each row's data with what you see on the page.
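As a complement to the manual spot check in step 4, here is a minimal sketch of a script that tallies meeting statuses and flags rows with no source URL. It is not part of this PR. The `summarize_rows` and `summarize_csv` helpers are hypothetical names, and the script assumes the CSV has `status` and `source` columns. Since the reported bug was meetings wrongly marked as cancelled, an unexpectedly high cancelled count is worth a closer look.

```python
import csv
import io
from collections import Counter


def summarize_rows(rows):
    """Tally status values and collect 1-based indexes of rows with no source URL.

    Assumes each row is a dict with "status" and "source" keys (an assumption
    about the exported columns, not something confirmed by this PR).
    """
    status_counts = Counter()
    missing_source = []
    for index, row in enumerate(rows, start=1):
        status = (row.get("status") or "").strip().lower()
        status_counts[status or "<empty>"] += 1
        if not (row.get("source") or "").strip():
            missing_source.append(index)
    return status_counts, missing_source


def summarize_csv(path="test_output.csv"):
    """Run the summary against the file written by `scrapy crawl ... -O`."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize_rows(csv.DictReader(handle))


# Small in-memory demo with made-up rows, so the sketch runs without a crawl.
sample = io.StringIO(
    "title,status,source\n"
    "Council Meeting,tentative,https://example.com/a\n"
    "Committee Meeting,cancelled,\n"
)
counts, missing = summarize_rows(csv.DictReader(sample))
print(dict(counts))
print(missing)
```

After a real crawl, calling `summarize_csv()` from the project root gives the same two values for test_output.csv. The rows it flags are a starting point for the manual URL comparison, not a replacement for it.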

Are there any smells or added technical debt to note?