🕷️ Fix spider: Cuya elections

City-Bureau / city-scrapers-cle

City Scrapers project for Cleveland

https://cityscrapers.org/

MIT License


🕷️ Fix spider: Cuya elections #68

Closed SimmonsRitchie closed 10 months ago

SimmonsRitchie commented 10 months ago

What's this PR do?

Fixes our Cuyahoga County Board of Elections spider (aka cuya_elections), which broke after the agency changed its page structure and URLs.

[Note: This PR builds off #67, which should be reviewed first]

Why are we doing this?

We want working scrapers, of course 🤖 This PR updates the spider's URLs and some of its parsing methods.

Steps to manually test

After installing the project using pipenv (see Readme):

Activate the virtual environment:
```
pipenv shell
```

Run the spider:

```
scrapy crawl cuya_elections -O test_output.csv
```

Monitor stdout and ensure that the crawl proceeds without raising any errors. Pay attention to the final status report from scrapy.
Inspect test_output.csv to ensure the data looks valid. I suggest opening a few of the URLs under the source column of test_output.csv and comparing the data in that row with what you see on the page.
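
If eyeballing the CSV gets tedious, a quick script can flag rows with empty fields before you spot-check URLs. This is only a rough sketch: the `source` column is mentioned above, but `title` and `start` are assumed column names, and `find_problem_rows` is a hypothetical helper that is not part of this repo.

```python
import csv
import io

# Assumed column names; adjust to match the header of test_output.csv.
REQUIRED_COLUMNS = ("title", "start", "source")


def find_problem_rows(csv_file, required=REQUIRED_COLUMNS):
    """Return (row_number, missing_columns) for each row with an empty required field."""
    problems = []
    reader = csv.DictReader(csv_file)
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        missing = [col for col in required if not (row.get(col) or "").strip()]
        if missing:
            problems.append((row_number, missing))
    return problems


# Small in-memory sample standing in for test_output.csv.
sample = io.StringIO(
    "title,start,source\n"
    "Board Meeting,2023-01-10 09:30:00,https://example.com/a\n"
    ",2023-02-14 09:30:00,\n"
)
for row_number, missing in find_problem_rows(sample):
    print(f"Row {row_number}: missing {', '.join(missing)}")
```

To run it against the real output, pass `open("test_output.csv", newline="", encoding="utf-8")` instead of the in-memory sample.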

Are there any smells or added technical debt to note?

This scraper now uses a hardcoded link for the "links" field of each meeting. The agency has a dedicated page that hosts the attachments for every meeting. If we want to get fancy, we could scrape that page and merge the attachments into each meeting's "links" field. For now, given the number of broken scrapers we need to fix, I think it's better to take the hardcoded approach and move on. We can come back to this when time permits.
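
For reviewers, here is a rough sketch of the trade-off, not the actual spider code. The URL is a placeholder rather than the agency's real attachments page, and `default_links`, `links_for_meeting` and `scraped_links_by_date` are hypothetical names. It assumes links are represented as a list of dicts with `href` and `title` keys.

```python
from datetime import date

# Placeholder only; NOT the agency's real attachments page.
ATTACHMENTS_PAGE_URL = "https://example.com/meeting-documents"


def default_links():
    """Current approach: every meeting gets the same hardcoded link."""
    return [{"href": ATTACHMENTS_PAGE_URL, "title": "Meeting documents"}]


def links_for_meeting(meeting_date, scraped_links_by_date=None):
    """Possible follow-up: prefer scraped attachments, fall back to the hardcoded link.

    scraped_links_by_date is a hypothetical mapping of meeting date to a list of
    link dicts, which would come from scraping the attachments page.
    """
    scraped = (scraped_links_by_date or {}).get(meeting_date)
    return scraped if scraped else default_links()


# With no scraped data, behaviour matches the hardcoded approach.
print(links_for_meeting(date(2023, 1, 10)))
```

The fallback means a later change could add per-meeting attachments without breaking meetings that have none listed.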