What's this PR do?
Fixes our Cuyahoga County Board of Elections spider (aka cuya_elections), which broke due to page structure and URL changes.
[Note: This PR builds off #67, which should be reviewed first]
Why are we doing this?
We want working scrapers, of course 🤖 This PR updates the spider's URLs and some of its parsing methods to match the new site.
Steps to manually test
After installing the project with pipenv (see the README):
Activate the virtual environment:
pipenv shell
Run the spider:
scrapy crawl cuya_elections -O test_output.csv
Monitor stdout and make sure the crawl proceeds without raising any errors. Pay attention to the final status report from Scrapy.
Inspect test_output.csv to make sure the data looks valid. I suggest opening a few of the URLs under the source column of test_output.csv and comparing the data for that row with what you see on the page.
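Before doing the manual comparison, a quick script could flag rows whose source value is not a usable URL. This is only a minimal sketch and not part of this PR: the helper name find_bad_source_rows is hypothetical, and it assumes the CSV has a source column as described above.

```python
import csv
from urllib.parse import urlparse


def find_bad_source_rows(csv_path, column="source"):
    """Return 1-based data-row numbers whose `column` is not an http(s) URL.

    Hypothetical helper for spot-checking spider output; the column name
    is an assumption based on the "source" column mentioned in the steps.
    """
    bad_rows = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row_number, row in enumerate(csv.DictReader(f), start=1):
            # A missing column or empty cell is treated as an empty string.
            value = (row.get(column) or "").strip()
            parsed = urlparse(value)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                bad_rows.append(row_number)
    return bad_rows
```

Calling find_bad_source_rows("test_output.csv") after the crawl should return an empty list if every row has a well-formed source URL. It does not replace opening a few of those URLs by hand.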
Are there any smells or added technical debt to note?
This scraper now uses a hardcoded link for the "links" field of every meeting. The agency has a separate page where it posts the attachments for all meetings. If we want to get fancy, we could scrape that page and merge its data into each meeting's final output. Given the number of broken scrapers we still need to fix, I think it's better to take the simpler approach for now and move on. We can come back to this when time permits.
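For reviewers, the hardcoded approach looks roughly like the sketch below. It is illustrative only, not the code in this PR: the URL is a placeholder, the names ATTACHMENTS_PAGE_URL and parse_links are hypothetical, and the href/title dict shape is an assumption about how the "links" field is structured.

```python
# Placeholder only; the real attachments-page URL is not reproduced here.
ATTACHMENTS_PAGE_URL = "https://example.com/meeting-documents"


def parse_links(item=None):
    """Return the same static link for every meeting.

    `item` is accepted but ignored, which is the technical debt noted
    above: no per-meeting attachments are looked up.
    """
    return [{"href": ATTACHMENTS_PAGE_URL, "title": "Meeting documents"}]
```

The trade-off is that every meeting points at the same page, so readers have to find the right attachment themselves until the attachments page is scraped.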