biglocalnews / warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
https://warn-scraper.readthedocs.io
Apache License 2.0

OH missing earliest years from PDF #561

Open stucka opened 10 months ago

stucka commented 10 months ago

The Ohio scraper has been rebuilt and most of the archives were consolidated into a single CSV for download.

However, the CSV that Big Local News had been hosting contained badly parsed data from the 2015 and 2016 PDFs, riddled with junk characters. We could use someone to parse those two PDFs into CSV format so we can add them to our archival data.
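Whatever tool ends up doing the table extraction, the junk characters will probably need scrubbing before the rows go into a CSV. Here is a minimal sketch, assuming the extracted rows arrive as lists of strings (or `None` for empty cells). `clean_cell` and `rows_to_csv` are hypothetical helpers, not part of warn-scraper. Treating control, format, private-use and unassigned code points (plus the U+FFFD replacement character) as "junk" is also an assumption, since the issue doesn't say exactly what the bad parse looked like.

```python
import csv
import io
import re
import unicodedata

# Assumption: the junk in the old 2015/2016 parse consists of control, format,
# private-use, surrogate or unassigned code points, plus U+FFFD replacement marks.
_JUNK_CATEGORIES = {"Cc", "Cf", "Co", "Cs", "Cn"}
_REPLACEMENT_CHAR = "\ufffd"


def clean_cell(value):
    """Normalize one PDF-extracted cell and drop non-printing characters."""
    if value is None:
        return ""
    # NFKC folds ligatures (e.g. "\ufb01" -> "fi") and non-breaking spaces.
    value = unicodedata.normalize("NFKC", value)
    kept = []
    for ch in value:
        if ch.isspace():
            # Any whitespace, including line breaks inside a cell, becomes a space.
            kept.append(" ")
        elif ch == _REPLACEMENT_CHAR or unicodedata.category(ch) in _JUNK_CATEGORIES:
            continue
        else:
            kept.append(ch)
    # Collapse whitespace runs left over from the PDF layout.
    return re.sub(r"\s+", " ", "".join(kept)).strip()


def rows_to_csv(header, rows):
    """Return CSV text for the header followed by every cleaned row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    for row in rows:
        writer.writerow([clean_cell(cell) for cell in row])
    return buf.getvalue()
```

For example, `clean_cell("Acme\x00 Corp\n")` returns `"Acme Corp"`. The CSV text uses the `csv` module's default `\r\n` line endings, so write it to a file opened with `newline=""`.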

The original PDFs are included in the ZIP, as is the then-consolidated snapshot of the CSV:

https://storage.googleapis.com/bln-data-public/warn-layoffs/oh_2015-2022.zip
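Whoever picks this up can pull the two PDFs straight out of that archive with Python's standard library. Here is a minimal sketch. Only the URL comes from this issue. The function names are hypothetical, and the issue doesn't list the member names inside the ZIP, so the code matches anything ending in `.pdf` (case-insensitively).

```python
import io
import urllib.request
import zipfile

# Archive URL from this issue; the helpers below are illustrative only.
ZIP_URL = "https://storage.googleapis.com/bln-data-public/warn-layoffs/oh_2015-2022.zip"


def fetch_archive(url=ZIP_URL):
    """Download the whole ZIP into memory and return its bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def list_pdfs(archive):
    """Return the sorted names of PDF members in the archive bytes."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".pdf"))


def read_member(archive, name):
    """Return the raw bytes of one member, e.g. a PDF to hand to a parser."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)
```

`list_pdfs(fetch_archive())` should name the PDFs, and `read_member` hands back their bytes for whichever PDF parser gets used. Loading the archive into memory assumes it is small enough for that. Otherwise, save it to disk and pass the path to `zipfile.ZipFile` instead.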

The current scraper pulls 2017-2022 from a CSV similar to the one in this ZIP file, except that the 2015, 2016, and 2023 data have been purged from it.