biglocalnews / warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
https://warn-scraper.readthedocs.io
Apache License 2.0

Discontinue old CA scraping #537

Open stucka opened 1 year ago

stucka commented 1 year ago

The CA scraper is parsing PDFs from 2015 and, not surprisingly, is the slowest-running scraper of the bunch.

chriszs commented 12 months ago

I wonder whether:

  1. The scraper could be sped up or cleaned up.
  2. There could be a way to archive the data from older years so it is retained without our having to re-scrape it continually. There's precedent for hosting a spreadsheet file somewhere static, perhaps on BigLocalNews, which the scraper just pulls and integrates. That data doesn't change much, but archival data is still good to have. The oldest states go back to when the WARN Act first took effect in 1989. I started thinking of completeness in terms of not just states but also years (I was shooting for seven years of coverage, based on what seemed achievable for historical comparison), people (because all 50 states isn't possible, you'll want to be able to say it covers 9X% of the U.S. population), and percentage of overall job loss.
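
A rough sketch of how the second idea might look, not the scraper's actual code: the older years are read from a static CSV archive, and only recent years come from a live scrape. The function names, the `company` and `notice_date` columns, the ISO date format, and the cutoff year are all assumptions made up for illustration.

```python
import csv
import io

# Hypothetical cutoff: years before this come only from the static archive.
ARCHIVE_CUTOFF_YEAR = 2019


def read_archive(csv_text):
    """Parse archived CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def row_year(row):
    """Extract the year, assuming an ISO-style date such as 2015-03-01."""
    return int(row["notice_date"][:4])


def merge_with_archive(archived_rows, fresh_rows, cutoff_year=ARCHIVE_CUTOFF_YEAR):
    """Combine archived rows (older years) with freshly scraped rows (recent years)."""
    # Trust the archive for everything before the cutoff year.
    old = [r for r in archived_rows if row_year(r) < cutoff_year]
    # Trust the live scrape for the cutoff year onward.
    new = [r for r in fresh_rows if row_year(r) >= cutoff_year]
    return old + new
```

In this arrangement the archived CSV text would be downloaded once per run from wherever the file ends up being hosted, so the old PDFs would no longer need to be parsed every time.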