biglocalnews / warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
https://warn-scraper.readthedocs.io
Apache License 2.0

Wisconsin broke its data #594

Closed: stucka closed this issue 6 months ago

stucka commented 6 months ago

Wisconsin's system has created the "current year" file for 2024, but the 2023 data set has disappeared. We could try to work the scraper around the problem, but really Wisconsin needs to fix this.

https://dwd.wisconsin.gov/dislocatedworker/warn/

cephillips commented 6 months ago

Do we need to do a workaround in the meantime?

From: Mike Stucka
Date: Tuesday, January 2, 2024 at 8:47 AM
To: biglocalnews/warn-scraper
Subject: [biglocalnews/warn-scraper] Wisconsin broke its data (Issue #594)


stucka commented 6 months ago

@cephillips I think in this case the right thing to do is to let Wisconsin know they've actually deleted some data, and hopefully they can get it restored. They're all set up for 2024, so when they get a 2024 entry the scraper should pick right up and get back to work. Big Local News is not missing any data, so the scraper error isn't a problem for us, but the missing data is a problem for the public. I'll message the state momentarily.

CT and DC have a different problem: the scrapers are set up to look for filenames with the year in them (e.g., "warn2024.html"), and the files for the new year have not been created yet. We're not missing any data, as the 2023 files are complete there. I can build workarounds for those, but that would likely involve scraping and parsing the website, which may be more prone to breaking than the current approach ("get files from the current year back to 2015"). I'll message you separately on Slack ...
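For illustration only, the year-based approach described above ("get files from the current year back to 2015") could be sketched roughly like this, with a missing current-year file tolerated instead of raising an error. This is not the actual warn-scraper code: `year_filenames`, `collect_pages`, and the `fetch` callable are hypothetical names, and the "warn{year}.html" pattern is taken only from the example filename in the comment.

```python
from typing import Callable, Optional

FIRST_YEAR = 2015  # earliest year mentioned in the comment above


def year_filenames(current_year: int, first_year: int = FIRST_YEAR) -> list[str]:
    """Build names like 'warn2024.html', newest year first."""
    return [f"warn{year}.html" for year in range(current_year, first_year - 1, -1)]


def collect_pages(
    fetch: Callable[[str], Optional[str]],
    current_year: int,
    first_year: int = FIRST_YEAR,
) -> dict[str, str]:
    """Fetch each year's file, tolerating a missing current-year file only.

    `fetch` is a hypothetical callable that returns the page text, or None
    when the file does not exist (for example, after an HTTP 404).
    """
    pages: dict[str, str] = {}
    newest = f"warn{current_year}.html"
    for name in year_filenames(current_year, first_year):
        text = fetch(name)
        if text is None:
            if name == newest:
                # The new year's file may simply not be posted yet; skip it.
                continue
            # A missing file for a past year is a real problem worth surfacing.
            raise FileNotFoundError(f"Expected file is missing: {name}")
        pages[name] = text
    return pages
```

With a stand-in such as `fake_site = {"warn2016.html": "...", "warn2015.html": "..."}`, calling `collect_pages(fake_site.get, 2017)` returns the two existing pages and quietly skips the not-yet-created 2017 file, while a gap in an earlier year still raises an error.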

stucka commented 6 months ago

I stand corrected; the Wisconsin data's there. I'll work up a fix probably this afternoon.