biglocalnews / warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
https://warn-scraper.readthedocs.io
Apache License 2.0

Improve RI scraper by removing extraneous slash in URL, sparing redirects #637

Open chriszs opened 3 months ago

chriszs commented 3 months ago

Unlike #636 and #635, the state did not change the WARN page location. Instead, it appears we're adding an extraneous slash to the URL, e.g. https://dlt.ri.gov//employers/worker-adjustment-and-retraining-notification-warn, so the first page we hit is an HTTP 302 Found redirect to https://dlt.ri.gov/employers/worker-adjustment-and-retraining-notification-warn. We hit a second redirect when we fetch the Excel file, because it shares the same base URL. The scraper handles this transparently, so it still works, but it makes twice as many HTTP requests as we really need, and it strikes me as bad hygiene to leave in now that we know about it.
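
For illustration, here is a minimal sketch of how a double slash like this can arise and one way to avoid it. I haven't confirmed that this is how the RI scraper builds its URLs: `BASE_URL`, `WARN_PATH`, and `join_url` are hypothetical names, and the assumption is that a base ending in a slash is concatenated with a path that starts with one.

```python
def join_url(base: str, path: str) -> str:
    """Join a base URL and a path with exactly one slash between them."""
    return f"{base.rstrip('/')}/{path.lstrip('/')}"


# Hypothetical values: the base ends with a slash and the path starts with one.
BASE_URL = "https://dlt.ri.gov/"
WARN_PATH = "/employers/worker-adjustment-and-retraining-notification-warn"

# Plain concatenation keeps both slashes, which the server answers with a 302.
naive_url = BASE_URL + WARN_PATH

# Normalizing both sides before joining yields the canonical URL directly.
clean_url = join_url(BASE_URL, WARN_PATH)

print(naive_url)
print(clean_url)
```

If the scraper fetches pages with `requests`, checking that `response.history` is empty after the fix would be one way to confirm the redirect is gone.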