Open stucka opened 8 months ago
Endpoint shows 49,397 layoffs from 2019.
BLN Missouri file (which may include things not scraped) shows 72,761 total, per Excel.
This is a great opportunity for some extra QA!
QA needed.
BLN version seems to show 364 entries, including combined rows for at least some of the revision entries.
/all endpoint seems to show 327 entries with separate rows for at least some of the revision entries.
Flagging @kirkman instead of the other person I flagged by accident. I need sleep.
Couldn’t a lot of that be amendments?
Lotsa duplicates for some reason in the BLN data. If I drop the obvious duplicates I get back to 52,379 layoffs among 256 entries, so it's close to the state's sheet but not quite there.
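For anyone wanting to repeat that check, here is a rough sketch of the "drop the obvious duplicates, then re-total" step. It assumes rows are already loaded as dicts and that the layoff count lives in a field called `layoffs`; that field name, the helper names, and the sample rows are all made up for illustration and are not the real BLN columns or data.

```python
def drop_exact_duplicates(rows):
    """Keep the first copy of each row whose fields all match exactly."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def total_layoffs(rows, field="layoffs"):
    """Sum a numeric field, tolerating thousands separators and blanks."""
    total = 0
    for row in rows:
        raw = str(row.get(field, "")).replace(",", "").strip()
        if raw.isdigit():
            total += int(raw)
    return total


# Illustrative rows only; not taken from the Missouri data.
sample = [
    {"company": "Acme", "date": "2019-01-15", "layoffs": "120"},
    {"company": "Acme", "date": "2019-01-15", "layoffs": "120"},
    {"company": "Beta", "date": "2019-03-02", "layoffs": "1,045"},
]

unique_rows = drop_exact_duplicates(sample)
print(len(sample), total_layoffs(sample))            # 3 1285
print(len(unique_rows), total_layoffs(unique_rows))  # 2 1165
```

This only catches rows that are identical in every field. Amendments or revisions that change a date or a count would survive it and need a separate look.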
There may be an undocumented endpoint in Missouri that allows all years to be scraped on a single hit: https://jobs.mo.gov/warn/all
This would need a modicum of testing to ensure we're getting identical output to the per-year scrapes. Hitting this endpoint might reduce the chance we get snared by the anti-abuse systems @kirkman flagged in #597, because we would no longer be requesting every per-year page on every run.
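One possible shape for that test: treat each scrape as a bag of rows and report what shows up on only one side. This is a sketch under assumptions, not the scraper's actual code. It presumes both outputs have already been parsed into lists of dicts, and the field names (`company`, `date`, `layoffs`) and sample rows are hypothetical.

```python
from collections import Counter


def row_key(row, fields):
    """Normalize the chosen fields so trivial whitespace/case differences don't count."""
    return tuple(str(row.get(f, "")).strip().lower() for f in fields)


def compare_scrapes(all_rows, per_year_rows, fields):
    """Return rows present in one scrape but not the other, respecting repeats."""
    all_counts = Counter(row_key(r, fields) for r in all_rows)
    year_counts = Counter(row_key(r, fields) for r in per_year_rows)
    return {
        "only_in_all": sorted((all_counts - year_counts).elements()),
        "only_in_per_year": sorted((year_counts - all_counts).elements()),
    }


# Illustrative rows only; field names and values are assumptions.
fields = ["company", "date", "layoffs"]
all_rows = [
    {"company": "Acme", "date": "2019-01-15", "layoffs": "120"},
    {"company": "Acme", "date": "2019-02-01", "layoffs": "95"},
    {"company": "Beta", "date": "2019-03-02", "layoffs": "40"},
]
per_year_rows = [
    {"company": "Acme ", "date": "2019-01-15", "layoffs": "120"},
    {"company": "Beta", "date": "2019-03-02", "layoffs": "40"},
    {"company": "Gamma", "date": "2019-05-20", "layoffs": "10"},
]

diff = compare_scrapes(all_rows, per_year_rows, fields)
print(diff["only_in_all"])       # [('acme', '2019-02-01', '95')]
print(diff["only_in_per_year"])  # [('gamma', '2019-05-20', '10')]
```

Using `Counter` rather than a plain set means a row that appears twice in one scrape and once in the other is still reported, which matters given the duplicate and revision rows mentioned above.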