biglocalnews / warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
https://warn-scraper.readthedocs.io
Apache License 2.0

Incomplete data coming through #543

Open stucka opened 1 year ago

stucka commented 1 year ago

Bot sent through something with no state postal code:

CHARTER COMMUNICATIONS LLC - CENTRAL REGION
Notice date: 2023-07-26

FALLON HEALTH WEINBERG, INC. - WESTERN REGION
Notice date: 2023-07-20

stucka commented 1 year ago

The initial problem is likely one with warn-bot, but it highlighted another.

New York's scraper is combining fields, such as Charter Communications LLC - Central Region

Company name can be separated as " - ".join(companyname.split(" - ")[:-1])

Region name can be separated as companyname.split(" - ")[-1] or companyname.split(" - ")[-1].replace(" Region", "")
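
A minimal sketch of the split described above, not code from the scraper itself. The function name `split_company_and_region` and the `drop_region_word` flag are hypothetical, made up for illustration. It splits on the last " - " only, returns the name unchanged when there is no separator (the `join`/`split` expression above would return an empty company name in that case), and strips the trailing word "Region" case-insensitively, since the bot output shows the names in upper case where `.replace(" Region", "")` would not match. It assumes the last " - " segment is always a region, which may not hold for every New York entry.

```python
import re

# Matches a trailing " Region" in any letter case (e.g. " REGION").
_REGION_SUFFIX = re.compile(r"\s+region$", re.IGNORECASE)


def split_company_and_region(raw_name, drop_region_word=False):
    """Split 'Company - Some Region' into a (company, region) tuple.

    Hypothetical helper: assumes the text after the last " - " is the region.
    """
    # rpartition splits on the LAST " - ", so any earlier " - " stays
    # inside the company name.
    company, sep, region = raw_name.rpartition(" - ")
    if not sep:
        # No separator found: keep the whole string as the company name.
        return raw_name.strip(), ""
    region = region.strip()
    if drop_region_word:
        region = _REGION_SUFFIX.sub("", region)
    return company.strip(), region


if __name__ == "__main__":
    print(split_company_and_region("Charter Communications LLC - Central Region"))
    print(split_company_and_region("Acme Corp"))
```
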

Not sure if that would duplicate data in the database, as the company name would shift considerably.