biglocalnews / warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
https://warn-scraper.readthedocs.io
Apache License 2.0

MI scraper getting bad data #562

Closed: stucka closed this issue 1 year ago

stucka commented 1 year ago

The scraper should skip rows where the company name is "TOTALS:" instead of processing them as notices.

5,708 layoffs should have tripped the transformer, though, I think?

https://milmi.org/WARN/

stucka commented 1 year ago

Looks like "TOTALS:" (plural) is the only thing that's missing.

def _parse_html_table(soup):
    black_list = [
        "Number of layoffs indicated Y-T-D",
        "Number of notices received Y-T-D",
        "TOTAL:",
        "Total Layoffs:",
        "Notes:",
        "",
    ]
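
A minimal sketch of the fix described above, assuming the parser drops a row when its first cell matches an entry in the list. The names `BLACK_LIST`, `keep_row`, and `filter_rows` are hypothetical and not taken from warn-scraper. The sample rows are made up for illustration, apart from the 5,708 figure quoted in this issue.

```python
# Hypothetical sketch: these names are illustrative, not warn-scraper's own.
BLACK_LIST = {
    "Number of layoffs indicated Y-T-D",
    "Number of notices received Y-T-D",
    "TOTAL:",
    "TOTALS:",  # the plural form that the original list lacks
    "Total Layoffs:",
    "Notes:",
    "",
}


def keep_row(cells):
    """Return True when the first cell is not a summary label or blank.

    Assumes the company name is held in the first cell of each row.
    """
    first = cells[0].strip() if cells else ""
    return first not in BLACK_LIST


def filter_rows(rows):
    """Drop summary and blank rows from a list of cell lists."""
    return [row for row in rows if keep_row(row)]


# Made-up sample rows; only the 5,708 total comes from the issue text.
rows = [
    ["Acme Corp", "Detroit", "120"],
    ["TOTALS:", "", "5,708"],
    ["", "", ""],
]
print(filter_rows(rows))  # prints [['Acme Corp', 'Detroit', '120']]
```

Stripping whitespace before the lookup is a design choice in this sketch, so that a padded label such as `"  TOTALS: "` is also skipped.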