biglocalnews / warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
https://warn-scraper.readthedocs.io
Apache License 2.0

Fix SC: href missing #504

Closed (palewire closed 1 year ago)

palewire commented 1 year ago

As seen here

pipenv run python -m warn.cli sc -l DEBUG
2022-12-01 12:07:35,306 - warn.runner - Scraping sc
2022-12-01 12:07:35,306 - warn.utils - Requesting https://scworks.org/employer/employer-programs/at-risk-of-closing/layoff-notification-reports
/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/connectionpool.py:1045: InsecureRequestWarning: Unverified HTTPS request is being made to host 'scworks.org'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  warnings.warn(
2022-12-01 12:07:35,451 - warn.utils - Response code: 404
2022-12-01 12:07:35,451 - warn.cache - Writing to cache data/warn-scraper/cache/sc/source.html
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/cli.py", line 79, in <module>
    main()
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/cli.py", line 75, in main
    runner.scrape(scrape)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/runner.py", line 52, in scrape
    data_path = state_mod.scrape(self.data_dir, self.cache_dir)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/scrapers/sc.py", line 56, in scrape
    a_href = a["href"]
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/bs4/element.py", line 1519, in __getitem__
    return self.attrs[key]
KeyError: 'href'
make: *** [Makefile:71: scrape] Error 1

palewire commented 1 year ago

Looks like the URL we are pulling now returns a 404:

https://scworks.org/employer/employer-programs/at-risk-of-closing/layoff-notification-reports
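
For what it's worth, the log suggests two separate failure points: the 404 page was cached and parsed as if it were the real report page, and then an `<a>` tag without an `href` raised the `KeyError`. Below is a rough sketch of guarding against both. It is not the scraper's actual code: `LinkCollector` and `extract_links` are hypothetical names, and it uses the standard library's `html.parser` so it runs standalone. In BeautifulSoup the equivalent guard would be `a.get("href")`, which returns `None` instead of raising.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, skipping anchors that have none."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; href may be absent or empty
        href = dict(attrs).get("href")
        if href:
            self.hrefs.append(href)


def extract_links(status_code, html, url):
    """Hypothetical helper: fail fast on a bad response, then parse links."""
    if status_code != 200:
        # Stop here rather than parsing an error page as if it were the report
        raise RuntimeError(f"Unexpected response code {status_code} for {url}")
    parser = LinkCollector()
    parser.feed(html)
    return parser.hrefs
```

With a check like this, the run would stop with a message naming the 404 and the URL rather than with a `KeyError` deep inside the parsing loop. The scraper would still need the new location of the report page.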