biglocalnews / warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
https://warn-scraper.readthedocs.io
Apache License 2.0

Fix GA: Scraper crashing #501

Closed palewire closed 1 year ago

palewire commented 1 year ago
pipenv run python -m warn.cli ga -l DEBUG
2022-11-28 12:07:32,064 - warn.runner - Scraping ga
2022-11-28 12:07:32,065 - warn.utils - Requesting https://www.dol.state.ga.us/public/es/warn/searchwarns/list?geoArea=9&year=2022&step=search
Traceback (most recent call last):
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/connection.py", line 414, in connect
    self.sock = ssl_wrap_socket(
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/ssl.py", line 501, in wrap_socket
    return self.sslsocket_class._create(
  File "/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/ssl.py", line 1041, in _create
    self.do_handshake()
  File "/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/ssl.py", line 1310, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1129)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.dol.state.ga.us', port=443): Max retries exceeded with url: /public/es/warn/searchwarns/list?geoArea=9&year=2022&step=search (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1129)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/hostedtoolcache/Python/3.9.15/x64/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/cli.py", line 79, in <module>
    main()
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/cli.py", line 75, in main
    runner.scrape(scrape)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/runner.py", line 52, in scrape
    data_path = state_mod.scrape(self.data_dir, self.cache_dir)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/scrapers/ga.py", line 58, in scrape
    page = utils.get_url(url)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/utils.py", line 121, in get_url
    response = requests.get(url, **kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/requests/adapters.py", line 563, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.dol.state.ga.us', port=443): Max retries exceeded with url: /public/es/warn/searchwarns/list?geoArea=9&year=2022&step=search (Caused by SSLError(SSLError(1, '[SSL: UNSAFE_LEGACY_RENEGOTIATION_DISABLED] unsafe legacy renegotiation disabled (_ssl.c:1129)')))
make: *** [Makefile:71: scrape] Error 1
palewire commented 1 year ago

This might be our huckleberry.

Ash1R commented 1 year ago

Changing the URL from https to http seems to let us get data again, although that may not be the best solution. I'm not sure how we'd implement the popular solution from Stack Overflow...
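
For reference, the scheme swap described above can be sketched roughly as follows. This is only an illustration, not the actual change made in `warn/scrapers/ga.py`: the helper name `downgrade_to_http` is hypothetical, and the URL is the one from the log above. Keep in mind that plain HTTP is unencrypted and unauthenticated, which is why it may not be the best long-term fix.

```python
from urllib.parse import urlsplit, urlunsplit


def downgrade_to_http(url: str) -> str:
    """Return the same URL with an https scheme swapped for http.

    Hypothetical helper for illustration; URLs that are not https
    are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return url
    # SplitResult is a namedtuple, so _replace swaps only the scheme.
    return urlunsplit(parts._replace(scheme="http"))


# URL taken from the failing request in the log above.
url = (
    "https://www.dol.state.ga.us/public/es/warn/searchwarns/list"
    "?geoArea=9&year=2022&step=search"
)
print(downgrade_to_http(url))
```

The result could then be passed to the existing `utils.get_url(url)` call shown in the traceback.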

palewire commented 1 year ago

Hey, if it works, let's do it!

palewire commented 1 year ago

You called it. That worked. Good thinking.
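
For anyone who needs to stay on HTTPS later: one commonly cited way around `UNSAFE_LEGACY_RENEGOTIATION_DISABLED` is to turn on OpenSSL's legacy server connect option in a custom SSL context. Whether that matches the Stack Overflow answer mentioned above is an assumption. The sketch below uses only the standard library, and the function names `legacy_ssl_context` and `build_legacy_opener` are hypothetical. Wiring the same context into `requests` would need a custom transport adapter, which is not shown here.

```python
import ssl
import urllib.request

# ssl.OP_LEGACY_SERVER_CONNECT is only exposed by newer Python versions;
# 0x4 is the value of OpenSSL's SSL_OP_LEGACY_SERVER_CONNECT flag.
LEGACY_SERVER_CONNECT = getattr(ssl, "OP_LEGACY_SERVER_CONNECT", 0x4)


def legacy_ssl_context() -> ssl.SSLContext:
    """Build a default context that also tolerates legacy renegotiation.

    Certificate and hostname verification stay enabled; only the
    legacy server connect option is added.
    """
    ctx = ssl.create_default_context()
    ctx.options |= LEGACY_SERVER_CONNECT
    return ctx


def build_legacy_opener() -> urllib.request.OpenerDirector:
    """Return a urllib opener whose HTTPS requests use the relaxed context."""
    handler = urllib.request.HTTPSHandler(context=legacy_ssl_context())
    return urllib.request.build_opener(handler)
```

Calling `build_legacy_opener().open(url)` would then attempt the request over HTTPS with that option set. This weakens a protection that OpenSSL 3 enables by default, so it is best limited to the one host that needs it.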