biglocalnews / warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
https://warn-scraper.readthedocs.io
Apache License 2.0

Consider adding retry to some core library functions #648

Open stucka opened 1 month ago

stucka commented 1 month ago

I hadn't seen this kind of failure before, but it might be easy enough to add another retry decorator:

2024-05-20 19:26:01,222 - warn.utils - Requesting https://dlt.ri.gov//media/15796/download?language=en
2024-05-20 19:26:05,656 - warn.utils - Response code: 200
2024-05-20 19:26:05,657 - warn.cache - Writing to data/warn-scraper/cache/ri/WARN Report.xlsx
Traceback (most recent call last):
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/response.py", line 737, in _error_catcher
    yield
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/response.py", line 883, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(29126 bytes read, 136 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/response.py", line 1043, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/response.py", line 963, in read
    data = self._raw_read(amt)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/response.py", line 891, in _raw_read
    self._fp.close()
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/urllib3/response.py", line 761, in _error_catcher
    raise ProtocolError(arg, e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(29126 bytes read, 136 more expected)', IncompleteRead(29126 bytes read, 136 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/cli.py", line 79, in <module>
    main()
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/cli.py", line 75, in main
    runner.scrape(scrape)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/runner.py", line 52, in scrape
    data_path = state_mod.scrape(self.data_dir, self.cache_dir)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/scrapers/ri.py", line 53, in scrape
    excel_path = cache.download(f"{state_code}/WARN Report.xlsx", excel_url)
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/warn/cache.py", line 105, in download
    for chunk in r.iter_content(chunk_size=8192):
  File "/home/runner/.local/share/virtualenvs/warn-github-flow-R1xICqqL/lib/python3.9/site-packages/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(29126 bytes read, 136 more expected)', IncompleteRead(29126 bytes read, 136 more expected))
make: *** [Makefile:71: scrape] Error 1
Error: Process completed with exit code 2.
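
For illustration, here is a rough sketch of what a retry decorator for this kind of failure could look like. It is not the project's existing decorator: the `retry` helper, its parameters, and `flaky_download` are all hypothetical, and a built-in `ConnectionError` stands in for `requests.exceptions.ChunkedEncodingError` so the snippet runs with the standard library alone. In the real code, the decorator would presumably go on the function that both issues the request and streams the chunks to disk (the traceback points at `cache.download`), since a broken response stream cannot be resumed and the whole download has to be repeated.

```python
import functools
import time


def retry(exceptions, tries=3, delay=1.0, backoff=2.0):
    """Hypothetical helper: re-run the wrapped function when it raises `exceptions`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    # Out of attempts: let the original error propagate.
                    if attempt == tries:
                        raise
                    time.sleep(wait)
                    wait *= backoff  # wait longer before each new attempt

        return wrapper

    return decorator


# Stand-in for a download that breaks mid-stream twice, then succeeds.
calls = {"count": 0}


@retry(ConnectionError, tries=3, delay=0.01)
def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Connection broken: IncompleteRead")
    return "ok"


result = flaky_download()
```

With `requests` installed, the same decorator could be applied as `@retry(requests.exceptions.ChunkedEncodingError)`, which is the exception the traceback ends on.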