brubsby / SolarPanelDataWrangler

GNU General Public License v3.0

Run_inference script dies every night at 3am due to Connection reset by peer #13

Closed by typicalTYLER 5 years ago

typicalTYLER commented 5 years ago

The stack trace is below. It always seems to happen right at 3am, maybe when Mapbox updates their servers. I've added one automatic retry, but I probably need exponential backoff and more retries. Just tracking the issue here in case anybody has any guidance.

I was going to try a retry method similar to this, but the mapbox package hides the request details, so it's not possible to specify the session etc.

  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/urllib3/response.py", line 360, in _error_catcher
    yield
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/urllib3/response.py", line 442, in read
    data = self._fp.read(amt)
  File "/usr/lib/python3.5/http/client.py", line 448, in read
    n = self.readinto(b)
  File "/usr/lib/python3.5/http/client.py", line 488, in readinto
    n = self.fp.readinto(b)
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.5/ssl.py", line 929, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.5/ssl.py", line 791, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.5/ssl.py", line 575, in read
    v = self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/urllib3/response.py", line 494, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/urllib3/response.py", line 459, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/urllib3/response.py", line 378, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_inference.py", line 54, in <module>
    image = np.array(imagery.stitch_image_at_coordinate((tile.column, tile.row)))
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/imagery.py", line 190, in stitch_image_at_coordinate
    images.append(get_image_for_coordinate((column, row),))
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/imagery.py", line 178, in get_image_for_coordinate
    image = gather_and_persist_imagery_at_coordinate(slippy_coordinate, final_zoom=FINAL_ZOOM)
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/imagery.py", line 156, in gather_and_persist_imagery_at_coordinate
    retina=(ZOOM_FACTOR > 0))
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/mapbox/services/static.py", line 94, in image
    res = self.session.get(uri)
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/requests/sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/requests/sessions.py", line 686, in send
    r.content
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/requests/models.py", line 828, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/home/tyler/PycharmProjects/SolarPanelDataWrangler/venv/lib/python3.5/site-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))

typicalTYLER commented 5 years ago

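Temporary workaround: run the script as python run_inference.py || { sleep 30m && python run_inference.py; } :)

The braces keep the second run from also starting after a successful first pass, since the shell evaluates a || b && c left to right as (a || b) && c.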
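For anyone who would rather not rely on shell operators, the same restart-on-failure idea can be sketched as a small Python supervisor. This is only an illustration under assumptions: `run_with_restart`, `INFERENCE_CMD`, and their parameters are hypothetical names that are not part of the repo, and the 30 minute wait just mirrors the workaround above.

```python
import subprocess
import sys
import time

# Hypothetical command: run the inference script with the current interpreter.
INFERENCE_CMD = [sys.executable, "run_inference.py"]


def run_with_restart(cmd, max_restarts=1, wait_seconds=30 * 60, sleep=time.sleep):
    """Run cmd; on a non-zero exit, wait and run it again.

    Restarts at most max_restarts times and returns the exit code
    of the last run (0 as soon as any run succeeds).
    """
    returncode = 0
    for attempt in range(max_restarts + 1):
        returncode = subprocess.call(cmd)
        if returncode == 0:
            return 0  # finished cleanly, nothing to restart
        if attempt < max_restarts:
            sleep(wait_seconds)  # give the remote side time to recover
    return returncode


# Example call (not executed here):
# sys.exit(run_with_restart(INFERENCE_CMD))
```

Unlike the one-liner, this makes the number of restarts and the wait explicit, and it never re-runs the script after a clean exit.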

typicalTYLER commented 5 years ago

Added exponential backoff and more retries in this commit: 5fe31a52f4c795fa437a82fe68a37913767d998e

Didn't test it, since my current inference run had already finished, but I'll re-open this issue if the problem persists.
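For reference, here is a minimal sketch of what "exponential backoff and more retries" can look like. It is not the code from the commit above; `retry_with_backoff` and all of its parameters are hypothetical names chosen for illustration. The idea is to wrap the whole fetch call, so that an error raised while the response body is still being read (as in the trace, where `ChunkedEncodingError` surfaces from `r.content`) also triggers a retry.

```python
import random
import time
from functools import wraps


def retry_with_backoff(exceptions, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Retry the wrapped callable on `exceptions`, doubling the wait each time."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Wait base_delay, then 2x, 4x, ... capped at max_delay.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    # Up to 10% jitter so parallel workers do not retry in lockstep.
                    sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator


# Hypothetical usage (fetch_tile is a stand-in, not a function from the repo):
#
# import requests
#
# @retry_with_backoff((requests.exceptions.ChunkedEncodingError,
#                      requests.exceptions.ConnectionError))
# def fetch_tile(coordinate):
#     ...
```

With the defaults, a failing call is retried after roughly 1, 2, 4, and 8 seconds before the last exception is re-raised. An outage that lasts several minutes would probably need a larger `max_attempts` or `base_delay`.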