eyeseast / geocode-sqlite

Geocode rows in a SQLite database table
Apache License 2.0

Handle timeouts and rate limiting errors more gracefully #13

Open eyeseast opened 3 years ago

eyeseast commented 3 years ago

Right now, the whole stack trace is being printed, which breaks the progress bar, and it doesn't actually stop geocoding.

eyeseast commented 1 year ago

Here's an example:

347 rows  [##########--------------------------]   28%  00:07:26RateLimiter caught an error, retrying (0/2 tries). Called with (*('72265 Varner Rd., Thousand Palms, CA 92276',), **{'bounds': ((33.030551, -119.787326), (34.695341, -115.832248))}).
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1346, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1285, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 980, in send
    self.connect()
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1454, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
socket.timeout: _ssl.c:1112: The handshake operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/camico/.local/share/virtualenvs/geocode-sqlite-FISN9xZW-python/lib/python3.9/site-packages/geopy/adapters.py", line 297, in get_text
    page = self.urlopen(req, timeout=timeout)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1389, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1349, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error _ssl.c:1112: The handshake operation timed out>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/camico/.local/share/virtualenvs/geocode-sqlite-FISN9xZW-python/lib/python3.9/site-packages/geopy/extra/rate_limiter.py", line 136, in _retries_gen
    yield i  # Run the function.
  File "/Users/camico/.local/share/virtualenvs/geocode-sqlite-FISN9xZW-python/lib/python3.9/site-packages/geopy/extra/rate_limiter.py", line 274, in __call__
    res = self.func(*args, **kwargs)
  File "/Users/camico/.local/share/virtualenvs/geocode-sqlite-FISN9xZW-python/lib/python3.9/site-packages/geopy/geocoders/google.py", line 270, in geocode
    return self._call_geocoder(url, callback, timeout=timeout)
  File "/Users/camico/.local/share/virtualenvs/geocode-sqlite-FISN9xZW-python/lib/python3.9/site-packages/geopy/geocoders/base.py", line 368, in _call_geocoder
    result = self.adapter.get_json(url, timeout=timeout, headers=req_headers)
  File "/Users/camico/.local/share/virtualenvs/geocode-sqlite-FISN9xZW-python/lib/python3.9/site-packages/geopy/adapters.py", line 286, in get_json
    text = self.get_text(url, timeout=timeout, headers=headers)
  File "/Users/camico/.local/share/virtualenvs/geocode-sqlite-FISN9xZW-python/lib/python3.9/site-packages/geopy/adapters.py", line 315, in get_text
    raise GeocoderTimedOut("Service timed out")
geopy.exc.GeocoderTimedOut: Service timed out
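
One possible stopgap for the noise above, sketched on the assumption that GeoPy emits these retry warnings through the standard `logging` logger named `geopy` with the exception attached: a filter can keep the one-line "caught an error, retrying" message and drop the traceback. `DropTraceback` is a hypothetical name, not something GeoPy or geocode-sqlite provides.

```python
import logging


class DropTraceback(logging.Filter):
    """Keep the log message but discard any attached stack trace."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Clearing these fields stops the formatter from appending a traceback.
        record.exc_info = None
        record.exc_text = None
        return True  # Still emit the (now one-line) record.


# Assumption: GeoPy's RateLimiter logs to the logger named "geopy".
geopy_logger = logging.getLogger("geopy")
geopy_logger.addFilter(DropTraceback())
```

This only shortens the output. The warning line itself is still written, so it can still interrupt a progress bar unless the logger is also pointed at a file or its level is raised.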

eyeseast commented 1 year ago

Well, maybe there's an easier solution here: Installing requests causes GeoPy to use the RequestsAdapter instead of urllib, and all the errors above go away.

It's optional for GeoPy, but adapters are sort of buried in the docs and it immediately makes things work better. That makes me think I should just add requests as a dependency here and encode a best practice.
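
For illustration, the adapter can also be chosen explicitly rather than left to whatever happens to be installed. This is a minimal sketch assuming GeoPy 2.x, where geocoders accept an `adapter_factory` argument; `build_geocoder` and `requests_available` are hypothetical helper names, and Nominatim is only a stand-in for whichever geocoder is in use.

```python
import importlib.util


def requests_available() -> bool:
    """Return True if the optional `requests` package can be imported."""
    return importlib.util.find_spec("requests") is not None


def build_geocoder(user_agent: str):
    """Build a geocoder that prefers the requests-based adapter."""
    # Imported lazily so this sketch loads even where geopy is not installed.
    from geopy.adapters import RequestsAdapter, URLLibAdapter
    from geopy.geocoders import Nominatim

    # Fall back to urllib only when requests is missing.
    adapter = RequestsAdapter if requests_available() else URLLibAdapter
    return Nominatim(user_agent=user_agent, adapter_factory=adapter)
```

Declaring requests as a hard dependency would make the fallback branch unnecessary, which is the appeal of encoding it as the default.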

eyeseast commented 1 year ago

Probably also worth increasing the default timeout from one second to five (or maybe higher). One of my goals with this whole library is to be able to run this in the background and trust that it'll finish eventually, or pick up where I left off if needed.
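
A rough sketch of what "pick up where I left off" could look like, using only the standard library. The `places` table and its `id`, `address`, `latitude` and `longitude` columns are assumptions for the example, `geocode_pending` is a hypothetical function, and `TimeoutError` stands in for `geopy.exc.GeocoderTimedOut`. The `geocode` argument is any callable returning a `(latitude, longitude)` pair or `None`, for instance a thin wrapper around a GeoPy geocoder created with a longer `timeout`.

```python
import sqlite3
from typing import Callable, Optional, Tuple

Coords = Optional[Tuple[float, float]]


def geocode_pending(
    conn: sqlite3.Connection,
    geocode: Callable[[str], Coords],
) -> Tuple[int, int]:
    """Geocode rows that still lack coordinates; return (done, skipped)."""
    # Only rows without a latitude are selected, so reruns resume naturally.
    pending = conn.execute(
        "SELECT id, address FROM places WHERE latitude IS NULL"
    ).fetchall()
    done = skipped = 0
    for row_id, address in pending:
        try:
            coords = geocode(address)
        except TimeoutError:
            # Leave the row untouched so the next run tries it again.
            skipped += 1
            continue
        if coords is None:
            skipped += 1
            continue
        conn.execute(
            "UPDATE places SET latitude = ?, longitude = ? WHERE id = ?",
            (coords[0], coords[1], row_id),
        )
        conn.commit()  # Commit per row so an interrupted run keeps its work.
        done += 1
    return done, skipped
```

Because a timed-out row keeps its NULL coordinates, running the function again retries only the rows that failed or were never reached.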