Open zaksoup opened 4 years ago
Exponential backoff or some other retry mechanism?
I've typically "handled" (major air quotes) these errors by making large requests multiple times over a given time period, but this is not ideal for anyone. The current error handling is ripe for improvement.
Some related conversation can be found in #13 and #82.
As an aside, the current code using `is not` only works for status codes <= 256. I'm not a Pythonista by any account, so I spent a bit too long trying to figure out why
x = 200
x is 200
# True
y = 500
y is 500
# False
was happening. Turns out `is` checks for object identity, not equality. CPython caches small ints (-5 through 256) and reuses the same object for them, but above that they can be different objects, so status codes should be compared with `==` and `!=`.
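
To make the aside concrete, here is a minimal sketch of comparing status codes by value instead of by identity. The helper names `is_success` and `is_server_error` are hypothetical and do not come from the client library.

```python
def is_success(status_code: int) -> bool:
    """Compare by value; `is` would compare object identity instead."""
    return status_code == 200


def is_server_error(status_code: int) -> bool:
    """Range checks also compare values, so they work for any int."""
    return 500 <= status_code < 600


# Parsed at runtime, the way a real response code would be.
status = int("500")
print(is_success(status), is_server_error(status))
```

Value comparison gives the same answer regardless of how the interpreter happens to allocate the integer, which is why it is the safer choice here.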
On topic... I wrote a very (very very) quick-and-dirty attempt at making the code a bit more retryable, including for connection errors. Any feedback on what would be more idiomatic Python is extremely welcome. This is in client.py:
...
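    # Assumes `session` is a requests.Session and that client.py has
    # `from requests.exceptions import ConnectionError` at the top: the
    # builtin ConnectionError does not catch the one requests raises.
    # should_retry and pretty_sleep are module-level helpers not shown here.
    @staticmethod
    def retryable_get(session, url, params):
        r = Client._get(session, url, params)
        wait_time = 1  # seconds; doubled after every attempt
        retries = 1
        while (r is None or should_retry(r.status_code)) and retries <= 12:
            if r is None:
                print("Connection Error, retrying")
            else:
                print(f"{r.status_code} received, sleeping {wait_time} second(s)")
                pretty_sleep(wait_time)
            r = Client._get(session, url, params)
            wait_time = wait_time * 2
            retries += 1
        if r is None:
            raise ConnectionError
        return r

    @staticmethod
    def _get(session, url, params):
        # Return None on a connection failure so the caller can decide to retry.
        try:
            return session.get(url, params=params)
        except ConnectionError:
            return None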
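
The loop above relies on a `should_retry` helper that is not shown, and it doubles the wait from 1 second across up to 12 attempts. The sketch below is one hypothetical way to write that predicate (treating 429 as retryable is my assumption; the issue only mentions 500s), together with a delay generator, `backoff_delays`, that is also an illustrative name. It shows that uncapped doubling can add up to more than an hour of waiting, which a cap avoids.

```python
from itertools import islice
from typing import Iterator, Optional


def should_retry(status_code: int) -> bool:
    """Hypothetical predicate: treat 429 and any 5xx response as transient."""
    return status_code == 429 or 500 <= status_code < 600


def backoff_delays(base: float = 1, factor: float = 2,
                   cap: Optional[float] = None) -> Iterator[float]:
    """Yield base, base*factor, base*factor**2, ... optionally capped."""
    delay = base
    while True:
        yield delay if cap is None else min(delay, cap)
        delay *= factor


# Twelve waits, matching the retry limit in the loop above.
uncapped = list(islice(backoff_delays(), 12))
capped = list(islice(backoff_delays(cap=60), 12))
print(sum(uncapped))  # 4095 seconds if every wait is taken
print(sum(capped))    # 423 seconds with a 60-second cap
```

Adding a small random jitter to each delay is a common refinement when many clients retry against the same endpoint, though it is left out here to keep the numbers easy to check.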
What happens now?
Some providers have implementation issues with their MDS endpoints. It's common for standard requests to result in unexplained 500 errors that disappear on retry, or for the remote end to disconnect mid-request.
What should happen?
I'd like to request that we investigate adding logic to retry requests on certain conditions, like the remote disconnecting mid-request or receiving a 500 error.
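
As a starting point for discussion, the requested behavior could look roughly like the sketch below. It uses only the standard library: `get_with_retry`, `FakeResponse`, and `flaky_fetch` are hypothetical names, and the builtin `ConnectionError` stands in for whatever the HTTP library raises (with requests, that would be `requests.exceptions.ConnectionError`). The `fetch` and `sleep` arguments are injected so the logic can be exercised without a network or real delays.

```python
import time
from typing import Any, Callable


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, status_code: int) -> None:
        self.status_code = status_code


def get_with_retry(fetch: Callable[[], Any], max_attempts: int = 5,
                   base_delay: float = 1.0, max_delay: float = 30.0,
                   sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fetch() until it succeeds, retrying on disconnects and 5xx."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            response = fetch()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the connection failure
        else:
            # Return successes and client errors immediately; return a 5xx
            # only once the attempts are exhausted.
            if response.status_code < 500 or attempt == max_attempts:
                return response
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff with a cap


# Simulate a provider that drops one request, then returns a 500, then works.
outcomes = iter([ConnectionError("dropped"), FakeResponse(500), FakeResponse(200)])


def flaky_fetch() -> FakeResponse:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


slept = []
result = get_with_retry(flaky_fetch, sleep=slept.append)
print(result.status_code, slept)
```

Returning the final 5xx response instead of raising leaves the decision to the caller, which is one possible design; raising a dedicated exception after the last attempt would be an equally reasonable choice.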
How do we do that?
I'm opening this issue to discuss what the recommended course of action would be to improve the resilience of the client library.