CityofSantaMonica / mds-provider

Python tools for working with MDS Provider data
https://github.com/openmobilityfoundation/mobility-data-specification
MIT License

Client resilience to provider server errors #88

Open zaksoup opened 4 years ago

zaksoup commented 4 years ago

What happens now?

Some providers have implementation issues with their MDS endpoints. It's common for standard requests to fail with unexplained 500 errors that disappear on retry, or for the remote end to disconnect mid-request.

What should happen?

I'd like to request that we investigate adding logic to retry requests under certain conditions, such as the remote end disconnecting mid-request or the server returning a 500 error.

How do we do that?

I'm opening this issue to discuss what the recommended course of action would be to improve the resilience of the client library.

thekaveman commented 4 years ago

Exponential backoff or some other retry mechanism? I've typically "handled" (major air quotes) these errors by making large requests multiple times over a given time period, but this is not ideal for anyone. The current escape sequence is ripe for improvement.

Some related conversation can be found in #13 and #82
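For reference, the delay schedule that exponential backoff produces can be sketched in a few lines. This is only an illustration, not code from mds-provider: `backoff_delays` and its parameters (`base`, `factor`, `max_delay`, `retries`, `jitter`) are hypothetical names chosen for the example, and the cap and optional jitter are assumptions about what a reasonable policy might include.

```python
import random
from typing import Iterator


def backoff_delays(
    base: float = 1.0,
    factor: float = 2.0,
    max_delay: float = 60.0,
    retries: int = 5,
    jitter: bool = False,
) -> Iterator[float]:
    """Yield one wait time (in seconds) per retry attempt.

    Hypothetical helper: each delay grows by `factor` and is capped at
    `max_delay` so a long outage does not lead to hour-long sleeps.
    """
    for attempt in range(retries):
        delay = min(base * factor ** attempt, max_delay)
        if jitter:
            # Spread retries out so many clients do not retry in lockstep.
            delay = random.uniform(0, delay)
        yield delay
```

With the defaults and `retries=8`, the schedule is 1, 2, 4, 8, 16, 32, 60, 60 seconds. Without a cap, twelve doublings starting from 1 second would add up to more than an hour of waiting.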

zaksoup commented 4 years ago

As an aside, the current code using `is not` only works for small status codes (up to 256). I'm not a Pythonista by any account, so I spent a bit too long trying to figure out why

    x = 200
    x is 200
    # True
    y = 500
    y is 500
    # False

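was happening. Turns out, `is` checks object identity rather than equality. CPython caches small ints (-5 through 256) and reuses the same object for them, but larger ints are generally distinct objects, so status codes should be compared with `==` and `!=`.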
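A minimal sketch of the value-based comparison, assuming nothing about the actual client code: `is_success` is a hypothetical helper, and `int("...")` is used only to build the numbers at runtime, the way a parsed HTTP response would.

```python
def is_success(status_code: int) -> bool:
    """Compare by value, which holds for any integer, cached or not."""
    return status_code == 200


# Values built at runtime, as they would be when read from a response.
ok = int("200")
server_error = int("500")

assert is_success(ok)
assert not is_success(server_error)
# Equal by value, whether or not the two sides are the same object.
assert server_error == 500
```

Because `==` never depends on which objects the interpreter happens to reuse, the same check behaves identically for 200 and for 500.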

zaksoup commented 4 years ago

On topic... I wrote a very (very very) quick-and-dirty attempt at making the code a bit more retryable, including on connection errors. Any feedback on what would be more idiomatic Python is extremely welcome. This is in client.py...

    @staticmethod
    def retryable_get(session, url, params):
        # should_retry and pretty_sleep are helpers that are not shown here.
        r = Client._get(session, url, params)
        wait_time = 1
        retries = 1
        # Retry on connection errors (r is None) or retryable status codes,
        # doubling the wait each time, for at most 12 retries.
        while (r is None or should_retry(r.status_code)) and retries <= 12:
            if r is None:
                print("Connection error, retrying")
            else:
                print(f"{r.status_code} received, sleeping {wait_time} second(s)")

            pretty_sleep(wait_time)
            r = Client._get(session, url, params)
            wait_time *= 2
            retries += 1

        if r is None:
            raise ConnectionError
        return r

    @staticmethod
    def _get(session, url, params):
        # Return None on a connection error so the caller can decide to retry.
        try:
            return session.get(url, params=params)
        except ConnectionError:
            return None
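One possible restructuring of the loop above, offered as a sketch rather than a proposed patch. Everything here is an assumption for illustration: `get_with_retries`, the `fetch` callable, the `sleep` parameter, and this particular `should_retry` policy (any 5xx) are hypothetical and not part of mds-provider. It catches the builtin `ConnectionError` to stay stdlib-only; with requests, the exception to catch would presumably be `requests.exceptions.ConnectionError`, which to my knowledge does not derive from the builtin of the same name.

```python
import time
from typing import Any, Callable, Optional


def should_retry(status_code: int) -> bool:
    """Hypothetical policy: treat every 5xx response as retryable."""
    return 500 <= status_code <= 599


def get_with_retries(
    fetch: Callable[[], Any],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until it returns a non-retryable response.

    Waits base_delay * 2**attempt seconds between attempts. If the final
    attempt raised ConnectionError, that error is re-raised; otherwise the
    last response is returned, even if it is still a 5xx.
    """
    last_error: Optional[ConnectionError] = None
    response: Any = None
    for attempt in range(max_retries + 1):
        try:
            response = fetch()
        except ConnectionError as error:
            last_error = error
        else:
            if not should_retry(response.status_code):
                return response
            last_error = None
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    if last_error is not None:
        raise last_error
    return response
```

In the client this might be called as `get_with_retries(lambda: session.get(url, params=params))`. Passing `sleep` as a parameter is a design choice that lets tests substitute a recorder instead of actually waiting.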