bugsnag / bugsnag-python

Official BugSnag error monitoring and error reporting for django, flask, tornado and other python apps.
https://docs.bugsnag.com/platforms/python/
MIT License
86 stars 42 forks source link

Improve error handling in RequestsDelivery #301

Closed svix-nolan closed 2 years ago

svix-nolan commented 2 years ago

Describe the bug

In RequestsDelivery, requests is used to make a call to sessions.bugsnag.com here. We are getting occasional ConnectionResetByPeer errors from the requests.post call when running in our AWS Lambda fn.

Although this is not directly a problem with bugsnag, it would be good to have retries to mitigate this. Also errors like this are not gracefully handled (since its not an http error) and escape the thread. See the error message below.

Steps to reproduce

Connection reset by peer errors occur every once in a while in our logs and are difficult to reproduce. What is probably more useful checking/simulating the other error cases from requests (like mock it to throw a ConnectionError).

Environment

Error Messages

Error messages: ``` Exception in thread Thread-16: Traceback (most recent call last): File "/var/task/urllib3/connectionpool.py", line 699, in urlopen httplib_response = self._make_request( File "/var/task/urllib3/connectionpool.py", line 445, in _make_request six.raise_from(e, None) File "", line 3, in raise_from File "/var/task/urllib3/connectionpool.py", line 440, in _make_request httplib_response = conn.getresponse() File "/var/lang/lib/python3.8/http/client.py", line 1348, in getresponse response.begin() File "/var/lang/lib/python3.8/http/client.py", line 316, in begin version, status, reason = self._read_status() File "/var/lang/lib/python3.8/http/client.py", line 277, in _read_status line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1") File "/var/lang/lib/python3.8/socket.py", line 669, in readinto return self._sock.recv_into(b) File "/var/lang/lib/python3.8/ssl.py", line 1241, in recv_into return self.read(nbytes, buffer) File "/var/lang/lib/python3.8/ssl.py", line 1099, in read return self._sslobj.read(len, buffer) ConnectionResetError: [Errno 104] Connection reset by peer During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/var/task/requests/adapters.py", line 439, in send resp = conn.urlopen( File "/var/task/urllib3/connectionpool.py", line 755, in urlopen retries = retries.increment( File "/var/task/urllib3/util/retry.py", line 532, in increment raise six.reraise(type(error), error, _stacktrace) File "/var/task/urllib3/packages/six.py", line 769, in reraise raise value.with_traceback(tb) File "/var/task/urllib3/connectionpool.py", line 699, in urlopen httplib_response = self._make_request( File "/var/task/urllib3/connectionpool.py", line 445, in _make_request six.raise_from(e, None) File "", line 3, in raise_from File "/var/task/urllib3/connectionpool.py", line 440, in _make_request httplib_response = conn.getresponse() File "/var/lang/lib/python3.8/http/client.py", line 1348, in getresponse response.begin() File "/var/lang/lib/python3.8/http/client.py", line 316, in begin version, status, reason = self._read_status() File "/var/lang/lib/python3.8/http/client.py", line 277, in _read_status line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1") File "/var/lang/lib/python3.8/socket.py", line 669, in readinto return self._sock.recv_into(b) File "/var/lang/lib/python3.8/ssl.py", line 1241, in recv_into return self.read(nbytes, buffer) File "/var/lang/lib/python3.8/ssl.py", line 1099, in read return self._sslobj.read(len, buffer) urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/var/lang/lib/python3.8/threading.py", line 932, in _bootstrap_inner self.run() File "/var/lang/lib/python3.8/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/var/task/bugsnag/delivery.py", line 145, in request response = requests.post(uri, **req_options) File "/var/task/requests/api.py", line 117, in post return request('post', url, data=data, json=json, **kwargs) File "/var/task/requests/api.py", line 61, in request return session.request(method=method, url=url, **kwargs) File "/var/task/requests/sessions.py", line 542, in request resp = self.send(prep, **send_kwargs) File "/var/task/requests/sessions.py", line 655, in send r = adapter.send(request, **kwargs) File "/var/task/requests/adapters.py", line 498, in send raise ConnectionError(err, request=request) requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')) ```
luke-belton commented 2 years ago

Hi @svix-nolan - there was a similar issue recently over at https://github.com/bugsnag/bugsnag-python/issues/300 linked to non-HTTP errors during report delivery. We're going to investigate and look into a fix to improve error handling in these situations.

svix-nolan commented 2 years ago

I see pull request https://github.com/bugsnag/bugsnag-python/pull/303 was created to address #300 (and mask other exceptions for this issue - #301). It's been almost two weeks since it was created, any updates on it?

It would be worthwhile to add retries (maybe just one) to the delivery class to prevent transient errors like we've seen from interfering with exception reporting.

mattdyoung commented 2 years ago

We had a freeze on releases over the holiday period but this is being progressed again now, and https://github.com/bugsnag/bugsnag-python/pull/303 will resolve this when released.

Adding retries would be something we'd need to consider more widely across our supported server and scripting platforms for consistency so I've fed this back to our product team to consider.