From time to time "Connection reset by peer" error while sending event to the server

sliwinski-milosz commented 2 years ago

Environment

Version

No response

Steps to Reproduce

I tried to debug this issue however I am not 100% sure what is the root cause. I think the problem is related to: https://github.com/urllib3/urllib3/issues/944#issuecomment-237471302 I have applied following workaround and it fixed the problem: https://github.com/psf/requests/issues/4937#issuecomment-788899804

It seems that for the Python client, the keep alive option is disabled by default. Maybe this is causing connection problems. Even that this looks like a Python client issue, and that the workaround on the client side worked fine - I wonder if there is a way to fix this on the server side. This is why I am opening the issue here.

Maybe changing the idle connection timeout on the server side would help. I would appreciate your expertise on that part.

It is not easy to reproduce this error. I tried to reproduce it from local machine by sending events every x seconds (1, 10, 60, 300, 1800). I have reproduced it only twice using 1800 secs delay between two events. However often, even for 1800 it was working fine. After enabling DEBUG logs on client side, I have realised that this issue happens few times a day for my service. I am happy to gather some more logs if needed.

Below I am sending full Python traceback logs for the issue:

[2021-09-01 15:34:10,901] [] [] [DEBUG] [] Sending event, type:null level:error event_id:xxx project:xxx host:xxx.ingest.sentry.io
[sentry] ERROR: Internal error in sentry_sdk
Traceback (most recent call last):
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
File "/usr/app/venv/lib/python3.8/site-packages/newrelic/hooks/external_urllib3.py", line 32, in _nr_wrapper_make_request_
    return wrapped(*args, **kwargs)
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
File "/usr/app/venv/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 102, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
File "/usr/app/venv/lib/python3.8/site-packages/newrelic/hooks/external_httplib.py", line 77, in httplib_getresponse_wrapper
    return wrapped(*args, **kwargs)
File "/usr/lib/python3.8/http/client.py", line 1322, in getresponse
    response.begin()
File "/usr/lib/python3.8/http/client.py", line 303, in begin
    version, status, reason = self._read_status()
File "/usr/lib/python3.8/http/client.py", line 264, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/app/venv/lib/python3.8/site-packages/sentry_sdk/transport.py", line 339, in send_event_wrapper
    self._send_event(event)
File "/usr/app/venv/lib/python3.8/site-packages/sentry_sdk/transport.py", line 245, in _send_event
    self._send_request(
File "/usr/app/venv/lib/python3.8/site-packages/sentry_sdk/transport.py", line 179, in _send_request
    response = self._pool.request(
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/poolmanager.py", line 375, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
File "/usr/app/venv/lib/python3.8/site-packages/newrelic/hooks/external_urllib3.py", line 32, in _nr_wrapper_make_request_
    return wrapped(*args, **kwargs)
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
File "/usr/app/venv/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 102, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
File "/usr/app/venv/lib/python3.8/site-packages/newrelic/hooks/external_httplib.py", line 77, in httplib_getresponse_wrapper
    return wrapped(*args, **kwargs)
File "/usr/lib/python3.8/http/client.py", line 1322, in getresponse
    response.begin()
File "/usr/lib/python3.8/http/client.py", line 303, in begin
    version, status, reason = self._read_status()
File "/usr/lib/python3.8/http/client.py", line 264, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
[2021-09-01 15:34:10,903] [] [] [ERROR] [] Internal error in sentry_sdk Traceback (most recent call last):   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen     httplib_response = self._make_request(   File "/usr/app/venv/lib/python3.8/site-packages/newrelic/hooks/external_urllib3.py", line 32, in _nr_wrapper_make_request_     return wrapped(*args, **kwargs)   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request     six.raise_from(e, None)   File "<string>", line 3, in raise_from   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request     httplib_response = conn.getresponse()   File "/usr/app/venv/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 102, in getresponse     rv = real_getresponse(self, *args, **kwargs)   File "/usr/app/venv/lib/python3.8/site-packages/newrelic/hooks/external_httplib.py", line 77, in httplib_getresponse_wrapper     return wrapped(*args, **kwargs)   File "/usr/lib/python3.8/http/client.py", line 1322, in getresponse     response.begin()   File "/usr/lib/python3.8/http/client.py", line 303, in begin     version, status, reason = self._read_status()   File "/usr/lib/python3.8/http/client.py", line 264, in _read_status     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")   File "/usr/lib/python3.8/socket.py", line 669, in readinto     return self._sock.recv_into(b)   File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into     return self.read(nbytes, buffer)   File "/usr/lib/python3.8/ssl.py", line 1099, in read     return self._sslobj.read(len, buffer) ConnectionResetError: [Errno 104] Connection reset by peer  During handling of the above exception, another exception occurred:  Traceback (most recent call last):   File "/usr/app/venv/lib/python3.8/site-packages/sentry_sdk/transport.py", line 339, in send_event_wrapper     self._send_event(event)   File "/usr/app/venv/lib/python3.8/site-packages/sentry_sdk/transport.py", line 245, in _send_event     self._send_request(   File "/usr/app/venv/lib/python3.8/site-packages/sentry_sdk/transport.py", line 179, in _send_request     response = self._pool.request(   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/request.py", line 78, in request     return self.request_encode_body(   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body     return self.urlopen(method, url, **extra_kw)   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/poolmanager.py", line 375, in urlopen     response = conn.urlopen(method, u.request_uri, **kw)   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen     retries = retries.increment(   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment     raise six.reraise(type(error), error, _stacktrace)   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise     raise value.with_traceback(tb)   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen     httplib_response = self._make_request(   File "/usr/app/venv/lib/python3.8/site-packages/newrelic/hooks/external_urllib3.py", line 32, in _nr_wrapper_make_request_     return wrapped(*args, **kwargs)   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request     six.raise_from(e, None)   File "<string>", line 3, in raise_from   File "/usr/app/venv/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request     httplib_response = conn.getresponse()   File "/usr/app/venv/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 102, in getresponse     rv = real_getresponse(self, *args, **kwargs)   File "/usr/app/venv/lib/python3.8/site-packages/newrelic/hooks/external_httplib.py", line 77, in httplib_getresponse_wrapper     return wrapped(*args, **kwargs)   File "/usr/lib/python3.8/http/client.py", line 1322, in getresponse     response.begin()   File "/usr/lib/python3.8/http/client.py", line 303, in begin     version, status, reason = self._read_status()   File "/usr/lib/python3.8/http/client.py", line 264, in _read_status     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")   File "/usr/lib/python3.8/socket.py", line 669, in readinto     return self._sock.recv_into(b)   File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into     return self.read(nbytes, buffer)   File "/usr/lib/python3.8/ssl.py", line 1099, in read     return self._sslobj.read(len, buffer) urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

Dependencies versions on client side:

ubuntu:18.04 - Python 3.8.0
sentry-sdk==1.3.1
urllib3==1.26.6

Expected Result

Client is able to successfully send all events to the server.

Actual Result

Due to connectivity problems, some events fail on the client side.

getsentry-release commented 2 years ago

Routing to @getsentry/owners-ingest for triage. ⏲️

untitaker commented 2 years ago

I believe this is a longer-standing issue with the Python SDK, but the web platform folks may know if we have similar issues in other SDKs (in that case ingest-team should investigate it)

github-actions[bot] commented 2 years ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!

"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

sliwinski-milosz commented 2 years ago

I think this issue is a major one and has to be fixed in the near future.

sliwinski-milosz commented 2 years ago

Are there any updates related to this specific issue? The idea shared here was to expose socket_options.

@sl0thentr0py could you please tell me if that is still preferable way to approach this issue or maybe you have some other ideas? :)

sl0thentr0py commented 11 months ago

closing in favor of https://github.com/getsentry/sentry-python/issues/2391

sl0thentr0py commented 10 months ago

we encountered similar issues while trying out the new metrics stuff in prod so i'm reopening

exception: "Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 128, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.8/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sentry_sdk/transport.py", line 532, in send_envelope_wrapper
    self._send_envelope(envelope)
  File "/usr/local/lib/python3.8/site-packages/sentry_sdk/transport.py", line 431, in _send_envelope
    self._send_request(
  File "/usr/src/sentry/src/sentry/utils/sdk.py", line 229, in patched_send_request
    return _send_request(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/sentry_sdk/transport.py", line 242, in _send_request
    response = self._pool.request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
  File "/usr/local/lib/python3.8/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/usr/local/lib/python3.8/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.8/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.8/site-packages/sentry_sdk/integrations/stdlib.py", line 128, in getresponse
    rv = real_getresponse(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.8/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"

sl0thentr0py commented 10 months ago

ok so first thing, our setting num_pools to 2 in the PoolManager is entirely useless because the pool is per host and for a single DSN we will always use the same connection pool.

(venv) ➜  vanilla git:(main) ✗ python app.py
using conn HTTPSConnectionPool(host='o447951.ingest.sentry.io', port=443)
using conn HTTPSConnectionPool(host='o447951.ingest.sentry.io', port=443)
using conn HTTPSConnectionPool(host='o447951.ingest.sentry.io', port=443)
using conn HTTPSConnectionPool(host='o447951.ingest.sentry.io', port=443)
using conn HTTPSConnectionPool(host='o447951.ingest.sentry.io', port=443)
using conn HTTPSConnectionPool(host='o447951.ingest.sentry.io', port=443)
using conn HTTPSConnectionPool(host='o447951.ingest.sentry.io', port=443)
using conn HTTPSConnectionPool(host='o447951.ingest.sentry.io', port=443)
using conn HTTPSConnectionPool(host='o447951.ingest.sentry.io', port=443)
using conn HTTPSConnectionPool(host='o447951.ingest.sentry.io', port=443)

sl0thentr0py commented 10 months ago

ok there’s no keepalive but it is reusing the same connection object
bumping maxsize will not help since the worker only picks up one request at a time so we're always basically using the same connection from the same connection pool

Starting new HTTPS connection (1): o447951.ingest.sentry.io:443
[https://o447951.ingest.sentry.io:443](https://o447951.ingest.sentry.io/) "POST /api/5461230/envelope/ HTTP/1.1" 200 41
put back conn: <urllib3.connection.HTTPSConnection object at 0x105f77220>
[https://o447951.ingest.sentry.io:443](https://o447951.ingest.sentry.io/) "POST /api/5461230/envelope/ HTTP/1.1" 200 41
put back conn: <urllib3.connection.HTTPSConnection object at 0x105f77220>
[https://o447951.ingest.sentry.io:443](https://o447951.ingest.sentry.io/) "POST /api/5461230/envelope/ HTTP/1.1" 200 41
put back conn: <urllib3.connection.HTTPSConnection object at 0x105f77220>
[https://o447951.ingest.sentry.io:443](https://o447951.ingest.sentry.io/) "POST /api/5461230/envelope/ HTTP/1.1" 200 41
put back conn: <urllib3.connection.HTTPSConnection object at 0x105f77220>
[https://o447951.ingest.sentry.io:443](https://o447951.ingest.sentry.io/) "POST /api/5461230/envelope/ HTTP/1.1" 200 41
put back conn: <urllib3.connection.HTTPSConnection object at 0x105f77220>
[https://o447951.ingest.sentry.io:443](https://o447951.ingest.sentry.io/) "POST /api/5461230/envelope/ HTTP/1.1" 200 41
put back conn: <urllib3.connection.HTTPSConnection object at 0x105f77220>
[https://o447951.ingest.sentry.io:443](https://o447951.ingest.sentry.io/) "POST /api/5461230/envelope/ HTTP/1.1" 200 41
put back conn: <urllib3.connection.HTTPSConnection object at 0x105f77220>
[https://o447951.ingest.sentry.io:443](https://o447951.ingest.sentry.io/) "POST /api/5461230/envelope/ HTTP/1.1" 200 41
put back conn: <urllib3.connection.HTTPSConnection object at 0x105f77220>
[https://o447951.ingest.sentry.io:443](https://o447951.ingest.sentry.io/) "POST /api/5461230/envelope/ HTTP/1.1" 200 41
put back conn: <urllib3.connection.HTTPSConnection object at 0x105f77220>
[https://o447951.ingest.sentry.io:443](https://o447951.ingest.sentry.io/) "POST /api/5461230/envelope/ HTTP/1.1" 200 41
put back conn: <urllib3.connection.HTTPSConnection object at 0x105f77220>

sl0thentr0py commented 10 months ago

investigation so far

the underlying socket used by urllib3 is the same across requests
there is some timing issue - either timeouts on the load balancer or something else where this doesn’t clean up gracefully

possible action items

enable experimental SO_KEEPALIVE for the socket as per this and try out in prod
somehow tell urllib3 to not use the same socket too long - but idk how to do this without patching really low level code

sl0thentr0py commented 10 months ago

I tried the scenario

do some stuff, wait 15 min (more than LB timeout), do some stuff again

and in this case, the connection RESETS GRACEFULLY as seen in the logs

https://o447951.ingest.sentry.io:443 "POST /api/5461230/envelope/ HTTP/1.1" 200 0
 [sentry] DEBUG: Sending envelope [envelope with 2 items (error, attachment)] project:5461230 host:o447951.ingest.sentry.io
https://o447951.ingest.sentry.io:443 "POST /api/5461230/envelope/ HTTP/1.1" 200 0
 [sentry] DEBUG: Flushing HTTP transport
 [sentry] DEBUG: background worker got flush request
 [sentry] DEBUG: Sending envelope [envelope with 2 items (error, attachment)] project:5461230 host:o447951.ingest.sentry.io
Resetting dropped connection: o447951.ingest.sentry.io.  <--------------------
 [sentry] DEBUG: 11 event(s) pending on flush
https://o447951.ingest.sentry.io:443 "POST /api/5461230/envelope/ HTTP/1.1" 200 0

sliwinski-milosz commented 10 months ago

Hello!

It is great to see that you are looking into this issue!

This is the workaround that I am currently using:

import socket
from sentry_sdk import transport
from sentry_sdk.transport import HttpTransport
from urllib3.connection import HTTPConnection

class KeepAliveHttpTransport(HttpTransport):
    def _get_pool_options(self, ca_certs):
        opts = super()._get_pool_options(ca_certs)
        opts['socket_options'] = HTTPConnection.default_socket_options + [
            (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
            (socket.SOL_TCP, socket.TCP_KEEPIDLE, 45),
            (socket.SOL_TCP, socket.TCP_KEEPINTVL, 10),
            (socket.SOL_TCP, socket.TCP_KEEPCNT, 6),
        ]
        return opts

def fix_connection_resets_on_sentry():
    transport.HttpTransport = KeepAliveHttpTransport

So basically I am injecting the socket_options into the sentry internals. Of course I discourage of using that approach as it relies on sentry internals and may stop to work at any time.

Yet I am sending it to let you know that with above workaround I reached the acceptable reliability level - meaning that so far I have not found any missing event. However note that I disabled the sentry debug logs already and I am not looking very closely into that anymore. Which means there could be some that I missed - I guess I will know once I will start to use those new metrics that you mentioned about.

Long story short - there is a chance that updating the socket_options will improve the reliability significantly 👍

But there is also a bad news. I have been investigating the RemoteDisconnected errors for the calls between the Python services. I have not monitored the same for the calls between services and sentry server - but I guess it is affected as well.

When we use requests Session object for handling calls, above RemoteDisconnected error will happen from time to time and that cannot be fixed. I know this project uses urllib3 for calls but if it uses persisted connections it is affected as well (correct me if I am wrong).

From my observation this problem happens once every 200k-300k calls -> which does not make it a major issue. Yet still I am letting you know so you are aware of the problem. It is currently impossible to achieve high level of reliability while using persisted connections.

As RFC say :

A client, server, or proxy MAY close the transport connection at any time. For example, a client might have started to send a new request at the same time that the server has decided to close the “idle” connection. From the server’s point of view, the connection is being closed while it was idle, but from the client’s point of view, a request is in progress. This means that clients, servers, and proxies MUST be able to recover from asynchronous close events. Client software SHOULD reopen the transport connection and retransmit the aborted sequence of requests without user interaction so long as the request sequence is idempotent (see section 9.1.2). Non-idempotent methods or sequences MUST NOT be automatically retried, although user agents MAY offer a human operator the choice of retrying the request(s). Confirmation by user-agent software with semantic understanding of the application MAY substitute for user confirmation. The automatic retry SHOULD NOT be repeated if the second sequence of requests fails.

There are 2 ways to mitigate the problem:

Stop using requests sessions - this has performance implications - we need to pick, either being faster and less reliable or being slower and more reliable.
Retry idempotent requests (GET) - this brings some complexity and mitigates only idempotent requests

More details: Explanation Why it cannot be retried for idempotent requests

sl0thentr0py commented 10 months ago

ty @sliwinski-milosz for the detailed explanation! Most of what you said is consistent with my research so far. Retries are not an option for us as well. We will increase throughput through the SDK with the new metrics offering so we need to make sure there is a certain level of reliability.

sentrivana commented 6 months ago

Hey folks, we've just released SDK version 1.41.0 which includes support for setting socket options directly via an init option without the need for a custom transport, so the workaround above can now be written as:

import socket
from urllib3.connection import HTTPConnection
import sentry_sdk

sentry_sdk.init(
    # ...your usual settings...
    socket_options=HTTPConnection.default_socket_options + [
        (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
        # note: skip the following line if you're on MacOS since TCP_KEEPIDLE doesn't exist there
        (socket.SOL_TCP, socket.TCP_KEEPIDLE, 45),
        (socket.SOL_TCP, socket.TCP_KEEPINTVL, 10),
        (socket.SOL_TCP, socket.TCP_KEEPCNT, 6),
    ],
)

getsentry / sentry-python