getsentry / sentry-python

The official Python SDK for Sentry.io
https://sentry.io/for/python/
MIT License

Internal error in sentry_sdk #2732

Open manuellazzari-cargoone opened 7 months ago

manuellazzari-cargoone commented 7 months ago

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

I am randomly getting the following error from some of my Python services. The SDK seems to be sending events to sentry.io correctly; I have no clue whether it is skipping some of them. The error seems to occur more often under load.

The service is mainly a Flask app on Python 3.11, run by gunicorn on a Kubernetes cluster in multiple instances on Google Cloud. All dependencies are basically up to date, and I'm using sentry-sdk 1.40.2 and urllib3 2.0.7.

In an attempt to fix it, I tried to customize the HttpTransport used by the Sentry SDK as follows, with no luck.

try:
    logger.info('Initializing Sentry SDK...')
    sentry_sdk.init(
        ...,
        environment=...,
        traces_sample_rate=...,
        integrations=[FlaskIntegration(), RedisIntegration()],
    )

    import socket
    from sentry_sdk import transport
    from sentry_sdk.transport import HttpTransport
    from urllib3.connection import HTTPConnection

    class KeepAliveHttpTransport(HttpTransport):
        def _get_pool_options(self, ca_certs):
            opts = super()._get_pool_options(ca_certs)
            opts['socket_options'] = HTTPConnection.default_socket_options + [
                (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
                (socket.SOL_TCP, socket.TCP_KEEPIDLE, 45),
                (socket.SOL_TCP, socket.TCP_KEEPINTVL, 10),
                (socket.SOL_TCP, socket.TCP_KEEPCNT, 6),
            ]
            return opts

    logger.info('Forcing keep alive for Sentry SDK HTTP transport...')
    transport.HttpTransport = KeepAliveHttpTransport

except Exception as e:
    logger.info(f'Unable to Start Sentry due to error: {e}')
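
A side note on the wiring above: since sentry_sdk.init() builds its transport when it runs, reassigning transport.HttpTransport afterwards may not affect the already-initialized client. A minimal sketch of an alternative, assuming the transport option of sentry_sdk.init accepts a Transport subclass (the subclass body is the same _get_pool_options override shown above):

import sentry_sdk
from sentry_sdk.transport import HttpTransport

class KeepAliveHttpTransport(HttpTransport):
    # same _get_pool_options override as in the snippet above
    ...

sentry_sdk.init(
    ...,
    environment=...,
    traces_sample_rate=...,
    integrations=[FlaskIntegration(), RedisIntegration()],
    transport=KeepAliveHttpTransport,  # used from the very first event
)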

Expected Result

No connection errors

Actual Result

Internal error in sentry_sdk
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connectionpool.py", line 537, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connection.py", line 466, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
    return real_getresponse(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 1386, in getresponse
    response.begin()
  File "/usr/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 294, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sentry_sdk/transport.py", line 535, in send_envelope_wrapper
    self._send_envelope(envelope)
  File "/usr/local/lib/python3.11/dist-packages/sentry_sdk/transport.py", line 434, in _send_envelope
    self._send_request(
  File "/usr/local/lib/python3.11/dist-packages/sentry_sdk/transport.py", line 245, in _send_request
    response = self._pool.request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/_request_methods.py", line 144, in request
    return self.request_encode_body(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/_request_methods.py", line 279, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/poolmanager.py", line 444, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/util/retry.py", line 470, in increment
    raise reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/util/util.py", line 38, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connectionpool.py", line 537, in _make_request
    response = conn.getresponse()
               ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/urllib3/connection.py", line 466, in getresponse
    httplib_response = super().getresponse()
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
    return real_getresponse(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 1386, in getresponse
    response.begin()
  File "/usr/lib/python3.11/http/client.py", line 325, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/http/client.py", line 294, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Product Area

Issues

Link

No response

DSN

https://1e425c4937e14585ab35335aa4810004@o155318.ingest.sentry.io/4503976860385280

Version

No response

getsantry[bot] commented 7 months ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 7 months ago

Routing to @getsentry/product-owners-issues for triage ⏲️

vartec commented 7 months ago

Likely to be caused by https://github.com/getsentry/sentry-python/issues/2386

manuellazzari-cargoone commented 7 months ago

Likely to be caused by #2386

@vartec how is that connected? My app does not crash, only the Sentry SDK is complaining...

kieran-sf commented 7 months ago

I am getting the same issue as above, but I haven't tried the custom HttpTransport yet. Basic specs: sentry-sdk[flask,django,celery]==1.29.0, Python 3.9.

sentrivana commented 7 months ago

Hey @manuellazzari-cargoone and @kieran-sf! Thanks for reporting this.

My first suggestion would've been exactly what you tried @manuellazzari-cargoone with the custom socket options from https://github.com/getsentry/sentry-python/issues/1198#issuecomment-1802638463. Are you seeing at least some improvement?

I'm curious whether spacing out the events sent from the SDK makes a difference. Can you try overriding _send_request in your custom transport roughly like this?

import time

class KeepAliveHttpTransport(HttpTransport):
    def _send_request(self, *args, **kwargs):
        # Briefly pause before each send to space out outgoing requests.
        time.sleep(0.01)
        return super()._send_request(*args, **kwargs)

manuellazzari-cargoone commented 7 months ago

@sentrivana thanks for getting back -- when trying the custom socket settings I wasn't able to see any major differences. Now I'm testing the whole package (custom socket settings and custom send request).

manuellazzari-cargoone commented 7 months ago

@sentrivana the issue is still there even after adding the custom _send_request. Qualitatively, it seems to appear at more or less the same frequency (see below an extract from our logs before and after the changes were deployed).

2024-02-16T15:29:33.048565197Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:30:55.565954424Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:33:13.537149008Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:33:17.914818994Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:34:47.778557487Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:34:47.819326396Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:37:33.774480759Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:37:34.586534890Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:41:53.741483570Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:45:11.560933696Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:51:10.940568800Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:51:11.016053254Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:51:20.146324668Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:51:28.814144310Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:58:53.136957655Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:58:54.053616565Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T15:59:16.002942751Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:03:12.954825518Z Initializing Sentry SDK... <-- changes deployed
...
2024-02-16T16:13:44.713686531Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:22:30.285920011Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:25:48.997920085Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:25:49.389493362Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:26:00.889221302Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
2024-02-16T16:38:44.314760846Z ERROR - sentry_sdk.errors - Internal error in sentry_sdk
sentrivana commented 7 months ago

@manuellazzari-cargoone Thanks for the follow-up. It looks like the sleep might've had a tiny effect -- at least from the logs there seem to be fewer occurrences over a comparable time span before and after -- but this could obviously have to do with traffic etc., so I don't think it's conclusive.

Are you seeing any network errors for outgoing requests anywhere else in your system? Just trying to rule out general network instability. Alternatively, I'm wondering whether there's anything special about the errors/transactions -- maybe they're unusually big, so each one takes a long time to send and the server drops the connection? Fiddling around with the socket options might make a difference, too.

manuellazzari-cargoone commented 7 months ago

Are you seeing any network errors for outgoing requests anywhere else in your system? Just trying to rule out general network instability.

I'm observing this issue with just 2 services, both running with gunicorn. Also, it's all on GCP, so I would rule out consistent network instability.

Alternatively I'm thinking if there's anything special about the errors/transactions -- maybe them being unusually big so it takes long to send each one and the server drops the connection?

I'm not sure about size, but I know for sure we have services that deal with a lot more traffic and much bigger payloads. In particular, one of the affected services handles very little traffic and small payloads and is still experiencing the problem.

sentrivana commented 6 months ago

Thanks! I think in general we need to reconsider our transport logic and possibly add retries for more cases. (Currently we only retry if we're rate limited.) But at the same time, with this particular error there's no telling if the server actually processed the event and just crapped out at the very end, so it's not clear cut.
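
As a possible stopgap in the meantime, here is a rough sketch of a custom transport (hypothetical name, not SDK behaviour) that retries the send once when the connection is dropped mid-request. Whether this is acceptable depends on the caveat above: if the server did process the envelope before closing the connection, the retry can produce a duplicate event.

from urllib3.exceptions import ProtocolError

from sentry_sdk.transport import HttpTransport

class RetryOnceHttpTransport(HttpTransport):
    def _send_request(self, *args, **kwargs):
        try:
            return super()._send_request(*args, **kwargs)
        except ProtocolError:
            # The server closed the connection without a response;
            # retry once, accepting the risk of a duplicated event.
            return super()._send_request(*args, **kwargs)

Like the keep-alive subclass earlier in the thread, this would be passed to sentry_sdk.init via the transport option.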