getsentry / sentry-python

The official Python SDK for Sentry.io
https://sentry.io/for/python/
MIT License

SDK causes significant performance issue. #2116

Closed. PaulRudin closed this issue 11 months ago.

PaulRudin commented 1 year ago

How do you use Sentry?

Sentry Saas (sentry.io)

Version

1.22.2

Steps to Reproduce

Adding the Sentry SDK to my Starlette web app, with the trace sample rate and profile sample rate both set to 0.1, has had a pretty big effect on performance. The app is essentially CPU bound: once it can't create responses as fast as they're coming in, things go pear shaped. Without the Sentry SDK this happens at around 2000 requests per second; with it, at around 1000 requests per second.

It seems like even with the sampling rates at zero, there's a cost. Just initializing the SDK means that my 50th-percentile response time goes from ~2ms to ~40ms, and the 95th from ~4ms to ~80ms.
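As an aside, percentiles like the p50 and p95 quoted here can be computed directly from raw latency samples with the Python standard library. A minimal sketch (the sample data below is made up for illustration):

```python
import statistics

# Hypothetical response-time samples in milliseconds (illustrative only)
latencies_ms = [2.1, 2.3, 1.9, 2.0, 4.2, 2.2, 3.8, 2.4, 2.0, 2.1,
                2.3, 1.8, 2.5, 2.2, 4.0, 2.1, 2.0, 2.6, 2.2, 3.9]

# statistics.quantiles with n=100 returns the 1st..99th percentile cut points
cuts = statistics.quantiles(latencies_ms, n=100)
p50 = cuts[49]   # 50th percentile (the median)
p95 = cuts[94]   # 95th percentile

print(f"p50={p50:.2f}ms p95={p95:.2f}ms")
```

Load-testing tools such as locust report these same statistics, so this is only useful for spot-checking raw numbers.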

Here's the code used to initialize the SDK:

if settings.enable_sentry:
    import sentry_sdk

    sentry_sdk.init(
        dsn=settings.sentry_dsn,
        traces_sample_rate=settings.sentry_traces_sample_rate,
        profiles_sample_rate=settings.sentry_profiles_sample_rate,
        environment=settings.sentry_environment,
    )
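If the overhead comes from the SDK's automatic instrumentation rather than from sampling itself, one experiment worth trying (a sketch, not a confirmed fix; the DSN below is a placeholder) is trimming what the SDK enables at init time, using the documented `default_integrations` and `auto_enabling_integrations` options:

```python
import sentry_sdk

# Sketch: disable automatic integration discovery to help isolate
# where the per-request overhead is coming from.
sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    traces_sample_rate=0.0,
    profiles_sample_rate=0.0,
    auto_enabling_integrations=False,  # skip auto-enabled integrations
    default_integrations=False,        # skip even the default ones
)
```

If latency recovers with these set to False, the cost is in one of the integrations rather than in event transport.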

Here's requirements.txt:

#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile requirements.in
#
anyio==3.6.2
    # via
    #   starlette
    #   watchfiles
argcomplete==3.0.5
    # via vineyard
async-timeout==4.0.2
    # via redis
certifi==2023.5.7
    # via sentry-sdk
click==8.1.3
    # via
    #   uvicorn
    #   vineyard
etcd-distro==3.5.1
    # via vineyard
h11==0.14.0
    # via uvicorn
hiredis==2.2.2
    # via redis
httptools==0.5.0
    # via uvicorn
idna==3.4
    # via anyio
makefun==1.15.1
    # via vineyard
numpy==1.24.2
    # via
    #   -r requirements.in
    #   pandas
    #   pyarrow
    #   vineyard
orjson==3.8.10
    # via -r requirements.in
pandas==2.0.0
    # via vineyard
psutil==5.9.5
    # via vineyard
pyarrow==11.0.0
    # via vineyard
pydantic[dotenv]==1.10.7
    # via -r requirements.in
python-dateutil==2.8.2
    # via pandas
python-dotenv==1.0.0
    # via
    #   pydantic
    #   uvicorn
pytz==2023.3
    # via pandas
pyyaml==6.0
    # via uvicorn
redis[hiredis]==4.5.4
    # via -r requirements.in
sentry-sdk[starlette]==1.22.2
    # via -r requirements.in
six==1.16.0
    # via
    #   python-dateutil
    #   treelib
sniffio==1.3.0
    # via anyio
sortedcontainers==2.4.0
    # via vineyard
starlette==0.26.1
    # via
    #   -r requirements.in
    #   sentry-sdk
treelib==1.6.4
    # via vineyard
typing-extensions==4.5.0
    # via pydantic
tzdata==2023.3
    # via pandas
urllib3==1.26.15
    # via sentry-sdk
uvicorn[standard]==0.21.1
    # via -r requirements.in
uvloop==0.17.0
    # via uvicorn
vineyard==0.14.0
    # via -r requirements.in
vineyard-bdist==0.14.0
    # via vineyard
watchfiles==0.19.0
    # via uvicorn
websockets==11.0.1
    # via uvicorn

Expected Result

The docs suggest that the performance cost should be significantly lower. I suspect (but don't know) that the SDK adds a more or less constant per-request overhead, and because this app does very little work per request, that overhead is large in percentage terms.

Actual Result

As described above. The performance results are obtained by load testing with locust.
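The locustfile itself isn't shown in the report; for this kind of with/without-SDK comparison it might look roughly like the following sketch (the endpoint path and class name are guesses, and this is load-test configuration run via the `locust` CLI, not standalone code):

```python
from locust import HttpUser, task


class AppUser(HttpUser):
    """Hypothetical load-test user hammering a single endpoint."""

    @task
    def index(self):
        # Hypothetical endpoint; replace with the app's real route
        self.client.get("/")
```

Running the same scenario twice, once with `settings.enable_sentry` on and once off, would isolate the SDK's contribution to the latency percentiles.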

Zylphrex commented 1 year ago

Hi @PaulRudin, thanks for opening the issue.

To clarify, when you say

It seems like even with the sampling rates at zero, there's a cost. Just initializing the SDK means that my 50th-percentile response time goes from ~2ms to ~40ms, and the 95th from ~4ms to ~80ms.

Are you referring to traces_sample_rate, or profiles_sample_rate or both?

And can you elaborate a little more on what you mean by

things go pear shaped

PaulRudin commented 1 year ago

I mean that with both sampling rates set to zero, simply initializing the SDK changes performance compared to running without initializing it at all.

By "things go pear shaped" I mean that once cpu utilization gets to 100% then response times very quickly deteriorate. This is presumably expected. It's a single threaded application. If it can't process requests as quickly as they're coming in then something has to give.

dvprz commented 1 year ago

We saw the same thing happen to our application when upgrading to 1.20.0. It has to do with this change: https://github.com/getsentry/sentry-python/blob/master/CHANGELOG.md#various-fixes--improvements-3

Note: This will add spans for all requests to the caches configured in Django. This will probably add some overhead to your server and also add multiple spans to your performance waterfall diagrams. If you do not want this, you can disable this feature in the DjangoIntegration:

We saw one of our APIs increase 300% in p50 latency. Reverting to the version prior resolved it.

sl0thentr0py commented 1 year ago

hey @dvprz, are you sure the problem is the caching patch, or are you assuming that based on the changelog entry? Also, do you mean 1.20.0 or 1.22.0?

dvprz commented 1 year ago

@sl0thentr0py Hey, yes, we are assuming it's related to the change I highlighted, and yes, sorry, I meant 1.22.0.

We noticed a drastic uptick in latency for a few key endpoints that use the cache (Redis) starting on May 8th. Looking through what we had deployed, we noticed an update to sentry-sdk from 1.21.1 to 1.22.1. In the changelog, the "add some overhead to your server" note stood out, so we reverted the upgrade yesterday and the endpoint normalized. Here is a screenshot of the endpoint's latency (not reproduced here).

This is our sentry init if it helps:

sentry.init(
    before_send=event_filter,
    dsn=settings.SENTRY_DSN if get_environment() in ['production', 'staging'] else '',
    environment=get_environment(),
    integrations=[
        CeleryIntegration(),
        DjangoIntegration(),
        RedisIntegration(),
        SqlalchemyIntegration(),
    ],
    release=get_release(),
    send_default_pii=True,
)

antonpirker commented 1 year ago

Hey @dvprz! You can disable the cache spans we introduced in 1.22.0 with the cache_spans option; that should remove the overhead. See https://docs.sentry.io/platforms/python/guides/django/#options
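For anyone else hitting this, wiring up that option looks roughly like the following sketch (based on the linked docs; the DSN is a placeholder and the import path is the standard sentry-sdk one):

```python
import sentry_sdk
from sentry_sdk.integrations.django import DjangoIntegration

# Sketch: explicitly turn off the cache spans added in 1.22.0
sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    integrations=[
        DjangoIntegration(cache_spans=False),
    ],
)
```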

antonpirker commented 1 year ago

Hey @PaulRudin! Could you maybe create a demo project where we could reproduce this?

sl0thentr0py commented 1 year ago

I'm moving @dvprz's comment to a separate issue, since the original post is about Starlette.

sl0thentr0py commented 1 year ago

@PaulRudin

It's a single threaded application

you're using uvicorn + ASGI, right? I'm not sure what you mean by single threaded here. Can you paste how you start your app and with what config?
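(For context: for a CPU-bound ASGI app, the usual mitigation is running more than one worker process. Assuming a hypothetical module layout of `main.py` exposing `app`, the uvicorn invocation would look something like the following; this is deployment configuration, not a confirmed description of the reporter's setup.)

```shell
# Hypothetical module path; --workers spawns multiple worker processes
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```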

getsantry[bot] commented 11 months ago

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you remove the label Waiting for: Community, I will leave it alone ... forever!

