Project: Backpressure Management

getsentry / team-sdks

A meta repository for tracking work across all SDK teams.

0 stars 0 forks source link

Project: Backpressure Management #10

Open smeubank opened 1 year ago

smeubank commented 1 year ago

Project Board

See our project board to track this initiative across all SDKs

Description

Problem Statement

In some high throughput situations, specifically, with generation and sending of transaction events, our Backend SDKs may have some negative overhead.

Situationally this can be something like the straw that broke the camels back, meaning the application is already in a bad state, but the SDK should ideally minimize it's impact in the situation.

As a driver, the concern is that we lose metrics data from sampling, so we would like to be able to collect as much data as possible from performance data for high fidelity metrics.

example change which caused this issue: https://github.com/getsentry/sentry/pull/45592/files

Solution Brainstorm

Internal Vanguard with explanation of imple: here

Develop spec:???

### Tasks
- [ ] https://github.com/getsentry/sentry-docs/issues/7324
- [ ] Develop Docs
- [ ] https://github.com/getsentry/sentry-python/issues/2095
- [ ] https://github.com/getsentry/sentry-php/issues/1563
- [ ] https://github.com/getsentry/sentry-java/issues/2829
- [ ] https://github.com/getsentry/sentry-ruby/issues/2063
- [ ] Golang
- [ ] https://github.com/getsentry/sentry-javascript/issues/8477
- [ ] https://github.com/getsentry/sentry-dotnet/issues/2946

sl0thentr0py commented 1 year ago

this is a rough outline / plan for python

[ ] simulate bad things / very heavy load and try to crash a server
- [ ] only errors
- [ ] errors + perf
- [ ] ignore profiler for now since it has its own complexity
[ ] what does crash mean?
- [ ] out of memory
- [x] CPU constantly running hot + queue backlog growing unboundedly - this will never clear if rate of growth faster
[ ] go through datastructures where we store most stuff
- [ ] span recorder
- [ ] breadcrumb buffer
- [x] background worker queue
- [ ] urllib3 poolmanager
[ ] hot code paths
- [ ] profile the SDK itself on a django request
- [ ] serializer
- [ ] frame extraction (try with too many vars)
- [ ] tracemalloc
- [x] measure average pool.request time
[ ] add limits or simple exponential backoff logic where necessary
- [ ] track stats / client reports when we start dropping stuff?
- [ ] system should recover gracefully from a bad state, make sure this happens somehow (unclear how exactly to implement)
[ ] see what SDK is doing even when everything is off, try to clean up ‘no-op’ scenario
- [ ] we should be as idle as possible
- [ ] turn off random start_spans that are always called

sl0thentr0py commented 1 year ago

related PR where user found UUID generation is slow https://github.com/getsentry/sentry-python/pull/2131

sl0thentr0py commented 1 year ago

measure background worker queue clearing times

In general, our outbound requests take around 30ms to clear. If the user app runs hot and the queue (capped at 100) is full, it will stay full and drop events till the background worker thread gets roughly 30*100 / 1000 = 3 seconds of CPU time allocated to clear this backlog.

We will record these drops with the queue_overflow Client Report for visibility.

Other than that, I think the background worker is safe because of the queue limit and should not cause stability issues. The queue will never grow in an unbounded manner.

methodology

all following experiments in this thread done on

Software:
    System Software Overview:
      System Version: macOS 13.2.1 (22D68)
      Kernel Version: Darwin 22.3.0
Hardware:
    Hardware Overview:
      Model Name: MacBook Pro
      Model Identifier: MacBookPro16,2
      Processor Name: Quad-Core Intel Core i7
      Processor Speed: 2,3 GHz
      Number of Processors: 1
      Total Number of Cores: 4
      L2 Cache (per Core): 512 KB
      L3 Cache: 8 MB
      Hyper-Threading Technology: Enabled
      Memory: 32 GB
      System Firmware Version: 1916.80.2.0.0 (iBridge: 20.16.3045.0.0,0)
      OS Loader Version: 564.40.4~66

run following with this patch

import sentry_sdk
from threading import Thread
from time import sleep

sentry_sdk.init()

def work():
    for i in range(500):
        try:
            1 / 0
        except Exception:
            sentry_sdk.capture_exception()

def size():
    return sentry_sdk.Hub.current.client.transport._worker._queue.qsize()

def measure():
    while size() > 0:
        sleep(0.1)
        print(f"queue size {size()}")

t1 = Thread(target=work)
t2 = Thread(target=measure)

t1.run()
t2.run()

times = sentry_sdk.Hub.current.client.transport._worker._times
print(f"min: {min(times)}, mean: {sum(times) / len(times)}, max: {max(times)}")

output

min: 0.027270078659057617, mean: 0.030650407617742365, max: 0.0953221321105957

sl0thentr0py commented 1 year ago

PR which caused s4s to blow up - https://github.com/getsentry/sentry/pull/44465
incident link - https://www.notion.so/sentry/INC-305-Large-increase-in-transaction-volume-broke-S4S-63b223db2db54136b67cf43e8b7b7a0d?pvs=4

mattjohnsonpint commented 1 year ago

Any thoughts to priority queue as a pattern here? For example, if we're handling backpressure for errors and transactions in the same queue (or circular buffer, etc.), then a flood of transactions could prevent errors from coming through. Priority queue, or separate queues, would solve that.

sl0thentr0py commented 1 year ago

great suggestion @mattjohnsonpint thanks! I think separate queues is iffy because it breaks the event agnostic nature of the transport, but we could certainly add weights / priorities when we submit events without changing much of the existing logic.

mattjohnsonpint commented 1 year ago

Some design thought needs to go into this, especially on the background worker. In the .NET SDK, we currently set a default maximum size of the background worker queue to 30 envelopes - and they're not prioritized. In a server environment, a flood of activity can fill up the buffer quickly with transaction envelopes. If an error occurs while the queue is full, the envelope is dropped. At minimum we should probably set a larger default for web SDKs, (ASP.NET Core for .NET, Django in python, etc.). But also we might think about separate limits for different envelope types (ie, 100 transactions, 50 errors, etc.). Either way, this also needs to be documented (In .NET it's MaxQueueItems, which appears absent from the user docs presently.)

Not sure about in Python, but in .NET the background worker operates independently of the transport. The transport itself doesn't manage the queue, but rather the client adds items to the worker's queue, which then uses a transport when dequeuing. Basically we insert a queue managed by a BackgroundWorker between the Client and the Transport seen in the unified API design.

So multiple concurrent queues would be possible as an implementation detail. But perhaps this can vary by SDK.

sl0thentr0py commented 1 year ago

@sentrivana @antonpirker a basic tracemalloc helper you can use to investigate.

import pprint
import contextlib
import tracemalloc
import sentry_sdk

@contextlib.contextmanager
def measure():
    filters = [tracemalloc.Filter(True, '**sentry_sdk**')]

    if not tracemalloc.is_tracing():
        tracemalloc.start()

    snapshot1 = tracemalloc.take_snapshot().filter_traces(filters)

    yield

    snapshot2 = tracemalloc.take_snapshot().filter_traces(filters)
    snapshot = snapshot2.compare_to(snapshot1, 'lineno')
    pprint.pprint(snapshot)

sentry_sdk.init()

with measure():
    try:
        1 / 0
    except Exception:
        sentry_sdk.capture_exception()

this gives the following which basically shows all the useless event allocation we do from the top level api to the transport check in client where we finally just return and drop the event.

[<StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/api.py' lineno=107>,)> size=840 (+840) count=2 (+2)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=1038>,)> size=840 (+840) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/hub.py' lineno=388>,)> size=792 (+792) count=4 (+4)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=215>,)> size=672 (+672) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/hub.py' lineno=338>,)> size=568 (+568) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=1033>,)> size=552 (+552) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=531>,)> size=552 (+552) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=913>,)> size=544 (+544) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=732>,)> size=520 (+520) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=479>,)> size=504 (+504) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=632>,)> size=496 (+496) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=731>,)> size=480 (+480) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=624>,)> size=480 (+480) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=463>,)> size=480 (+480) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=503>,)> size=464 (+464) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=722>,)> size=456 (+456) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=637>,)> size=456 (+456) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=502>,)> size=448 (+448) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=498>,)> size=448 (+448) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/hub.py' lineno=335>,)> size=448 (+448) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=720>,)> size=432 (+432) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=721>,)> size=424 (+424) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=694>,)> size=424 (+424) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=738>,)> size=416 (+416) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=1035>,)> size=232 (+232) count=2 (+2)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=688>,)> size=168 (+168) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=687>,)> size=168 (+168) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=623>,)> size=168 (+168) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=218>,)> size=168 (+168) count=1 (+1)>,
 <StatisticDiff traceback=<Traceback (<Frame filename='/Users/neel/sentry/sdks/sentry-python/sentry_sdk/utils.py' lineno=1037>,)> size=64 (+64) count=1 (+1)>]

cleptric commented 1 year ago

Some design thought needs to go into this, especially on the background worker. In the .NET SDK, we currently set a default maximum size of the background worker queue to 30 envelopes - and they're not prioritized.

Thirty envelopes seem way too low IMO. Go has a historic limit of 30 for error events, with an increase to 1000 if tracing is enabled https://github.com/getsentry/sentry-go/blob/master/client.go#L324-L326

mattjohnsonpint commented 1 year ago

@cleptric - love the todo comment just above that. 😅

sentrivana commented 1 year ago

Here's some data for the Python SDK.

Created a Django app with two views, one of which throws an exception. Hooked it up to the SDK @fe7e501.

# views.py
from django.http import HttpResponse

def ok(request):
    return HttpResponse("awesome~~")

def error(request):
    1 / 0
    return HttpResponse("not so awesome~~")

# urls.py
from django.urls import path

from myapp import views

urlpatterns = [
    path("ok/", views.ok),
    path("error/", views.error),
]

Made a script to send simple GET requests to an endpoint (needs requests):

import requests

SAMPLES_PER_ENDPOINT = 10**4
ENDPOINT = "ok"

for i in range(SAMPLES_PER_ENDPOINT):
    if (i + 1) % (SAMPLES_PER_ENDPOINT // 10) == 0:
        print(f"/{ENDPOINT}: {i + 1} / {SAMPLES_PER_ENDPOINT}")
    requests.get(f"http://127.0.0.1:8000/{ENDPOINT}")

Opted to profile three separate scenarios since there might be further insights to be gained from comparing these:

idling (tracing disabled, requests only to the /ok endpoint)
tracing (tracing enabled, requests only to the /ok endpoint)
tracing & creating errors (tracing enabled, requests only to the /error endpoint)

Additionally profiled the app both with and without a DSN to be able to look at pure event creation vs. serializing. (Added an early return just before the SDK would actually send a request outside.)

Memory allocation

I put this in __init__.py of the Django app:

import linecache
import signal
import sys
import tracemalloc

def display_top(snapshot, limit=25):
    # just a helper for including the actual line of code instead of just
    # filename + line number and making the output more human friendly
    print("Top %s lines" % limit)
    for index, stat in enumerate(snapshot[:limit], 1):
        frame = stat.traceback[0]
        print(
            "#%s: %s:%s: %.1f KiB"
            % (
                index,
                frame.filename,
                frame.lineno,
                stat.size / 1024,
            )
        )
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print("    %s" % line)

tracemalloc.start()
tracemalloc_filters = [tracemalloc.Filter(True, "**sentry_sdk**")]
start_snapshot = tracemalloc.take_snapshot().filter_traces(tracemalloc_filters)

def on_shutdown(*args):
    end_snapshot = tracemalloc.take_snapshot().filter_traces(tracemalloc_filters)
    comparison = end_snapshot.compare_to(start_snapshot, "lineno")
    display_top(comparison)
    sys.exit()

signal.signal(signal.SIGINT, on_shutdown)

This should give us an idea about memory allocation from startup to shutdown. Note that you need to run the app with ./manage.py runserver --noreload for the shutdown signal to work.

Here's the top 25 places with most memory allocated:

Without DSN
- idling (tracing disabled, requests only to the /ok endpoint)
- tracing (tracing enabled, requests only to the /ok endpoint)
- tracing & creating errors (tracing enabled, requests only to the /error endpoint)
With DSN
- idling (tracing disabled, requests only to the /ok endpoint)
- tracing (tracing enabled, requests only to the /ok endpoint)
- tracing & creating errors (tracing enabled, requests only to the /error endpoint)

Time

Installed py-spy and ran the Django app with:

sudo py-spy record -o profile.svg --subprocesses -- ./manage.py runserver

(sudo needed on OSX.)

When the server is stopped, this creates a flamegraph of the profiled process(es).

See the resulting flamegraphs for a comparison between the SDK idling vs. actively creating events. (If you open the raw image, there's a near invisible search button on the top right where you can search for "sentry" and it'll highlight SDK things in purple.)

Without DSN
- idling (tracing disabled, requests only to the /ok endpoint)
- tracing (tracing enabled, requests only to the /ok endpoint)
- tracing & creating errors (tracing enabled, requests only to the /error endpoint)
With DSN
- idling (tracing disabled, requests only to the /ok endpoint)
- tracing (tracing enabled, requests only to the /ok endpoint)
- tracing & creating errors (tracing enabled, requests only to the /error endpoint)

Sidenote

Managed to also get profile to work but this is obviously much slower. One way to go about it is to put something like this in manage.py (don't judge) and then run the server normally with ./manage.py runserver:

def main_with_profile():
    import threading
    import profile
    from collections import defaultdict

    lock = threading.Lock()
    profs = defaultdict(profile.Profile)

    def do_profile(frame, event, arg):
        tid = threading.current_thread().native_id
        with lock:
            prof = profs[tid]
        if event == "return" and frame is not prof.cur[-2]:
            return None
        return prof.dispatcher(frame, event, arg)

    sys.setprofile(do_profile)
    threading.setprofile(do_profile)

    main()

    sys.setprofile(None)
    threading.setprofile(None)

    for tid, prof in profs.items():
        prof.dump_stats(f"t-{tid}.dump")

if __name__ == "__main__":
    main_with_profile()

sl0thentr0py commented 1 year ago

updated plan

will make some kind of global Monitor with just the rate limiting feeding into the ‘bad state’ for now
the Monitor is the central source for tracking if good state / bad state
we can extend this with more logic later
this bad state gets picked up in hub.start_transaction and we halve the user’s sample rate in steps (in some time intervals? TBD) till the bad state goes away
do this with a separate 'downsampling factor' so it also applies to traces_sampler etc
we track that the sample rate was dynamically tweaked somewhere (TBD)
use this ^ flag to make a new client report / outcome called TBD
recovery will be automatic once bad state goes away (monitor’s responsiblilty)

sentrivana commented 1 year ago

Some more Django profiling data: Made a Django app with the views below and profiled the do view with django-cprofile-middleware. There's a handful of places where the view accesses an SQLite DB to either fetch or insert something, and where it makes an outgoing HTTP request, so we can use this to see where the SDK spends time doing transaction-related stuff (which will eventually be thrown away anyway because there's no DSN).

Setup

# views.py

def another(request):
    return HttpResponse([f"{d.a} {d.b} {d.c}" for d in Dummy.objects.all()])

def do(request):  # the endpoint being profiled
    dummy = Dummy(a="abc", b=True, c=47)
    dummy.save()

    for j in range(5):
        requests.get("https://example.com")
        Dummy.objects.create(
            a="".join(random.choices(string.ascii_letters, k=15)),
            b=random.choice([False, True]),
            c=random.randint(1, 5) or None,
        )

    for i in range(5):
        dummies = Dummy.objects.filter(c__gt=i)

    return another(request)

# models.py

class Dummy(models.Model):
    a = models.CharField(max_length=30)
    b = models.BooleanField(max_length=30)
    c = models.IntegerField(null=True)

# settings.py

sentry_sdk.init(
    # no dsn
    traces_sample_rate=1.0,
)

Collecting data

The cprofile middleware works by appending the prof query param to the request to the endpoint one wants to profile. If you want the data as file, you also append download.

Made a small script:

#!/bin/bash
for i in {1..1000}
do
   curl 'http://127.0.0.1:8000/do/?prof&download' --output prof/do_$i.prof
done

and ended up letting the profiler collect about ~600 runs, which I then fed to

python -m pstats prof/*

Sorted by tottime, filtered for sentry, this yields:

         68083083 function calls (68057016 primitive calls) in 1274.763 seconds

   Ordered by: internal time
   List reduced from 1118 to 47 due to restriction <'sentry'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2938    0.321    0.000    4.009    0.001 /Users/ivana/dev/sentry-python/sentry_sdk/integrations/stdlib.py:69(putrequest)
     7638    0.219    0.000    1.044    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:210(start_child)
     7637    0.197    0.000    0.448    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:424(finish)
     7638    0.159    0.000    0.732    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:117(__init__)
     7637    0.150    0.000    0.226    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/hub.py:390(add_breadcrumb)
     7638    0.144    0.000    1.229    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/hub.py:431(start_span)
     2938    0.140    0.000    0.272    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/utils.py:1356(sanitize_url)
     2938    0.119    0.000    0.375    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing_utils.py:354(serialize)
     2938    0.105    0.000    0.446    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/utils.py:1403(parse_url)
     8224    0.087    0.000    0.728    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing_utils.py:101(record_sql_queries)
     5876    0.083    0.000  288.788    0.049 /Users/ivana/dev/sentry-python/sentry_sdk/integrations/stdlib.py:119(getresponse)
     4112    0.073    0.000    4.692    0.001 /Users/ivana/dev/sentry-python/sentry_sdk/integrations/django/__init__.py:586(execute)
     7637    0.067    0.000    0.220    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing_utils.py:144(maybe_create_breadcrumbs_from_span)
     7638    0.065    0.000    0.102    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/hub.py:277(get_integration)
     2937    0.055    0.000    0.086    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:384(set_http_status)
    14102    0.050    0.000    0.060    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/hub.py:150(current)
     8814    0.050    0.000    0.500    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:311(iter_headers)
    15275    0.044    0.000    0.060    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/utils.py:1532(nanosecond_time)
    54054    0.043    0.000    0.043    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/hub.py:301(client)
     8814    0.040    0.000    0.560    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/hub.py:702(iter_trace_propagation_headers)
     7638    0.039    0.000    0.045    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:74(add)
     5876    0.034    0.000    0.056    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/debug.py:12(filter)
     2938    0.031    0.000    0.125    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing_utils.py:369(should_propagate_trace)
     2938    0.029    0.000    0.093    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/utils.py:1462(match_regex_list)
      588    0.027    0.000    0.031    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing_utils.py:293(populate_from_transaction)
    22326    0.026    0.000    0.026    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:376(set_data)
     2938    0.025    0.000    0.056    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:674(get_baggage)
     4700    0.021    0.000    0.098    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:190(__exit__)
     7638    0.021    0.000    0.025    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:103(__new__)
    12338    0.018    0.000    0.018    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/scope.py:367(span)
    28790    0.018    0.000    0.018    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:201(containing_transaction)
     4700    0.017    0.000    0.028    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:180(__enter__)
     2938    0.016    0.000    0.020    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing_utils.py:83(has_tracing_enabled)
     2937    0.016    0.000    0.016    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:372(set_tag)
     2938    0.015    0.000    0.015    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:351(to_traceparent)
    19390    0.015    0.000    0.015    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/utils.py:187(capture_internal_exceptions)
     4112    0.015    0.000    0.023    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing_utils.py:193(_format_sql)
     9400    0.013    0.000    0.017    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/scope.py:373(span)
     4700    0.010    0.000    0.013    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/integrations/django/__init__.py:631(_set_db_system_on_span)
     2937    0.009    0.000    0.009    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing.py:380(set_status)
    19390    0.007    0.000    0.007    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/utils.py:176(__exit__)
     2938    0.007    0.000    0.008    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/_compat.py:48(<lambda>)
     8814    0.007    0.000    0.007    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/hub.py:307(scope)
     4112    0.006    0.000    0.006    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/utils.py:923(to_string)
    19390    0.005    0.000    0.005    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/utils.py:172(__enter__)
      588    0.004    0.000    0.181    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/integrations/django/__init__.py:610(connect)
      588    0.001    0.000    0.001    0.000 /Users/ivana/dev/sentry-python/sentry_sdk/tracing_utils.py:219(__init__)

sentrivana commented 1 year ago

And again request-response, this time looking at memory. Modified the view slightly:

# views.py

def do(request):
    with measure():
        for _ in range(1000):
            dummy = Dummy(a="abc", b=True, c=47)
            dummy.save()

            for j in range(5):
                requests.get("http://127.0.0.1:8000/ok")
                Dummy.objects.create(
                    a="".join(random.choices(string.ascii_letters, k=15)),
                    b=random.choice([False, True]),
                    c=random.randint(1, 5) or None,
                )

            for i in range(5):
                dummies = Dummy.objects.filter(c__gt=i)

        res = [f"{d.a} {d.b} {d.c}" for d in Dummy.objects.all()]
        return HttpResponse(res)

The measure context manager comes from Neel's suggestion here, with the output slightly tweaked for readability.

Result:

Top 25 lines
getsentry/team-webplatform-meta#1: sentry_sdk/integrations/threading.py:63: 499.5 KiB
    @wraps(old_run_func)
getsentry/team-webplatform-meta#2: sentry_sdk/hub.py:472: 168.5 KiB
    return span.start_child(**kwargs)
getsentry/team-webplatform-meta#3: sentry_sdk/tracing.py:115: 162.3 KiB
    return object.__new__(cls)
getsentry/team-webplatform-meta#4: sentry_sdk/tracing.py:378: 138.1 KiB
    self._data[key] = value
getsentry/team-webplatform-meta#5: sentry_sdk/integrations/django/signals_handlers.py:66: 116.7 KiB
    return receiver(*args, **kwargs)
getsentry/team-webplatform-meta#6: sentry_sdk/tracing.py:143: 65.7 KiB
    self._data = {}  # type: Dict[str, Any]
getsentry/team-webplatform-meta#7: sentry_sdk/tracing.py:134: 62.8 KiB
    self.span_id = span_id or uuid.uuid4().hex[16:]
getsentry/team-webplatform-meta#8: sentry_sdk/tracing.py:142: 61.8 KiB
    self._tags = {}  # type: Dict[str, str]
getsentry/team-webplatform-meta#9: sentry_sdk/integrations/stdlib.py:94: 46.8 KiB
    description="%s %s"
getsentry/team-webplatform-meta#10: sentry_sdk/tracing.py:439: 38.6 KiB
    self.timestamp = self.start_timestamp + timedelta(
getsentry/team-webplatform-meta#11: sentry_sdk/tracing.py:145: 38.6 KiB
    self.start_timestamp = start_timestamp or datetime.utcnow()
getsentry/team-webplatform-meta#12: sentry_sdk/tracing.py:387: 31.4 KiB
    "http.status_code", str(http_status)
getsentry/team-webplatform-meta#13: sentry_sdk/utils.py:1534: 30.9 KiB
    return time.perf_counter_ns()
getsentry/team-webplatform-meta#14: sentry_sdk/hub.py:406: 11.8 KiB
    crumb.update(kwargs)
getsentry/team-webplatform-meta#15: sentry_sdk/tracing.py:79: 8.6 KiB
    self.spans.append(span)
getsentry/team-webplatform-meta#16: sentry_sdk/integrations/stdlib.py:104: 6.3 KiB
    rv = real_putrequest(self, method, url, *args, **kwargs)
getsentry/team-webplatform-meta#17: sentry_sdk/hub.py:405: 6.2 KiB
    crumb = dict(crumb or ())  # type: Breadcrumb
getsentry/team-webplatform-meta#18: sentry_sdk/hub.py:413: 3.9 KiB
    crumb["timestamp"] = datetime.utcnow()
getsentry/team-webplatform-meta#19: sentry_sdk/integrations/stdlib.py:126: 3.3 KiB
    rv = real_getresponse(self, *args, **kwargs)
getsentry/team-webplatform-meta#20: sentry_sdk/tracing.py:526: 2.8 KiB
    Span.__init__(self, **kwargs)
getsentry/team-webplatform-meta#21: sentry_sdk/hub.py:474: 2.8 KiB
    return Span(**kwargs)
getsentry/sentry-python#1888: sentry_sdk/tracing_utils.py:127: 2.4 KiB
    data = {}
getsentry/sentry-javascript#7065: sentry_sdk/integrations/django/signals_handlers.py:57: 1.6 KiB
    @wraps(receiver)
getsentry/sentry-php#1457: sentry_sdk/hub.py:423: 1.0 KiB
    scope._breadcrumbs.append(new_crumb)
getsentry/sentry-ruby#1996: sentry_sdk/_functools.py:103: 0.2 KiB
    getattr(wrapper, attr).update(getattr(wrapped, attr, {}))

shanamatthews commented 1 year ago

@sl0thentr0py - qq, is this something that we should document, like on our performance overhead page?

sl0thentr0py commented 1 year ago

I will add a line or two once we make this default for everyone.