pcraciunoiu opened this issue 1 month ago
Hey @pcraciunoiu, thanks for reporting.
I'm thinking this might be due to the SDK in general having a lot of room for improvement in the async department. Might be the transport blocking. (There are plans for a fully async transport/SDK, but this will take some time.)
Couple things to check:
* If you don't configure the DSN (i.e., make the SDK not actually send anything out), do you still observe this? If not, my guess is that it's the transport.
It doesn't make a difference if the `dsn` value is passed in or not. Only the `StrawberryIntegration` makes a difference, and I have to set `auto_enabling_integrations=False` to ensure it doesn't get added automatically.
* Do you by any chance have profiling on and can see where we spend the extra time?
I did try with `pyinstrument`, but nothing stood out. I had to trial-and-error for a few hours until I figured out it was Sentry, and then which integration. I'll attach the profile in case it helps. If you have recommendations for other profiling tools that might do better, I'd like to learn about them.
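(For anyone else trying to reproduce: one way to capture a comparable profile is pyinstrument's Python API wrapped around a single request. This is a rough sketch only; the `/graphql` path and the query are placeholders, and it assumes it runs inside a configured Django environment such as `manage.py shell`.)

```python
import json

from django.test import Client
from pyinstrument import Profiler

client = Client()
# Placeholder endpoint and query -- substitute the real ones from the app
payload = json.dumps({"query": "{ articles(pagination: {limit: 2000}) { id title } }"})

profiler = Profiler()
profiler.start()

client.post("/graphql", payload, content_type="application/json")

profiler.stop()
# Prints a call tree showing where wall-clock time was spent
print(profiler.output_text(unicode=True))
```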
@pcraciunoiu Interesting, I also did not see anything in the profile you attached, although I am not really sure how pyinstrument works (sampling or deterministic).
I think @sentrivana was suggesting that you could try using Sentry's built-in profiler. You can enable the profiler by setting `profiles_sample_rate` to a value above zero when initializing the Sentry SDK. You also need to set `traces_sample_rate` to a value above zero. The `traces_sample_rate` controls the fraction of Sentry transactions which are sent to Sentry, and the `profiles_sample_rate` controls the fraction of transactions which are profiled. Here is an example of how to set these.
Then, you can view the generated profiles directly in Sentry linked to the corresponding transaction.
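(For reference, a minimal sketch of those two settings; the DSN and sample rates below are illustrative placeholders, not recommendations.)

```python
import sentry_sdk

sentry_sdk.init(
    dsn="https://publickey@o0.ingest.sentry.io/0",  # placeholder DSN
    traces_sample_rate=1.0,    # fraction of transactions sent to Sentry
    profiles_sample_rate=1.0,  # fraction of those transactions that get profiled
)
```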
@szokeasaurusrex for the profiling I sent, I don't have Sentry set up yet since it's a new project.
I do have another project that was running the Strawberry integration until late last week, when I found this issue and disabled it. I confirmed on that project too that it was adding 150ms or more to GraphQL requests when I tested locally. I have some profiling from last week, before I disabled the integration, but it's a private repository and I'm not comfortable including links or details here. If you want, I can share a Sentry link privately via email or otherwise, and/or you can let me know what you need to look into this further.
@pcraciunoiu Happy to look at the Sentry link -- you can send it to ivana.kellyer (at) sentry.io.
I will try to repro this today. Also want to compare the performance with a sync Django/Strawberry app. (My money is still on this being a general problem with the SDK blocking.) Will report back.
So I'm seeing an increase too, and only with the async extension. I assume you're setting `StrawberryIntegration(async_execution=True)` explicitly? If so, can you try with `async_execution=False`? This makes the extension much faster for me even when using strawberry-django's `AsyncGraphQLView`.
Also, could you post your `sentry_sdk.init`? Are you using performance/tracing?
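(A sketch of the suggested change, assuming the integration is being passed explicitly to `sentry_sdk.init`; everything else would stay as-is.)

```python
import sentry_sdk
from sentry_sdk.integrations.strawberry import StrawberryIntegration

sentry_sdk.init(
    # ... other options unchanged ...
    integrations=[
        # Hook Strawberry's sync execution path instead of the async one
        StrawberryIntegration(async_execution=False),
    ],
)
```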
> @pcraciunoiu Happy to look at the Sentry link -- you can send it to ivana.kellyer (at) sentry.io.

Just sent you an email.
> So I'm seeing an increase too, and only with the async extension. I assume you're setting `StrawberryIntegration(async_execution=True)` explicitly? If so, can you try with `async_execution=False`? This makes the extension much faster for me even when using strawberry-django's `AsyncGraphQLView`.

I can confirm it's much faster with `async_execution=False`, but it's still 200ms slower. My test case is 2000 items with a related nested item through a data loader; let's pretend it's this:
```graphql
query Articles {
  articles(pagination: { limit: 2000 }) {
    id
    title
    status
    author {  # data loader here
      id
      firstName
    }
  }
}
```
This is what I'm seeing (roughly, from running each variant 10x). I assumed that async execution is required when running in an async environment, but I guess that's not true? Still, there is a noticeable slowdown.
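(For context, a hypothetical sketch of the kind of schema being described -- the names and the `fetch_authors_by_ids` helper are invented, not the reporter's code: an `author` field resolved through a Strawberry `DataLoader` so the nested lookups are batched.)

```python
import strawberry
from strawberry.dataloader import DataLoader


@strawberry.type
class Author:
    id: int
    first_name: str


async def load_authors(keys: list[int]) -> list[Author]:
    # Batch-fetch all requested authors in one query
    # (fetch_authors_by_ids is a hypothetical stand-in for the real database call)
    by_id = {a.id: a for a in await fetch_authors_by_ids(keys)}
    return [by_id[k] for k in keys]


author_loader = DataLoader(load_fn=load_authors)


@strawberry.type
class Article:
    id: int
    title: str
    status: str
    author_id: strawberry.Private[int]

    @strawberry.field
    async def author(self) -> Author:
        # Each of the ~2000 articles resolves its author through the shared
        # loader, so lookups are batched instead of issued one by one
        return await author_loader.load(self.author_id)
```

(In a real app the loader would typically be created per request, e.g. on the GraphQL context, rather than at module scope.)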
> Also, could you post your `sentry_sdk.init`? Are you using performance/tracing?
Here it is. I've tried both with/without sample rates for traces/profiles. Doesn't seem to matter, but that's probably because the DSN string is empty.
```python
import logging

import sentry_sdk
from sentry_sdk.integrations.celery import CeleryIntegration
from sentry_sdk.integrations.django import DjangoIntegration
from sentry_sdk.integrations.logging import LoggingIntegration
# from sentry_sdk.integrations.strawberry import StrawberryIntegration

sentry_sdk.init(
    dsn=SENTRY_DSN,
    integrations=[
        DjangoIntegration(),
        CeleryIntegration(),
        LoggingIntegration(level=logging.INFO),
        # StrawberryIntegration(async_execution=True),
    ],
    auto_enabling_integrations=False,
    release=GIT_SHA,
    environment=ENV,
    send_default_pii=True,
    # traces_sample_rate=0.5,
    # relative to traces sample rate
    # profiles_sample_rate=0.5,
)
```
Does setting the `SENTRY_DSN` to your actual DSN make a difference?
> Does setting the `SENTRY_DSN` to your actual DSN make a difference?

It does not seem to make a difference from my quick tests.
The profile in Sentry looks thoroughly uneventful, but the pyinstrument profile you attached is interesting. Seems like there's a lot of time spent waiting on threading locks/conditions. Don't know if that's what's causing the slowness, but we can verify that.
Would you be able to get us a pyinstrument profile like the one above for the query:

1. without the `StrawberryIntegration`,
2. with `StrawberryIntegration(async_execution=True)`,
3. with `StrawberryIntegration(async_execution=False)`?

For option 1, please use `disabled_integrations=[StrawberryIntegration()]` (see here) so that you still get all the other integrations that would be enabled in options 2 and 3. We could then see what stands out.
Feel free to attach here or send them to me via email.
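(For the first variant, the init change would look roughly like this; a sketch, not the reporter's exact settings.)

```python
import sentry_sdk
from sentry_sdk.integrations.strawberry import StrawberryIntegration

sentry_sdk.init(
    # ... rest of the existing options ...
    # Keep all other (auto-)enabled integrations, but opt out of this one
    disabled_integrations=[StrawberryIntegration()],
)
```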
@sentrivana I finally got around to this and emailed you those files. It is definitely slowest with the strawberry integration async.
Thanks @pcraciunoiu, got the files and will take a look in the next couple of days.
### How do you use Sentry?

Sentry Saas (sentry.io)

### Version

2.17.0

### Steps to Reproduce

### Expected Result

Not much performance difference with or without `StrawberryIntegration`.

### Actual Result

With `StrawberryIntegration`, this took 3-4x longer. In my case, about 800ms. Without it, I got the results back in 200-250ms.