getsentry / sentry-laravel

The official Laravel SDK for Sentry (sentry.io)
https://sentry.io
MIT License
1.26k stars 188 forks source link

Send events after response was send to user. #806

Closed xwillq closed 6 months ago

xwillq commented 11 months ago

Problem Statement

Currently we have performance degradation after enabling lazyLoadingViolationReporter on our staging environment.

Here is trace before enabling it: CleanShot 2023-11-22 at 19 13 56@2x

And after enabling it: CleanShot 2023-11-22 at 19 23 26@2x

All those "app.function" spans are calls to lazyLoadingViolationReporter wrapped in custom instrumentation.

Environment

sentry/sentry-laravel - 4.0.0 sentry/sentry - 4.0.1 Self hosted Sentry instance - 23.11.0

Sentry is deployed on separate server in the same data center, request are sent directly into nginx service, bypassing reverse proxy/load balancer. Communication delay between 2 servers is <1ms.

Solution Brainstorm

As far as I understand, sentry-laravel passes all events to Client using HubInterface which is currently bound in service container, and Client sends all events through HttpTransport. We could make something like DefferedHub or DeferredHttpTransport, which would store all exceptions somewhere and actually send them after response was sent to user. With this solution, sentry would use more memory for storing events and all data associated with them, but for some people this will be an acceptable tradeoff.

cleptric commented 11 months ago

We already use this approach for transactions, utilizing Laravel's terminable middleware. https://docs.sentry.io/platforms/php/guides/laravel/performance/#improve-response-time

For errors, we always want them to be sent immediately, as an OOM down the line could cause no events to be sent to Sentry, which is even worse.

We could look into bundling up lazyloading violations much more efficiently, though.

In a perfect world, the data would be attached to the transaction, and all processing would happen server-side, utilizing our issues platform (the thing that creates performance issues on Sentry).

xwillq commented 11 months ago

In that case what about configuring what would be sent immediately, and what would be deferred? Yeah, sending errors is important, but reporting lazy loading violations isn't worth tanking performance, so they could be sent after response is sent (if application is still alive by that time), if user chooses so through config. The same goes for some logs.

Also maybe some deduplication could be done on SDK side before sending? In my case it was a N+1 query that was triggering lazy loading. We could detect that we already sent an event with exact same stack trace and drop second one. There are some mentions of deduplication in code, but I guess it is done on Sentry side.

cleptric commented 11 months ago

Most SDKs ship with a dedupe feature, but all these SDKs use a background queue for sending events. I currently have yet to decide on a good immediate fix. The feature is opt-in, which adds some leeway, but I agree that this is generally not a great experience.

xwillq commented 11 months ago

In my case this doesn't require immediate fix. I already know that our legacy code has a lot of N+1 and lazy loading, so I'll probably just disable it. But getting some solution for this problem eventually would be nice.

stayallive commented 11 months ago

We could probably add some very naïve logic that dedupes these events in the same request at least for the same model / attribute / relation etc.

However my question is would this in your case result in just a single report or would that still result in many reports?

I think adding this logic to de-duplicate repeat reports inside a single request would already be an improvement but we could also think of a simple way to "queue" this kind of events since they are not likely to be super critical so if we miss one because of an OOM that is an acceptable risk.

cleptric commented 11 months ago

I kicked off an internal discussion to add this to the issue platform. An added benefit is that no error quota is consumed anymore, though performance must be enabled for this to work then.

stayallive commented 11 months ago

If it consumes no quota anymore we can also consider enabling this by default which would be cool especially if it's surfaced at the span level 👀

We can always keep the error version around for when wanted (and to prevent BC) but adding it to the query span would be a very nice addition!

xwillq commented 11 months ago

I think adding this logic to de-duplicate repeat reports inside a single request would already be an improvement

Yeah, I think this simple deduplication would be much better than sending every event. I don't want to know how many times this issue occurred in the same request, since it often depends on user input like amount of items per page, or some data from external source like database. I simply want to know how often this issue occurs and how many users it has affected.

However my question is would this in your case result in just a single report or would that still result in many reports?

I don't quite understand this question.

we could also think of a simple way to "queue" this kind of events since they are not likely to be super critical

Maybe this could be done with config value that would decide which events should be queued and which should be sent immediately? Events already have severity level, so we could do something like 'send_immediately' => 'error'.

stayallive commented 10 months ago

In #825 at the very least we are now going to be de-duplicating events which might help a bit.

@cleptric in #825 I could also add another param called sendAfterResponse or something that queues up the events to send after the response possibly which could solve this specific problem allowing the user to switch it on themselfs, WDYT?