getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Sessions randomly dropping - cannot debug why? #71816

Open splundge opened 5 months ago

splundge commented 5 months ago

Environment

SaaS (https://sentry.io/)

What are you trying to accomplish?

Sentry seems to be randomly dropping sessions for users - but we cannot debug/find the cause? (update: screen shot below)

I can't just 'turn on debug mode' because there's no way for me to access the affected user's browser and observe what's happening.

Is there some way I can debug this? Is there some kind of event handler that I can hook into if a session fails to upload? That way I could at least send some debug info back to our servers and inspect it from there.

How are you getting stuck?

no way to debug the issue

Where in the product are you?

Issues

Link

No response

DSN

No response

Version

No response

getsantry[bot] commented 5 months ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 5 months ago

Routing to @getsentry/product-owners-performance for triage ⏲️

AbhiPrasad commented 5 months ago

For example, over the last 24hours, we've had 4k transactions. 20 of those have been dropped.

Where are you seeing this? In the Sentry stats UI?

@splundge - I recommend reaching out to support @ sentry.io (remove spaces). They can help take a look at your account/projects and debug more in detail.

splundge commented 5 months ago
[Screenshot: sessionsdropped]

I'm seeing this in the sentry web console. We noticed this because a user experienced a bug a few days ago - we went to investigate - and couldn't find his session in sentry. But we checked our other analytics tracking services, and the user was DEFINITELY using the app at a certain time..... but his session was just missing.

Update: I updated the image to show when this began: the 12th of March, for some reason. I've reviewed our code and deployments, and no changes were made to our Sentry setup. Did something change on Sentry's side?

AbhiPrasad commented 4 months ago

Did something change on Sentry's side?

Nothing stands out from looking at our commits (https://github.com/getsentry/sentry/commits/master/). I think messaging support is the right call here; they can help a lot more.

splundge commented 4 months ago

I did reach out to support and they told me to "turn on debug mode" but as you can imagine, turning on debug mode does not help at all if you can't reproduce the issue.

I figured reaching out via the community forum might get a wider response. I'll reach out to support again

billyvg commented 4 months ago

@splundge I took a look in an internal dashboard that has some additional stats (there's a ticket to expose these to the stats dashboard you are accessing) and I'm seeing ~10-15% of replay events being dropped client side. Can you call the Replay integration with the following configuration?

  replayRef = replayIntegration({
    _experiments: {
      // Report errors thrown inside the Replay integration as Sentry events
      captureExceptions: true,
    },
  });

This will capture errors thrown in the replay integration (though it will use up your quota, but I'll be sure to get your account credited). Please let me know if you see some exceptions from this and I can follow-up. Thanks!
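For readers who want to toggle this flag per environment, here is a minimal sketch. The `ReplayDebugOptions` interface and the `buildReplayOptions` helper are hypothetical names introduced for illustration; only `_experiments.captureExceptions` comes from the comment above. The commented usage assumes an SDK version that exposes `Sentry.replayIntegration`.

```typescript
// Local stand-in for the one Replay option discussed in this thread.
// Assumption: in a real app the option types come from the Sentry SDK.
interface ReplayDebugOptions {
  _experiments?: { captureExceptions?: boolean };
}

// Hypothetical helper: turn on internal exception capture only when asked,
// because the captured exceptions count against the project's quota.
function buildReplayOptions(captureInternalErrors: boolean): ReplayDebugOptions {
  if (!captureInternalErrors) {
    return {};
  }
  return { _experiments: { captureExceptions: true } };
}

// Usage (assumes `import * as Sentry from "@sentry/browser"` and a real DSN):
// Sentry.init({
//   dsn: "<your DSN>",
//   integrations: [Sentry.replayIntegration(buildReplayOptions(true))],
// });
```

Keeping the flag behind a single boolean makes it easy to enable the extra reporting for a short investigation and switch it off again afterwards.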

splundge commented 4 months ago

Hey, I've hooked up the debugging code and deployed it to our development environment. I can confirm that in the 12 hours since this post, at least 1 session has been dropped, and we can't explain why. Can you investigate and see anything on your side? The captured exceptions should be showing up... somewhere!

billyvg commented 4 months ago

@splundge I see it showing up in the issues stream, but it's not very helpful. I did notice that you are using a very old SDK version (7.53.1). Can you upgrade to the latest v7 release (7.117.0)? There have been many bug fixes since.

splundge commented 4 months ago

Thanks @billyvg. I updated the SDK to 8.9.2 in our dev environment 3 days ago. Can you see any further issues popping up on your side?

getsantry[bot] commented 4 months ago

Routing to @getsentry/product-owners-replays for triage ⏲️

bruno-garcia commented 4 months ago

Could you share another screenshot of the stats page, or a link to Sentry.io, so we can take a look on our end?

billyvg commented 4 months ago

@splundge Unfortunately we're unable to filter stats by environment. I am seeing some "Unable to send replay" errors, but there are replays associated with them. We can take a look at the stats again when you're ready to deploy the new SDK to prod -- ping me when you do.

splundge commented 4 months ago

@billyvg I deployed to prod roughly 21 hours ago, BUT I disabled the captureExceptions flag (to try to conserve quota). It looks like there are still around 16 dropped sessions in the last 24 hours. I might need to turn the flag back on and redeploy prod. Also, I am having trouble finding ANY replay that is associated with a missing session. Can you please link me to one of them?

@bruno-garcia I'm not sure what you mean. Here's another screenshot of our dropped transactions:

[Screenshot: dropped transactions 2]

billyvg commented 4 months ago

@splundge would you mind e-mailing me directly (billy at sentry.io) so that I'm not exposing any data. I'm also in the Sentry discord (billy.work) if that works better for you.

billyvg commented 1 month ago

@splundge Are you still experiencing issues? Another thing to rule out is that we do not send replays if they are shorter than 5 seconds, as they are unlikely to be useful (and we don't want these to use up quota). If you want to test this, you could call the replay integration with minReplayDuration set to 0 (Sentry.replayIntegration({minReplayDuration: 0})) and see if you are still getting dropped sessions.

The errors you were receiving (Unable to send replay), when you had captureExceptions turned on, indicate that there was a network issue when sending the replay, however these should be retried. A different error message (Unable to send replay - max retries exceeded) would mean that it completely failed to send.

It's also possible that these users have adblock. Are there any Sentry events (e.g. errors) associated with these missing sessions? If you have the IP address of these dropped sessions, and an approximate timestamp of when they visited, I could check our load balancer logs to see if their events made it to Sentry at all.
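When triaging the captured events, the two error messages mentioned above can be told apart mechanically. The following is a rough sketch, assuming the messages appear verbatim in the event title; the `classifyReplayError` function and the outcome labels are hypothetical and not part of the Sentry SDK.

```typescript
// Hypothetical labels for the two situations described in the comment above.
type ReplaySendOutcome = "retrying" | "gave-up" | "unrelated";

// Message strings as quoted in this thread.
const SEND_FAILED = "Unable to send replay";
const RETRIES_EXHAUSTED = "Unable to send replay - max retries exceeded";

function classifyReplayError(message: string): ReplaySendOutcome {
  // Check the longer message first because it starts with the shorter one.
  if (message.includes(RETRIES_EXHAUSTED)) {
    return "gave-up"; // the replay completely failed to send
  }
  if (message.includes(SEND_FAILED)) {
    return "retrying"; // network issue; the send should be retried
  }
  return "unrelated";
}
```

Only the "gave-up" bucket would correspond to a replay that was actually lost on the client; events in the "retrying" bucket may still have a replay attached.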

splundge commented 1 month ago

Hey - sorry, we've been pretty busy and this issue has sunk to the bottom of our list. I need to reinvestigate from our side to see if it's still an issue. I'll do some digging and get back in here next week. Hopefully it's not an issue (since upgrading). Thanks for bringing this back to my attention haha

billyvg commented 1 month ago

@splundge No worries. We'll also be publishing 8.31.0 sometime next week, which will include https://github.com/getsentry/sentry-javascript/pull/13721. That PR adds an onError callback.
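An `onError` callback would cover the original request for a hook that can send debug info to your own servers. Below is a minimal sketch of such a handler. Everything in it is an assumption made for illustration: the `ReplayErrorReport` shape, the helper names, the `/debug/replay-errors` endpoint, and the guess that the callback receives the thrown value as its only argument. Check the linked PR for the actual signature.

```typescript
// Hypothetical shape of the debug record sent to our own backend.
interface ReplayErrorReport {
  message: string;
  name: string;
  timestamp: number;
}

// Turns any thrown value into a plain, serializable report.
function toReplayErrorReport(err: unknown, now: number = Date.now()): ReplayErrorReport {
  if (err instanceof Error) {
    return { message: err.message, name: err.name, timestamp: now };
  }
  return { message: String(err), name: "NonError", timestamp: now };
}

// Builds the callback. `send` is injected so production code can use
// fetch or sendBeacon while tests pass in a stub.
function makeReplayErrorHandler(
  send: (report: ReplayErrorReport) => void,
): (err: unknown) => void {
  return (err: unknown): void => {
    try {
      send(toReplayErrorReport(err));
    } catch {
      // Never let debug reporting throw back into the caller.
    }
  };
}

// Usage (assumes SDK >= 8.31.0 and a hypothetical endpoint on your server):
// Sentry.replayIntegration({
//   onError: makeReplayErrorHandler((report) => {
//     navigator.sendBeacon("/debug/replay-errors", JSON.stringify(report));
//   }),
// });
```

Injecting `send` keeps the handler independent of the browser, so the same code can be exercised in a unit test without a network.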