getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Endpoint regression false positive alert #66906

Open FabioDavidF opened 7 months ago

FabioDavidF commented 7 months ago

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

We received what we think is a false-positive endpoint regression alert. We are not sure how to reproduce it, although we have a hypothesis. I'm opening this issue as instructed by Sentry's support via email.

The issue is that we received an endpoint regression alert. After a thorough analysis of our logs, we concluded that no regression happened, and even Sentry's own graph of p95 transaction time shows no regression across the weeks:

[Screenshot: Sentry's p95 transaction time graph]

Our hypothesis: Sentry may calculate p95 daily, or over an even shorter time window. Because our application has almost no load on weekends, transaction times are always significantly lower then, and we can clearly see this in Sentry's data visualizations. Could Sentry be seeing the increase in transaction time on Monday, relative to the weekend, and firing an alert because of it?
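To make the hypothesis concrete, here is a minimal Python sketch. It is not Sentry's detection algorithm: the `p95` and `looks_regressed` helpers, the 1.2x `ratio` threshold, and the per-day duration samples are all hypothetical, chosen only to illustrate the idea. It shows how comparing a Monday window against a quiet weekend window can flag a "regression" even when the week-over-week p95 is unchanged.

```python
def p95(durations_ms):
    """Nearest-rank 95th percentile of a non-empty list of durations (ms)."""
    ordered = sorted(durations_ms)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) via integer math
    return ordered[rank - 1]


def looks_regressed(baseline_ms, current_ms, ratio=1.2):
    """Hypothetical rule: flag if current p95 exceeds baseline p95 by more than `ratio`x."""
    return p95(current_ms) > ratio * p95(baseline_ms)


# Hypothetical traffic shape: busy weekdays with a slow tail, quiet weekends.
WEEKDAY = [100] * 18 + [300, 400]
WEEKEND = [80] * 10
week = {day: list(WEEKDAY) for day in ("Mon", "Tue", "Wed", "Thu", "Fri")}
week.update({day: list(WEEKEND) for day in ("Sat", "Sun")})


def pooled(days):
    """Flatten all per-day samples into one list."""
    return [d for samples in days.values() for d in samples]


# Day-over-day: Monday compared with the preceding Sunday.
daily_flag = looks_regressed(week["Sun"], week["Mon"])
# Week-over-week: the same traffic shape in both weeks.
weekly_flag = looks_regressed(pooled(week), pooled(week))
print(daily_flag, weekly_flag)  # True False
```

With a daily window, Monday's p95 (300 ms) is compared with Sunday's (80 ms) and gets flagged; pooled over the full week, the p95 is 300 ms both times and nothing fires.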

I'm not comfortable sharing identifiable information publicly. If anyone needs links, the issue ID, or any other information, feel free to contact me.

Expected Result

No alert should be fired.

Actual Result

[Screenshot: the endpoint regression alert that was fired]

Product Area

Performance

Link

No response

DSN

No response

Version

No response

getsantry[bot] commented 7 months ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 7 months ago

Routing to @getsentry/product-owners-performance for triage ⏲️

Zylphrex commented 7 months ago

This looks like a good example of seasonality affecting regression detection, most likely related to the lower traffic on weekends, as you mentioned. Did this issue ever auto-resolve after some time? We're looking at how to reduce the number of false positives here, but it may take some time to find a good solution.

FabioDavidF commented 7 months ago

Thanks for the response.

I'm not sure what you mean by auto-resolve, but we haven't received the alert again. We got it on a couple of Mondays in a row, and the alert/event link is still accessible, but no more alerts were fired.

Zylphrex commented 7 months ago

These regression issues automatically move to a resolved state once it's detected that the regression has returned to normal. So it's possible that the issue cycled between regressed and resolved each week for a while, and that the more recent spikes were less severe and weren't detected as regressions. This is just a guess, though; without looking at your exact issue, I can't confirm it.
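The regressed/resolved cycle described above can be illustrated with a small state-machine sketch. This is an assumption-laden illustration, not Sentry's implementation: the `issue_transitions` helper, the 1.2x `ratio` threshold, the baseline, and the weekly p95 readings are all hypothetical.

```python
def issue_transitions(window_p95s, baseline_p95, ratio=1.2):
    """Hypothetical: open on a regression, resolve when p95 returns to normal."""
    state = "resolved"
    transitions = []
    for value in window_p95s:
        regressed = value > ratio * baseline_p95
        if regressed and state == "resolved":
            state = "regressed"
            transitions.append(state)
        elif not regressed and state == "regressed":
            state = "resolved"
            transitions.append(state)
    return transitions


# Hypothetical readings: two strong Monday spikes, then a milder one (110 ms).
readings = [150, 100, 150, 100, 110, 100]
print(issue_transitions(readings, baseline_p95=100))
# ['regressed', 'resolved', 'regressed', 'resolved']
```

The third spike stays under the 120 ms threshold, so it never reopens the issue, which matches the guess that later, milder spikes went undetected.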