getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Alert on "users experiencing errors" not triggering, despite being over warning and error thresholds #75568

Open · arifken opened this issue 1 month ago

arifken commented 1 month ago

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

I haven't been able to reproduce this since it happened, but the broken alert is still in my account if that would help.

The alert settings are:
When: Users experiencing errors is above 10 in 5 minutes
Then: Send a Slack notification to

The defined query is:

count_unique(user)
(event.type:[error, default]) AND (level:fatal release.stage:adopted) over 5 minutes
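For reference, here is a minimal Python sketch of what the alert condition above asks for, assuming a simplified in-memory list of events. The `Event` record, its field names, and the helper functions are hypothetical illustrations, not Sentry's data model or alerting code. The sketch keeps events that match the filter, counts distinct users in the trailing 5-minute window, and fires only when that count is strictly above 10.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event record (not Sentry's data model).
    timestamp: datetime
    user: str
    event_type: str     # e.g. "error" or "default"
    level: str          # e.g. "fatal"
    release_stage: str  # e.g. "adopted" or "low_adoption"


def matches_query(e: Event) -> bool:
    # Mirrors: (event.type:[error, default]) AND (level:fatal release.stage:adopted)
    return (
        e.event_type in ("error", "default")
        and e.level == "fatal"
        and e.release_stage == "adopted"
    )


def users_experiencing_errors(events, now, window=timedelta(minutes=5)) -> int:
    # count_unique(user) over the trailing window (now - window, now].
    start = now - window
    return len({e.user for e in events if start < e.timestamp <= now and matches_query(e)})


def should_fire(events, now, threshold=10) -> bool:
    # "is above 10" is treated as strictly greater than the threshold.
    return users_experiencing_errors(events, now) > threshold


now = datetime(2024, 8, 1, 12, 0)
events = [
    Event(now - timedelta(minutes=1), f"user{i}", "error", "fatal", "adopted")
    for i in range(11)
]
print(should_fire(events, now))  # True: 11 distinct users > 10

# Same events, but tagged with a stage other than "adopted" when evaluated.
stale = [replace(e, release_stage="low_adoption") for e in events]
print(should_fire(stale, now))  # False: the filter matches nothing
```

If the matching events do not carry `release.stage:adopted` at evaluation time, the filter excludes all of them and the count stays at 0, which would look exactly like an alert that never fires.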

Expected Result

I should get a Slack message whenever the warning or error threshold is exceeded.

Actual Result

I'm not getting any messages, and the alert remains in a "resolved" state even though the time chart shows the counts above the threshold.

Product Area

Alerts

Link

No response

DSN

No response

Version

No response

getsantry[bot] commented 1 month ago

Auto-routing to @getsentry/product-owners-alerts for triage ⏲️

ceorourke commented 1 month ago

I think the issue here is the time period of 5 minutes paired with the release.stage tag. release.stage is a rolling tag that's evaluated every hour from the last six hours of data. It might be that having release.stage:adopted doesn’t work with a 5 minute interval because at the time of evaluation, the release that’s experiencing errors isn’t adopted yet, so I'd recommend changing that to 1 hour.
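The timing problem described in this comment can be illustrated with a small Python simulation. Purely for illustration, it assumes that the stage is recomputed at the top of each hour from the preceding six hours of sessions, and that a release counts as adopted once it holds at least 10% of those sessions. The 10% cutoff, the function names, and the session format are hypothetical and are not taken from Sentry's implementation.

```python
from datetime import datetime, timedelta


def last_recomputation(now: datetime) -> datetime:
    # Assumption: the stage is recomputed at the top of every hour.
    return now.replace(minute=0, second=0, microsecond=0)


def release_stage(release, sessions, now, lookback=timedelta(hours=6), min_share=0.10):
    """Return the stage of `release` as last computed before `now`.

    sessions: list of (timestamp, release) tuples.
    min_share: hypothetical adoption cutoff, chosen only for illustration.
    """
    computed_at = last_recomputation(now)
    window = [r for ts, r in sessions if computed_at - lookback <= ts < computed_at]
    if not window:
        return "low_adoption"
    share = window.count(release) / len(window)
    return "adopted" if share >= min_share else "low_adoption"


# Release 1.0 gets one session per minute from 04:00 to 09:59,
# then release 2.0 takes all traffic from 10:00 to 11:29.
start = datetime(2024, 8, 1, 4, 0)
sessions = [(start + timedelta(minutes=i), "1.0") for i in range(360)]
sessions += [(start + timedelta(minutes=360 + i), "2.0") for i in range(90)]

print(release_stage("2.0", sessions, datetime(2024, 8, 1, 10, 40)))  # low_adoption
print(release_stage("2.0", sessions, datetime(2024, 8, 1, 11, 5)))   # adopted (60 of 360 sessions)
```

Under these assumptions, a release that takes all traffic at 10:00 is still labeled `low_adoption` at 10:40 and only becomes `adopted` after the 11:00 recomputation, so a 5-minute alert window evaluated before then matches none of its events.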

We can do a better job here. As for the alert showing as resolved: that's how all alerts that haven't fired yet are displayed.

arifken commented 1 month ago

Ah, OK. But as soon as that release is adopted, it should alert, right? Or is release.stage:adopted always incompatible with an interval shorter than 1 hour, regardless of how much traffic has shifted to that version?

getsantry[bot] commented 1 month ago

Routing to @getsentry/product-owners-releases for triage ⏲️

getsantry[bot] commented 1 month ago

Routing to @getsentry/product-owners-issues for triage ⏲️

getsantry[bot] commented 1 month ago

Routing to @getsentry/product-owners-releases for triage ⏲️

getsantry[bot] commented 1 month ago

Routing to @getsentry/product-owners-alerts for triage ⏲️

rachrwang commented 1 month ago

@arifken - I checked with the team and release.stage:adopted is currently not supported as a query term for metric alerts. We are making a series of changes that would support this in the future. Thank you for filing this ticket!

getsantry[bot] commented 1 month ago

Routing to @getsentry/product-owners-releases for triage ⏲️

getsantry[bot] commented 1 month ago

Routing to @getsentry/product-owners-alerts for triage ⏲️