getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

‘Number of Errors’ Alert in Critical state for over a year #76580

Closed kpujjigit closed 1 month ago

kpujjigit commented 2 months ago

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

It appears this Number of Errors Alert

Image

Image

Expected Result

The state of the alert should change, and we do in fact see a change in state when we go to edit the rule:

Image

Actual Result

The alert has been in a critical state since it was created a year ago; insight was lost for a client, and spikes occurred without alerting the proper channels.

Product Area

Alerts

Link

No response

DSN

No response

Version

No response

getsantry[bot] commented 2 months ago

Auto-routing to @getsentry/product-owners-alerts for triage ⏲️

schew2381 commented 2 months ago

@kpujjigit Hey there, could you share the org and alert links somewhere? I don't see them on the issue rn

leedongwei commented 2 months ago

@kpujjigit We are still missing the link to org/alert for us to debug this.

schew2381 commented 2 months ago

I was able to find the alert with a query. From looking at the logs, the alert itself was set up in a way that it could never resolve.

The resolve threshold is set to resolve when the # of errors is 100% lower than 1 week ago, but this means the number of errors at the current moment must essentially reach 0 in order to resolve.

For example, we would only get very close to 100% (but still not reach it) if we had 9991 errors 1 week ago and have 1 error at the current moment.

Image
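To make the arithmetic concrete, here is a rough sketch of the comparison described above. The function names and the `>=` comparison are assumptions for illustration only, not Sentry's actual alert-evaluation code; the point is just that a 100% resolve threshold leaves no room for even a single error.

```python
def percent_lower(current: int, comparison: int) -> float:
    """Percent by which `current` is lower than `comparison`
    (here, the error count from 1 week ago)."""
    if comparison == 0:
        raise ValueError("comparison count must be non-zero")
    return (comparison - current) / comparison * 100


def would_resolve(current: int, comparison: int, resolve_threshold_pct: float) -> bool:
    # Hypothetical check (assumption): resolve once the drop reaches the threshold.
    return percent_lower(current, comparison) >= resolve_threshold_pct


# 9991 errors a week ago, 1 error now: very close to 100%, but not there.
print(round(percent_lower(1, 9991), 2))   # 99.99
print(would_resolve(1, 9991, 100))        # False
# Only a count of exactly 0 reaches a 100% drop.
print(would_resolve(0, 9991, 100))        # True
```

Under these assumptions, any resolve threshold below 100% (for example 90%) would let the alert resolve once the error count drops far enough without having to hit zero.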

The alert was resolved by editing it, but it still has the same resolve threshold, so once it triggers again it will never resolve either.