getsentry / sentry

Developer-first error tracking and performance monitoring
https://sentry.io

Close Alerts in Opsgenie when Metric alert resolves instead of Acknowledging it #62261

Open martin-dibella-peya opened 9 months ago

martin-dibella-peya commented 9 months ago

Environment

SaaS (https://sentry.io/)

Steps to Reproduce

The following metric alert in Sentry was not correctly propagated to Opsgenie: https://pedidos-ya.sentry.io/alerts/rules/details/158369/?alert=4034&environment=production&notification_uuid=c57c83e1-1cf1-4fc7-804f-5e98bf65b45f&referrer=metric_alert_slack

Expected Result

When the alert was resolved in Sentry, the resolution should have been propagated to Opsgenie and the corresponding Opsgenie alert closed.

Actual Result

The alert remained open in Opsgenie.

Product Area

Alerts

Link

No response

DSN

pedidos-ya

Version

No response

getsantry[bot] commented 9 months ago

Assigning to @getsentry/support for routing ⏲️

getsantry[bot] commented 9 months ago

Routing to @getsentry/product-owners-alerts for triage ⏲️

snigdhas commented 9 months ago

Hi @martin-dibella-peya, thanks for reporting the issue. I'm not seeing any errors on our end that would explain the sync failure. Was this a one-time thing you observed or a repeated behavior? If it's happened multiple times, any additional links/timestamps would be helpful!

martin-dibella-peya commented 9 months ago

It has happened before. We reported it in the Slack channel, with the same result: no issue was observed on Sentry's side. Here are a few other cases:

  1. https://pedidos-ya.sentry.io/alerts/rules/details/158369/?alert=3997&referrer=metric_alert_slack&notification_uuid=0f3dd9e9-8757-408a-adc3-5efbe97b49f9
  2. https://pedidos-ya.sentry.io/alerts/rules/details/133892/?alert=3662&referrer=metric_alert_slack&notification_uuid=cd07d2d0-c83b-4fdd-bb61-bb7401acaed5
  3. https://pedidos-ya.sentry.io/alerts/rules/details/158369/?alert=3620&referrer=metric_alert_slack&notification_uuid=16a17168-9add-4f5a-a431-3673d07dea33

If the problem isn't on Sentry's side, I guess you should check with Opsgenie what happened to those alerts.

CC @jonurq-peya

malwilley commented 9 months ago

@martin-dibella-peya so it looks like there has been some work on improving the behavior for metric alerts when notifications are going to multiple locations https://github.com/getsentry/sentry/pull/50013, https://github.com/getsentry/sentry/issues/49252

There may be more work to do here. But if you would like a workaround, I believe the solution would be to either:

  1. Add an OpsGenie action for both critical and warning statuses, or
  2. Remove the warning threshold entirely

I believe the missing Opsgenie action on the warning trigger is causing the automatic resolve to be skipped.
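To make the suggested workaround concrete, here is a toy model of the behaviour described above. It is purely hypothetical and is not Sentry's implementation: it assumes that when an incident resolves, the resolve notification is sent only to the actions of the trigger that was last active. Under that assumption, an incident that goes critical, then warning, then resolved never notifies Opsgenie unless the warning trigger also has an Opsgenie action. The `Trigger` class and `resolve_targets` function are illustrative names only.

```python
# Hypothetical toy model, NOT Sentry's code. Assumption: on resolve, only the
# actions attached to the last active trigger receive the notification.
from dataclasses import dataclass, field


@dataclass
class Trigger:
    label: str  # "critical" or "warning"
    actions: set = field(default_factory=set)  # e.g. {"opsgenie", "slack"}


def resolve_targets(triggers: list, last_active: str) -> set:
    """Return the targets that would get the resolve notification
    under the assumption stated above."""
    for trigger in triggers:
        if trigger.label == last_active:
            return set(trigger.actions)
    return set()


# Incident went critical -> warning -> resolved; Opsgenie only on critical.
current = [Trigger("critical", {"opsgenie", "slack"}), Trigger("warning", {"slack"})]

# Workaround 1: add an Opsgenie action to the warning trigger too.
fixed = [Trigger("critical", {"opsgenie", "slack"}), Trigger("warning", {"opsgenie", "slack"})]

# Workaround 2: no warning threshold, so the last active trigger is critical.
critical_only = [Trigger("critical", {"opsgenie", "slack"})]

print("opsgenie" in resolve_targets(current, "warning"))
print("opsgenie" in resolve_targets(fixed, "warning"))
print("opsgenie" in resolve_targets(critical_only, "critical"))
```

Both workarounds do the same thing in this model: they ensure that whichever trigger is active last before the resolve still carries an Opsgenie action.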

cathteng commented 6 months ago

We currently call the Opsgenie acknowledge API. Are your alerts being acknowledged? Are you asking us to call the close alert API instead?
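For reference, the two options differ only in the endpoint of the Opsgenie Alert API: `POST /v2/alerts/{identifier}/acknowledge` versus `POST /v2/alerts/{identifier}/close`, each authenticated with a `GenieKey` header. The sketch below builds (but does not send) either request with the Python standard library. It assumes the alert is looked up by alias; the alias value, API key, and note shown are placeholders, and this is not how Sentry's integration constructs its requests.

```python
import json
import urllib.parse
import urllib.request

OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_alert_action(alias: str, action: str, api_key: str, note: str) -> urllib.request.Request:
    """Build an Opsgenie Alert API request that acknowledges or closes
    the alert identified by `alias`. The request is not sent."""
    if action not in ("acknowledge", "close"):
        raise ValueError(f"unsupported action: {action}")
    url = (
        f"{OPSGENIE_ALERTS_URL}/{urllib.parse.quote(alias, safe='')}/{action}"
        "?identifierType=alias"
    )
    body = json.dumps({"source": "Sentry", "note": note}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"GenieKey {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder alias and key; a close request removes the alert from the open list,
# whereas acknowledge only marks it as being worked on.
req = build_alert_action("example-alias", "close", "YOUR_API_KEY", "Resolved in Sentry")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```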