aws / amazon-managed-grafana-roadmap

Amazon Managed Grafana Roadmap

Notification deduplication for Unified Alerting #47

Open justinbwood opened 1 year ago

justinbwood commented 1 year ago

Per the AWS Managed Grafana docs on migrating classic alerts to Grafana alerting, multiple notifications are sent when using Grafana-managed alerts.

I would like to see Grafana's high availability alerting enabled so that notifications are properly deduplicated, as it's a bit frustrating to receive Slack notifications in triplicate when using Unified Alerting.

Thanks!

atze234 commented 1 year ago

I would also like to see this. It's really annoying to get three messages per alert... I filed an issue over at Grafana, but it seems like there's something wrong with the Amazon Managed Grafana config.

https://github.com/grafana/grafana/issues/68652

bradlet commented 1 year ago

I've also been running into this issue. Opened a support ticket w/ AWS and the result was basically reflecting the doc that was linked in this comment. It seems like really bad UX to spam out alerts like this... I'd be interested to hear what workarounds others have used; I'm in the process of migrating over to managing the alerts using an external alert manager, Prometheus AlertManager, instead. Would be nice to be able to provision the alert rules in Grafana though!

atze234 commented 1 year ago

As a workaround I'm using a DynamoDB table and a message hash in my Lambda that parses the SNS messages. Like here:

https://gist.github.com/atze234/60dbef2991e08aba93b875c73578cf41

Also, I set this in the SNS delivery_policy so that there is enough time to write to the DB.

    "defaultThrottlePolicy": {
      "maxReceivesPerSecond": 1
    },
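
For readers who cannot open the gist, the hash-and-check idea can be sketched roughly as follows. This is a minimal illustration, not the gist's actual code: the names `message_fingerprint`, `InMemoryStore`, `DynamoStore`, and `should_forward`, the `fingerprint` partition key, and the `expires_at` attribute are all hypothetical. It also assumes the duplicate notifications are byte-identical, so that hashing the whole message body gives the same key for each copy. `DynamoStore` expects a boto3 DynamoDB `Table` resource and relies on a conditional `put_item`, so only the first writer of a given hash succeeds.

```python
import hashlib
import time


def message_fingerprint(message: str) -> str:
    """Return a stable SHA-256 hex digest of the notification body."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


class InMemoryStore:
    """Local stand-in for the DynamoDB table, useful for trying the logic out."""

    def __init__(self):
        self._seen = set()

    def claim(self, fingerprint: str) -> bool:
        """Return True only for the first caller to claim this fingerprint."""
        if fingerprint in self._seen:
            return False
        self._seen.add(fingerprint)
        return True


class DynamoStore:
    """Claims fingerprints with a conditional write.

    `table` is assumed to be a boto3 DynamoDB Table resource whose partition
    key is named "fingerprint" (a hypothetical schema, not from the thread).
    """

    def __init__(self, table, ttl_seconds: int = 3600):
        self._table = table
        self._ttl_seconds = ttl_seconds

    def claim(self, fingerprint: str) -> bool:
        try:
            self._table.put_item(
                Item={
                    "fingerprint": fingerprint,
                    # Hypothetical attribute, usable with DynamoDB TTL.
                    "expires_at": int(time.time()) + self._ttl_seconds,
                },
                # The write fails if an item with this key already exists.
                ConditionExpression="attribute_not_exists(fingerprint)",
            )
            return True
        except Exception as err:
            # botocore's ClientError carries the error code in err.response.
            code = getattr(err, "response", {}).get("Error", {}).get("Code")
            if code == "ConditionalCheckFailedException":
                return False  # another copy of this alert got there first
            raise


def should_forward(message: str, store) -> bool:
    """Return True if this message has not been seen and should be sent on."""
    return store.claim(message_fingerprint(message))


# Three identical copies arrive; only the first one is forwarded.
store = InMemoryStore()
alert = '{"title": "[FIRING:1] HighCPU", "state": "alerting"}'
results = [should_forward(alert, store) for _ in range(3)]
print(results)  # [True, False, False]
```

In a Lambda handler, one would presumably call `should_forward` on each SNS record's message and post to Slack only when it returns `True`. The conditional write makes the check and the insert a single atomic step, which matters because the three copies can arrive close together.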

RphCos commented 11 months ago

This really is needed, since "Classic" alerting is supposedly going away soon. It makes using Slack or PagerDuty impossible when monitoring large workloads, especially since classic alerts do not allow for template variables.

brc commented 11 months ago

+1

chr2che commented 11 months ago

Is there any ETA for this, please?

andrzej-mega commented 10 months ago

Spoke to AWS team about this today. They gave an "estimate" of Q1 2024 with possibility it might be as late as Q3 2024. According to them it's not a high priority issue for them and there are other issues they need to work on before that happens.

My biggest issue with it is that with Grafana managed service - alerting is advertised as a service feature.

I guess paying customers don't get a working feature until AWS deems it worth fixing...

kevdonde commented 7 months ago

We are also experiencing this issue. This is a primary feature of the service, and it is extremely disappointing that Amazon doesn't prioritize primary features of its products. We waited 1.5 years for Amazon to make 9.4 available in AMG so that we could use the alerting that is part of 9.4. Alerting is the only feature of 9.4 that we needed. It was, and is, the biggest reason to upgrade to 9.4. Now we might further delay upgrading until as late as Q3 2024, making it more than 2.5 years.

The purpose of the above rant is to add my vote to the priority of this issue.

webertrlz commented 4 months ago

+1

michael-ortiz commented 4 months ago

@VermaPriyanka do we have any updates on this and when should we expect a fix? This is really important to us!

amorphic commented 3 months ago

FYI @VermaPriyanka this is a showstopper for us. We considered various solutions for providing an observability service to our engineering teams and settled on Managed Grafana expecting it to Just Work. Now after a significant investment of resources to get set up and put processes in place, we've hit this bug which renders the service unfit for use. Alerting is core functionality and we cannot expect other teams to accept all of their alerts appearing 3x in Slack!

We would really appreciate a fix for this ASAP or at the very least an ETA on a fix and a standard workaround until the fix arrives.

sukoneck commented 3 months ago

A workaround while we're waiting: https://github.com/flashbots/prometheus-sns-lambda-slack

VermaPriyanka commented 3 months ago

Thank you all for the patience and for sharing workarounds. We understand that this is an important issue to solve and are working towards the same.

avpjanm commented 3 months ago

+1

magnowest commented 1 month ago

AWS released Grafana 10.4 yesterday, and it's still an issue.

Strangely, this was their response to the alerting in HA issue.

https://docs.aws.amazon.com/grafana/latest/userguide/v10-alerting-explore-high-availability.html

lorelei-rupp-imprivata commented 1 month ago

Yeah, this is the WORST bug. I am not even sure how they can release with this issue. It's been a year now, and we are still stuck on the old legacy alerts because of this. That documentation almost suggests they won't fix this and that it's working as they designed it.

VermaPriyanka commented 1 month ago

Thank you for voicing this concern. We are working towards a fix for the duplicate notifications issue in version 10. The description here explains the current workings of Grafana alerting, which implies rules are evaluated per HA instance. We are working towards solving this in 2 steps: first solving the duplicate notifications, then eliminating duplicate evaluations in the long term. We understand this has been a long wait, and are working towards releasing a fix soon.

ff-pjha commented 1 month ago

Facing the same issue. Do you have any workarounds for Slack?

Diondk commented 3 weeks ago

How fast can we get a fix for this? We are currently setting up alerting and it's a real pain to receive all alerts 3x...

ursuciprian commented 1 week ago

Any updates on this nasty "feature"?