Azure / azure-monitor-baseline-alerts

Azure Monitor Baseline Alerts

Resource Health fine tuning lacking #208

Open stevedistef opened 2 months ago

stevedistef commented 2 months ago

Check for previous/existing GitHub issues

Description

One of my customers implemented AMBA GA on production subscriptions, and they are seeing a lot of ResourceHealthUnhealthyAlert alerts. We can see from the policy that the alert can be turned off, but they would like to fine-tune it (for example, with thresholds). Is that possible? image

When checking the alert itself, we see only these options. Is there any way to fine tune this alert?

image

Brunoga-MS commented 2 months ago

Hello @stevedistef, thanks for your feedback. Based on the UI provided for Service / Resource Health alerts, there seem to be no options to fine-tune these alerts further. The only tuning I can see is reducing the statuses listed under Previous resource status, which I need to investigate internally. However, one question came to mind: are these alerts fired for the same resource, or for resources that share a name but are located in different subscriptions or resource groups?

Thanks, Bruno.

stevedistef commented 2 months ago

Good morning @bruno, and thanks as always for helping us adopt AMBA. You and the whole tiger team you have there under @paulgrimley really make it much easier than a DIY project :-)

OK, so on this one, as we discussed, I can see that this environment also has these resource health alerts. I filtered for one of the resources that shows up a lot, sometimes with the same timestamp. I also filtered for only the last 30 days and then sorted by time:
image

When we checked the two that seemed to be redundant, they turned out to be different (two different alerts). When we check the first one with that timestamp, we see this: image

When we check the other one, which came at the same time, we see this different alert: image
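For anyone who wants to repeat this check outside the portal, here is a minimal sketch of the same comparison: group exported alert records by resource and fired time, and list the groups that contain more than one alert. The record fields (`resource`, `fired_at`, `previous_status`, `current_status`) and the sample values are hypothetical and are not taken from the screenshots or from any Azure export format.

```python
from collections import defaultdict


def group_simultaneous_alerts(records):
    """Group alert records by (resource, fired time).

    Only groups holding two or more records are returned, i.e. alerts
    that fired for the same resource with the same timestamp.
    """
    groups = defaultdict(list)
    for record in records:
        key = (record["resource"], record["fired_at"])
        groups[key].append(record)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Hypothetical sample data, for illustration only.
sample_records = [
    {"resource": "vm-app-01", "fired_at": "2024-05-01T10:15:00Z",
     "previous_status": "Available", "current_status": "Unavailable"},
    {"resource": "vm-app-01", "fired_at": "2024-05-01T10:15:00Z",
     "previous_status": "Unknown", "current_status": "Unavailable"},
    {"resource": "vm-app-01", "fired_at": "2024-05-03T08:00:00Z",
     "previous_status": "Available", "current_status": "Degraded"},
]

simultaneous = group_simultaneous_alerts(sample_records)
for (resource, fired_at), recs in simultaneous.items():
    transitions = [f'{r["previous_status"]} -> {r["current_status"]}' for r in recs]
    print(resource, fired_at, transitions)
```

If the grouped records differ only in a field such as the previous status, that would suggest the alerts are distinct events rather than true duplicates.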

So the question becomes: do we really need to see both? When examining the actual alert rule, we see this (I clicked on Alert Rule in the previous screenshot to get here): image

And then, under Edit, we see that perhaps we have set up too many "previous conditions": image

I am going to ask the team using AMBA to go to Monitor > Alerts > Alert rules and edit the Resource Health alert for each of their subscriptions, removing the two previous conditions, and save it. That means repeating this step 4x in this case: image

image
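As a rough illustration of what that edit changes, the sketch below trims "previous status" clauses from an alert condition expressed as a plain Python dict. It assumes the condition follows the `allOf` / `anyOf` layout with a `properties.previousHealthStatus` field that Resource Health activity log alerts use in ARM templates. The example condition and the statuses being dropped are hypothetical: this thread does not say which two conditions were removed, and this is not necessarily the rule that AMBA deploys.

```python
from copy import deepcopy

# Field name assumed from the Resource Health activity log alert schema.
PREVIOUS_FIELD = "properties.previousHealthStatus"


def drop_previous_statuses(condition, statuses_to_drop):
    """Return a copy of `condition` without the given previous-status clauses.

    Raises ValueError if every previous-status clause would be removed,
    because an empty group would widen the rule instead of narrowing it.
    """
    drop = {status.lower() for status in statuses_to_drop}
    result = deepcopy(condition)
    for clause in result.get("allOf", []):
        any_of = clause.get("anyOf")
        if not any_of or all(c.get("field") != PREVIOUS_FIELD for c in any_of):
            continue  # not the previous-status group, leave untouched
        kept = [
            c for c in any_of
            if not (c.get("field") == PREVIOUS_FIELD
                    and str(c.get("equals", "")).lower() in drop)
        ]
        if not kept:
            raise ValueError("at least one previous status must remain")
        clause["anyOf"] = kept
    return result


# Hypothetical condition, for illustration only.
example_condition = {
    "allOf": [
        {"field": "category", "equals": "ResourceHealth"},
        {"anyOf": [
            {"field": "properties.currentHealthStatus", "equals": "Unavailable"},
            {"field": "properties.currentHealthStatus", "equals": "Degraded"},
        ]},
        {"anyOf": [
            {"field": PREVIOUS_FIELD, "equals": "Available"},
            {"field": PREVIOUS_FIELD, "equals": "Degraded"},
            {"field": PREVIOUS_FIELD, "equals": "Unknown"},
        ]},
    ]
}

trimmed_condition = drop_previous_statuses(example_condition, ["Degraded", "Unknown"])
```

The function returns a new dict and leaves the input untouched, so the same helper could be applied to a rule definition from each subscription before deciding whether to save the change.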

We will see if this is acceptable....

stevedistef commented 2 months ago

Customer trying over the weekend!

dbelso commented 1 month ago

Hi all, I tried the workaround and it seemed to work for a few days, but now the issue has gotten worse and is flooding our Monitoring page. image

Brunoga-MS commented 1 month ago

Hello @dbelso and @stevedistef, from your communication it looks like the fine tuning we applied was only partially working. At this point we need to investigate further to understand why this is happening. We will keep you posted.

Thanks, Bruno.