Azure / azure-monitor-baseline-alerts

Azure Monitor Baseline Alerts
MIT License

Unable to Exclude Virtual Machine Scale Sets Using the `MonitorDisable` tag #206

Closed simone-bennett closed 1 month ago

simone-bennett commented 5 months ago

Description

We're getting flooded with Critical VM alerts when VMSS scale in and out. We have tried applying the `MonitorDisable = True` tag to them, and I've checked that it has filtered down and been applied, but the alerts are still firing.

We also tried running the clean-up script, deleting all the policies, making sure the MonitorDisable = True tag was applied, and then re-applying the policy. But the customer is still being flooded with alerts for VMSS events.

Tag that has been applied: image

Ideally that tag could be added to anything that is sending alerts and then it's just ignored until the tag is removed.

Alboroni commented 5 months ago

Have you deleted the actual alerts? The tag is to stop the alerts deploying to the scope so will not have an effect when already deployed.

We can also disable the alerts from a policy parameter. Please refer to the guide https://azure.github.io/azure-monitor-baseline-alerts/patterns/alz/Disabling-Policies

paulgrimley commented 5 months ago

@simone-bennett I would recommend following the guidance linked by @Alboroni to disable the alert in question as a start

simone-bennett commented 5 months ago

Thanks for getting back to me. We deleted everything, applied the MonitorDisable tag and re-deployed. I'll delete the alerts manually for sure as a workaround. However:

In addition to the VMSS question, it would be great to be able to ignore resources using a tag. For example, a PoC workload that we spin up and don't want to monitor. Can we apply the MonitorDisable tag to those resources and have them be ignored dynamically?

Alboroni commented 5 months ago

The tag is an all-or-nothing approach. It only applies to the scope the alert is scoped to, which is the subscription for the VM alerts, and we do not offer further granularity on the tag. If you need to enable some alerts for VMs and not others, you should use the AlertState parameter for each alert to disable or enable the necessary ones. We are looking into the scale set alerting, but you could use an alert processing rule to suppress alerts from the Scale Sets for the time being.
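For illustration only (this is not from the maintainers), the alert processing rule suggestion could be sketched as a small Python helper that builds an ARM-style resource body. The rule name `apr-suppress-vmss`, the helper name, and the choice of filtering on `TargetResourceType` are assumptions. The resource type, API version, and property names follow my reading of the `Microsoft.AlertsManagement/actionRules` schema and should be checked against the current ARM reference. Note that such a rule removes action groups (notifications); the alerts themselves still fire. Depending on the orchestration mode, scale set nodes may surface as plain virtual machines, in which case a different condition would be needed.

```python
import json


def build_vmss_suppression_rule(subscription_id, rule_name="apr-suppress-vmss"):
    """Build an ARM-style body for an alert processing rule (hypothetical helper).

    The rule is scoped to one subscription and strips action groups from
    alerts whose target resource type is a VM Scale Set.
    """
    return {
        "type": "Microsoft.AlertsManagement/actionRules",
        "apiVersion": "2021-08-08",  # assumed; verify against the ARM reference
        "name": rule_name,           # hypothetical name
        "location": "Global",
        "properties": {
            "scopes": [f"/subscriptions/{subscription_id}"],
            "conditions": [
                {
                    # Assumption: alerts to silence target the scale set type.
                    "field": "TargetResourceType",
                    "operator": "Equals",
                    "values": ["microsoft.compute/virtualmachinescalesets"],
                }
            ],
            # Suppresses notifications only; the alert instances still fire.
            "actions": [{"actionType": "RemoveAllActionGroups"}],
            "enabled": True,
            "description": "Suppress notifications for VMSS alerts",
        },
    }


if __name__ == "__main__":
    # Placeholder subscription ID for demonstration.
    body = build_vmss_suppression_rule("00000000-0000-0000-0000-000000000000")
    print(json.dumps(body, indent=2))
```

The printed JSON could be adapted into a template or deployment payload; nothing here calls Azure directly.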

SteveBurkettNZ commented 4 months ago

Yeah, we get a bunch of ResourceHealthUnhealthyAlert alerts for VM Scale Sets, one for each node in the scale set, whenever it scales in and shuts down the extra nodes.

Would be nice to ignore ResourceHealthUnhealthyAlert on those extras somehow and only alert if fewer than the defined minimum number of nodes in the set are running (maybe VmAvailabilityMetric is good enough?)

Brunoga-MS commented 3 months ago

Hello @simone-bennett, thanks to the newly released feature of querying ARG from Kusto, we will be able to exclude flexible VMSS nodes from virtual machine alerts. We still need to investigate the ResourceHealthUnhealthyAlert.
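As a rough sketch of the exclusion idea (not the maintainers' actual query), the filtering logic can be shown in Python over resource records shaped like Azure Resource Graph results. It assumes that a VM belonging to a flexible scale set carries a `properties.virtualMachineScaleSet` reference; the function name and the sample records are hypothetical.

```python
def exclude_flexible_vmss_nodes(vms):
    """Return only VMs that are not members of a scale set.

    Assumption: scale set members expose a non-empty
    properties.virtualMachineScaleSet reference.
    """
    kept = []
    for vm in vms:
        props = vm.get("properties") or {}
        if props.get("virtualMachineScaleSet"):
            continue  # skip scale set nodes
        kept.append(vm)
    return kept


if __name__ == "__main__":
    # Hypothetical sample records for demonstration.
    sample = [
        {"name": "vm-standalone", "properties": {}},
        {
            "name": "vmss-node-0",
            "properties": {"virtualMachineScaleSet": {"id": "/subscriptions/.../vmss1"}},
        },
    ]
    print([vm["name"] for vm in exclude_flexible_vmss_nodes(sample)])
```

In an alert query, the same idea would be expressed as a join or filter against the resource inventory rather than in application code.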