Open dotnet-eng-status[bot] opened 1 month ago
:green_heart: Metric state changed to ok
Description and instructions for this alert
Please note that this alert will fire every 12 hours as the list of machines can change while the alert is alive. So please keep an eye on the list of machines in the comment.
:broken_heart: Metric state changed to alerting
:green_heart: Metric state changed to ok
:broken_heart: Metric state changed to alerting
:green_heart: Metric state changed to ok
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:green_heart: Metric state changed to ok
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:green_heart: Metric state changed to ok
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
This is only happening on a couple of machines, so it probably still falls under the "ignore" (not catastrophic) scenario. The errors are: device not found.
However, it is happening fairly consistently. Should we follow up on this more thoroughly, @premun?
:broken_heart: Metric state changed to alerting
The only option is probably to tune the alert trigger conditions. Unfortunately, I don't know how to make this alert fire only when the number of machines is larger than some number. Maybe it is possible, though.
:green_heart: Metric state changed to ok
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:green_heart: Metric state changed to ok
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:green_heart: Metric state changed to ok
:green_heart: Metric state changed to ok
Hi @premun,
I tried a different approach for this alert in this pull request: https://dev.azure.com/dnceng/internal/_git/dotnet-helix-service/pullrequest/40850
Can you take a look and let me know whether this could work? The visual information is quite different: the result is as expected, but the main difference is that it shows only counts and has only one line with the failed machines.
I tweaked the failure rate threshold to 30% for this example, as right now all the machines are in a good state.
In the PR, only machines with a failure rate above 80% are listed, and the alert triggers when the number of such machines is bigger than X.
Also, which number of machines would be appropriate?
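For illustration, the trigger logic described above could be sketched in Python roughly as follows. This is not the query from the PR. The names `MachineStats`, `failing_machines`, `should_alert`, and `max_tolerated` are made up for this sketch, and the value of X (`max_tolerated` here) is still the open question.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MachineStats:
    """Hypothetical per-machine counters for the alert window."""
    name: str
    failed: int
    total: int


def failing_machines(stats: List[MachineStats], rate_threshold: float = 0.8) -> List[str]:
    """Return names of machines whose failure rate is strictly above the threshold.

    Machines with no work items in the window are skipped to avoid dividing by zero.
    """
    return [
        m.name
        for m in stats
        if m.total > 0 and m.failed / m.total > rate_threshold
    ]


def should_alert(
    stats: List[MachineStats],
    rate_threshold: float = 0.8,
    max_tolerated: int = 2,  # assumed placeholder for "X"; not a value from the PR
) -> Tuple[bool, List[str]]:
    """Alert only when more than `max_tolerated` machines exceed the failure rate."""
    failing = failing_machines(stats, rate_threshold)
    return len(failing) > max_tolerated, failing


sample = [
    MachineStats("machine-a", failed=9, total=10),
    MachineStats("machine-b", failed=10, total=10),
    MachineStats("machine-c", failed=1, total=10),
    MachineStats("machine-d", failed=0, total=0),
]
fire, names = should_alert(sample, rate_threshold=0.8, max_tolerated=1)
print(fire, names)
```

With a count-based condition like this, one or two flaky devices no longer flip the alert state on their own. The list of names is still available for the single line that shows the failed machines.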
:broken_heart: Metric state changed to alerting
:green_heart: Metric state changed to ok
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:green_heart: Metric state changed to ok
Hey @AlitzelMendez,
I think this is a good solution to the problems of this alert. Can you show me the Grafana query if you have edited some panel in staging/prod Grafana with this already? I think it would be easier to review if I can see the changes live.
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:broken_heart: Metric state changed to alerting
:green_heart: Metric state changed to ok
@premun , I updated this panel with the changes: https://dotnet-eng-grafana.westus2.cloudapp.azure.com/d/mobileDevices/mobile-devices?tab=alert&viewPanel=19&orgId=1&editPanel=19 😄
Yeah, looks good!
:broken_heart: Metric state changed to alerting
Go to rule
@dotnet/dnceng, please investigate
Automation information below, do not change
Grafana-Automated-Alert-Id-e38f14fe3367451d8de43da6e2453fdd
Release Note Category
Release Note Description
No release note needed