dotnet / dnceng

.NET Engineering Services

Production - [Alerting] On-Prem Machines Heartbeating By Queue alert #1486

Closed dotnet-eng-status[bot] closed 9 months ago

dotnet-eng-status[bot] commented 9 months ago

:broken_heart: Metric state changed to alerting

One or more queues of on-prem Helix machines had a heartbeat rate below 80%. This may indicate a deployment, the addition of new machines that are not yet active, or a systemic problem with the machines.
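As a rough illustration of the condition described above, the following sketch flags queues whose heartbeat rate falls below 80%. The actual Grafana rule is not shown in this thread, so the definition used here (heartbeating machines divided by total machines in the queue) is an assumption, and the function name, input shape, and sample queue names are all hypothetical.

```python
from collections import defaultdict

# Threshold taken from the alert text; everything else here is an assumption.
THRESHOLD = 0.80


def queues_below_threshold(machines, threshold=THRESHOLD):
    """Return {queue: rate} for queues whose heartbeat rate is below threshold.

    `machines` is an iterable of (queue, machine_name, is_heartbeating) tuples.
    The rate is assumed to be heartbeating machines / total machines per queue.
    """
    totals = defaultdict(int)
    alive = defaultdict(int)
    for queue, _name, is_heartbeating in machines:
        totals[queue] += 1
        if is_heartbeating:
            alive[queue] += 1

    below = {}
    for queue, total in totals.items():
        rate = alive[queue] / total
        if rate < threshold:
            below[queue] = rate
    return below


# Hypothetical sample data, not taken from the real queues.
sample = [
    ("queue-a", "machine-1", True),
    ("queue-a", "machine-2", True),
    ("queue-a", "machine-3", True),
    ("queue-a", "machine-4", False),
    ("queue-b", "machine-5", True),
    ("queue-b", "machine-6", True),
]

if __name__ == "__main__":
    for queue, rate in queues_below_threshold(sample).items():
        print(f"{queue}: heartbeat rate {rate:.0%} is below {THRESHOLD:.0%}")
```

Under this definition, a single unresponsive machine is enough to trip the alert for a small queue, which is consistent with reboots of individual machines clearing it.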

Metric Graph

Go to rule

@dotnet/dnceng, please investigate

Automation information below, do not change

Grafana-Automated-Alert-Id-d2356d84cf3e43ea952d81de941eaa76

dotnet-eng-status[bot] commented 9 months ago

:green_heart: Metric state changed to ok

One or more queues of on-prem Helix machines had a heartbeat rate below 80%. This may indicate a deployment, the addition of new machines that are not yet active, or a systemic problem with the machines.

Metric Graph

Go to rule

oleksandr-didyk commented 9 months ago

Seems to have self-resolved, will continue monitoring

dotnet-eng-status[bot] commented 9 months ago

:broken_heart: Metric state changed to alerting

One or more queues of on-prem Helix machines had a heartbeat rate below 80%. This may indicate a deployment, the addition of new machines that are not yet active, or a systemic problem with the machines.

Metric Graph

Go to rule

dotnet-eng-status[bot] commented 9 months ago

:green_heart: Metric state changed to ok

One or more queues of on-prem Helix machines had a heartbeat rate below 80%. This may indicate a deployment, the addition of new machines that are not yet active, or a systemic problem with the machines.

Metric Graph

Go to rule

dotnet-eng-status[bot] commented 9 months ago

:broken_heart: Metric state changed to alerting

One or more queues of on-prem Helix machines had a heartbeat rate below 80%. This may indicate a deployment, the addition of new machines that are not yet active, or a systemic problem with the machines.

Metric Graph

Go to rule

oleksandr-didyk commented 9 months ago

Rebooted DNCENGMAC134 in osx.13.amd64.iphone
Rebooted PERFOWL010, PERFOWL014 and PERFOWL015 in windows.10.amd64.20h2.owl.perf

Will continue monitoring

dotnet-eng-status[bot] commented 9 months ago

:green_heart: Metric state changed to ok

One or more queues of on-prem Helix machines had a heartbeat rate below 80%. This may indicate a deployment, the addition of new machines that are not yet active, or a systemic problem with the machines.

Metric Graph

Go to rule

ilyas1974 commented 9 months ago

Systems are online as expected. Closing alert.