dotnet / dnceng

.NET Engineering Services
MIT License

Production - [Alerting] Autoscale: Minutes to scale-up from zero machine alert #3227

Closed by dotnet-eng-status[bot] 2 weeks ago

dotnet-eng-status[bot] commented 2 months ago

:broken_heart: Metric state changed to alerting

Scale up issue: A queue has been waiting more than 45 minutes for a machine to scale up, and there are currently no machines in this queue. This could cause a lot of work to get stuck.
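The condition this alert describes can be sketched roughly as follows. This is only an illustration and not the actual Grafana rule: the 45-minute threshold comes from the alert text, while `QueueSnapshot`, its fields, and both function names are hypothetical.

```python
from dataclasses import dataclass

# Threshold taken from the alert text; everything else here is hypothetical.
SCALE_UP_THRESHOLD_MINUTES = 45


@dataclass
class QueueSnapshot:
    name: str                    # hypothetical queue identifier
    machine_count: int           # machines currently serving the queue
    oldest_wait_minutes: float   # how long the oldest work item has waited


def is_scale_up_stuck(q: QueueSnapshot) -> bool:
    """True when work has waited past the threshold and the queue has no machines."""
    return q.machine_count == 0 and q.oldest_wait_minutes > SCALE_UP_THRESHOLD_MINUTES


def alert_state(queues: list[QueueSnapshot]) -> str:
    """Return "alerting" if any queue is stuck at zero machines, else "ok"."""
    return "alerting" if any(is_scale_up_stuck(q) for q in queues) else "ok"
```

Under this sketch, a queue with zero machines and a 50-minute wait yields `alerting`, and the state returns to `ok` once a machine appears or the backlog clears.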

Wiki link for investigation and mitigation steps here

Metric Graph

Go to rule

@dotnet/dnceng, @dotnet/prodconsvcs, please investigate

Automation information below, do not change

Grafana-Automated-Alert-Id-54aa0d7e647e46ff9e880bf6ae532b99


dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok

dotnet-eng-status[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

dotnet-eng-status[bot] commented 1 month ago

:green_heart: Metric state changed to ok
