dotnet / dnceng

.NET Engineering Services
MIT License
24 stars 19 forks source link

Staging - [Alerting] Autoscale: Minutes to scale-up from zero machine alert #4476

Open dotnet-eng-status-staging[bot] opened 5 days ago

dotnet-eng-status-staging[bot] commented 5 days ago

:broken_heart: Metric state changed to alerting

Scale up issue: A queue has been waiting for a machine to scale up for more than 60 minutes, there are no machines in this queue, which could cause a lot of work to get stuck.

Wiki link for investigation and mitigation steps here

Go to rule

@dotnet/dnceng, @dotnet/prodconsvcs, please investigate

Automation information below, do not change Grafana-Automated-Alert-Id-54aa0d7e647e46ff9e880bf6ae532b99
dotnet-eng-status-staging[bot] commented 5 days ago

:green_heart: Metric state changed to ok

Scale up issue: A queue has been waiting for a machine to scale up for more than 60 minutes, there are no machines in this queue, which could cause a lot of work to get stuck.

Wiki link for investigation and mitigation steps here

Go to rule

dotnet-eng-status-staging[bot] commented 5 days ago

:broken_heart: Metric state changed to alerting

Scale up issue: A queue has been waiting for a machine to scale up for more than 60 minutes, there are no machines in this queue, which could cause a lot of work to get stuck.

Wiki link for investigation and mitigation steps here

Go to rule

garath commented 4 days ago

I'm the one who sent this job to azurelinux.3.amd64.open, but I have no idea why this it is failing. I'll investigate and clear the queue soon.

dotnet-eng-status-staging[bot] commented 4 days ago

:green_heart: Metric state changed to ok

Scale up issue: A queue has been waiting for a machine to scale up for more than 60 minutes, there are no machines in this queue, which could cause a lot of work to get stuck.

Wiki link for investigation and mitigation steps here

Go to rule