dotnet / dnceng

.NET Engineering Services
MIT License

Production - [Alerting] Android emulator failure rate alert #1840

Closed dotnet-eng-status[bot] closed 8 months ago

dotnet-eng-status[bot] commented 8 months ago

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Please note that this alert fires every 12 hours, because the list of machines can change while the alert is active. Keep an eye on the list of machines in the comment.

Go to rule

@dotnet/dnceng, please investigate

Automation information below, do not change

Grafana-Automated-Alert-Id-e38f14fe3367451d8de43da6e2453fdd

dkurepa commented 8 months ago

Since it's only one machine, there's no need to take any action. The VM will reboot and should be fine after that.

dkurepa commented 8 months ago

It looks like most of the recent failures happened on the runtime main CI pipeline. It's possible they introduced a work item that's causing the issue (https://dataexplorer.azure.com/clusters/engsrvprod.westus/databases/engineeringdata?query=H4sIAAAAAAAAA21Ry2oDMQy85yvEnrKQFkrpcQMlyWEL6YOE9hi8tpo6yVrGlvOAfnyVOC9KfJNnNBqNxsjB6tj5hc0PBoTRGh1Pdx6hqqCYjamxKxzi2mp88xgUW3IFKGdgfOh8VW2mjraWB2Qwg1PbYmTVeuiDo023vHswMgO3jIIOBXVRlCJU4FWIOFtEct3LfylkH2iBmqED8s6CvUP5RWFZM7a1yfX7SvE3hVb0mKIYc%2FMrtXt%2FhMvMHlDb7l3eJuuMHrmnvYScN%2F5Uq4QZe2ax4DkXL9TU5pzjxZBkI2qBrCkEXZB1sLTOVNY5DPuuCOT%2BdcNHwoSHbDU5VlaSKlKTHKec7y38akpWmbAKjEZOoObUfTTl5cqnpfrwdJX0hFLQ2Ds1CkLBiMlmd9YyGPUfDHWHFzQCAAA%3D).
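
The `query=` value in the link above looks like a gzip-compressed, base64-encoded query (the `H4sI` prefix is what the gzip magic bytes look like in base64). For anyone who wants to read the query text without opening Data Explorer, here is a minimal sketch that assumes that encoding. The function names `decode_adx_query` and `encode_adx_query` are hypothetical helpers, not part of any dnceng tooling.

```python
import base64
import gzip
from urllib.parse import parse_qs, quote, urlparse


def decode_adx_query(url: str) -> str:
    """Return the query text embedded in a deep link's `query` parameter.

    Assumes the parameter is base64(gzip(utf-8 text)), percent-encoded.
    """
    params = parse_qs(urlparse(url).query)
    encoded = params["query"][0]  # parse_qs already undoes %2F, %2B, %3D
    # parse_qs turns a literal '+' into a space; undo that for base64.
    encoded = encoded.replace(" ", "+")
    # Restore padding in case trailing '=' characters were stripped.
    encoded += "=" * (-len(encoded) % 4)
    compressed = base64.b64decode(encoded)
    return gzip.decompress(compressed).decode("utf-8")


def encode_adx_query(query: str) -> str:
    """Inverse of decode_adx_query: produce a percent-encoded parameter value."""
    compressed = gzip.compress(query.encode("utf-8"))
    return quote(base64.b64encode(compressed).decode("ascii"), safe="")
```

Calling `print(decode_adx_query(link))` with the full link from the comment should print the query behind it, provided the encoding assumption holds.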