dotnet / dnceng

.NET Engineering Services
MIT License

Staging - [Alerting] Build Analysis: Exceptions and Errors Alert #4957

Closed dotnet-eng-status-staging[bot] closed 1 month ago

dotnet-eng-status-staging[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Go to rule

@dotnet/dnceng, @dotnet/prodconsvcs, please investigate

Automation information below, do not change

Grafana-Automated-Alert-Id-6fe0b7b34a004f0bad0064a42f9b9135

dotnet-eng-status-staging[bot] commented 1 month ago

:green_heart: Metric state changed to ok

Description and instructions for this alert

Go to rule

dotnet-eng-status-staging[bot] commented 1 month ago

:broken_heart: Metric state changed to alerting

Description and instructions for this alert

Go to rule

meghnave commented 1 month ago

All of these were the same error: "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full. (api.github.com:443)". That is consistent with previous occurrences of this alert as well, and the service has been running fine since, apart from occasional bouts of this error. Looking at old (2023) occurrences, could this be related to a lack of storage in Helix staging? If so, maybe there's not much to do here.

garath commented 1 month ago

It wouldn't be disk storage. Possibly socket exhaustion?
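
To illustrate the socket-exhaustion hypothesis, here is a minimal sketch in Python (not the service's actual .NET code). It counts how many distinct client sockets a local test server sees when each request opens a new connection versus when one connection is reused. The names `count_client_sockets` and `_Handler` are hypothetical, and the loopback server is only a stand-in for a remote endpoint such as api.github.com. Nothing here confirms that per-request connections are the cause of this alert.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class _Handler(BaseHTTPRequestHandler):
    """Hypothetical test handler that records each client's source port."""

    protocol_version = "HTTP/1.1"  # allow keep-alive so a connection can be reused
    seen_ports = set()

    def do_GET(self):
        # client_address is (host, port); a new port means a new client socket.
        _Handler.seen_ports.add(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def count_client_sockets(requests, reuse):
    """Send `requests` GETs to a local server and return how many distinct
    client sockets the server observed."""
    _Handler.seen_ports = set()
    server = ThreadingHTTPServer(("127.0.0.1", 0), _Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        shared = http.client.HTTPConnection(host, port) if reuse else None
        for _ in range(requests):
            conn = shared if reuse else http.client.HTTPConnection(host, port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
            if not reuse:
                conn.close()
        if shared is not None:
            shared.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(_Handler.seen_ports)


if __name__ == "__main__":
    print("new connection per request:", count_client_sockets(5, reuse=False))
    print("one reused connection:     ", count_client_sockets(5, reuse=True))
```

With a reused connection the server sees a single client socket. With one connection per request, each closed socket can linger on the client for a while, which is how a busy caller might run out of ports or buffer space.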