dotnet / dnceng

.NET Engineering Services
MIT License

Build monitor is not creating issues for failed pipeline runs #3833

Closed (riarenas closed 1 month ago)

riarenas commented 2 months ago

Build monitor should've opened issues for the multiple failed builds in the helix-machines-daily definition according to the configuration in https://github.com/dotnet/dnceng/blob/58603e03089aae816e1a06065e104a1a6a81cb3f/src/DotNet.Status.Web/DotNet.Status.Web/.config/settings.json#L104-L107

It seems like no issues have been created in a while. We should investigate and see if the service is running correctly.

riarenas commented 2 months ago

There is an alert for this functionality here: https://github.com/dotnet/dnceng/issues/3786. Unclear if the alert is related, but investigating that would be a good first step.

riarenas commented 2 months ago

Investigation in #3786 has shown that the webhook was disabled after multiple failures.

https://github.com/dotnet/dnceng/issues/3786#issuecomment-2299001699

MilenaHristova commented 2 months ago

investigation in #3786 has shown that the webhook was disabled after multiple failures.

#3786 (comment)

I enabled it, but it got disabled again after some time because of many failures. https://dotneteng-status.azurewebsites.net/api/azp/build-complete returns an error. Is it possible that the URL is not the right one?

riarenas commented 1 month ago

We've been opening issues through this automation. Taking a quick look at the status of the webhooks to see if we can close this.

riarenas commented 1 month ago

We have 4833 successful deliveries and 20 failed deliveries in the last 7 days. The failures I've examined resulted in 500 status codes from the status website. Will see if we have any logs on the service side for these failed requests.
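
To put those delivery counts in perspective, here is a quick back-of-the-envelope calculation as a minimal Python sketch. The function name is illustrative only and is not part of the status service.

```python
def delivery_failure_rate(successful: int, failed: int) -> float:
    """Return the fraction of webhook deliveries that failed."""
    total = successful + failed
    if total == 0:
        raise ValueError("no deliveries recorded")
    return failed / total


# Counts reported above for the last 7 days.
rate = delivery_failure_rate(successful=4833, failed=20)
print(f"{rate:.2%} of deliveries failed")  # prints "0.41% of deliveries failed"
```

So roughly 1 in 240 deliveries failed over the week.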

riarenas commented 1 month ago

The failures I've examined were transient failures accessing the GitHub APIs that persisted through all 5 retries. I think we can close this, as the functionality is working except when GitHub goes down.
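
For readers unfamiliar with this failure mode, the behavior described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the service's actual code: `TransientError`, `call_with_retries`, `handle_build_complete`, and the exponential backoff delay are all assumptions. Only the count of 5 retries and the resulting 500 status come from the observations in this thread.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a temporary GitHub API failure."""


def call_with_retries(operation, max_retries=5, base_delay=1.0):
    """Run operation; on TransientError retry up to max_retries times.

    The delay doubles after each failed attempt (assumed backoff policy).
    """
    attempt = 0
    while True:
        try:
            return operation()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller see the failure
            time.sleep(base_delay * (2 ** attempt))
            attempt += 1


def handle_build_complete(create_issue, base_delay=1.0):
    """Hypothetical webhook handler returning an HTTP-style status code."""
    try:
        call_with_retries(create_issue, base_delay=base_delay)
    except TransientError:
        # An outage outlasting every retry surfaces as a 500 to the sender.
        return 500
    return 200
```

Under this sketch, a short blip is absorbed by the retries, while an outage that outlasts them produces the 500 responses seen in the failed deliveries.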