envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Add slack notifications for failed github workflows #36326

Open phlax opened 1 month ago

phlax commented 1 month ago

Currently we monitor failed Azure Pipelines (azp) runs on Slack.

We are about to move our remaining CI to GitHub, at which point we will stop receiving these notices.

We should add a Slack notifier for GitHub workflows.

phlax commented 1 month ago

cc @alyssawilk, I'm trying to figure out a good pattern for this.

I'm thinking we could run a scheduled job hourly that queries the GitHub API for failed jobs that shouldn't have failed, a bit like pr_notifier but more frequent, and then perhaps also a weekly report.
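
A rough sketch of the query side of such an hourly job, in Python. It assumes the GitHub REST endpoint for listing a repository's workflow runs and its `branch`, `status`, `created` and `per_page` query parameters; the function names, the choice of `main` as the branch that "shouldn't fail", and the optional token handling are illustrative assumptions, not anything decided in this thread.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumed endpoint: "List workflow runs for a repository" in the GitHub REST API.
API = "https://api.github.com/repos/envoyproxy/envoy/actions/runs"


def failed_runs_url(since, branch="main"):
    """Build the query URL for failed runs created at or after `since`.

    `since` must be a timezone-aware datetime.
    """
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urllib.parse.urlencode({
        "branch": branch,
        "status": "failure",
        "created": ">=" + stamp,
        "per_page": 100,
    })
    return f"{API}?{query}"


def fetch_failed_runs(since, token=None):
    """Fetch one page of failed runs (hypothetical helper, no pagination)."""
    request = urllib.request.Request(
        failed_runs_url(since),
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)["workflow_runs"]


def last_hour():
    """Start of the reporting window for an hourly schedule."""
    return datetime.now(timezone.utc) - timedelta(hours=1)
```

Calling `fetch_failed_runs(last_hour())` from a scheduled workflow would return the runs to report. Note that filtering on `created` can miss a run that started before the window and failed inside it, so a real job would probably want a wider window plus de-duplication.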

alyssawilk commented 1 month ago

What do the failures per day look like now? Can't we just keep posting to envoy-ci?

phlax commented 1 month ago

So, this is for posting to envoy-ci, but the azp bot we used no longer works.

The difference in GitHub is that the azp bot worked per-run: it posted at the end of a run that the run had failed.

In GitHub that is hard, unreliable, and inefficient to replicate, as it would require a workflow that waits on multiple other workflows.

So rather than posting at the end of every workflow (e.g. Envoy/Prechecks, mobile/ios-tests, etc.), which could easily swamp the channel, I'm suggesting we do it hourly with a report of anything that failed in that hour.
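
To illustrate the "one message per hour" idea, here is a minimal Python sketch that folds all failures from the window into a single Slack post. It assumes a Slack incoming webhook that accepts a JSON body with a `text` field; the webhook URL, the function names, and the `(workflow name, commit SHA)` input shape are hypothetical.

```python
import json
import urllib.request


def build_slack_request(webhook_url, failures):
    """Build one POST request summarising every failure in the window.

    `failures` is a list of (workflow_name, commit_sha) tuples.
    """
    lines = [f"{len(failures)} workflow run(s) failed in the last hour:"]
    lines += [f"- {name} ({sha[:7]})" for name, sha in failures]
    body = json.dumps({"text": "\n".join(lines)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_hourly_report(webhook_url, failures):
    """Post the summary; stay silent when nothing failed."""
    if not failures:
        return False
    with urllib.request.urlopen(build_slack_request(webhook_url, failures)):
        return True
```

Skipping the post entirely when the list is empty keeps the channel quiet on healthy hours, which is the main point of batching.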

alyssawilk commented 1 month ago

Gotcha. Yeah, if we can't do per-run, hourly seems OK. I had assumed part of the workflow script could include a Slack update, so we wouldn't have to increase workflow complexity.

phlax commented 1 month ago

main@e486663: failed
 -> [schedule@2024-10-07T06:38:23]: Envoy/Prechecks (failure)
 -> [schedule@2024-10-07T06:38:23]: Envoy/Checks (failure)
 -> [schedule@2024-10-07T06:38:23]: Envoy/Checks (failure)
 -> [schedule@2024-10-06T06:37:00]: Envoy/Prechecks (failure)
 -> [schedule@2024-10-06T06:37:00]: Envoy/Checks (failure)
 -> [schedule@2024-10-06T06:37:00]: Envoy/Checks (failure)
 -> [push@2024-10-05T23:27:40]: Envoy/Prechecks (failure)
 -> [push@2024-10-05T23:27:40]: Envoy/Checks (failure)
main@33679e4: failed
 -> [push@2024-10-05T13:49:28]: Envoy/Prechecks (failure)
 -> [push@2024-10-05T13:49:28]: Envoy/Checks (failure)
main@16cd72e: failed
 -> [push@2024-10-05T13:43:21]: Envoy/Prechecks (failure)
 -> [push@2024-10-05T13:43:21]: Envoy/Checks (failure)
main@9244dc9: failed
 -> [schedule@2024-10-05T06:37:05]: Envoy/Checks (failure)
 -> [push@2024-10-04T23:59:47]: Mobile/iOS tests (failure)
main@1173629: failed
 -> [push@2024-10-04T22:53:24]: Envoy/Prechecks (failure)
main@42068a5: failed
 -> [push@2024-10-04T18:35:01]: Envoy/Prechecks (failure)
 -> [push@2024-10-04T18:35:01]: Envoy/Checks (failure)
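
For reference, a report with this shape could be produced along these lines. This is only a guess at the approach, not the script that generated the output above: it assumes run records carrying the `head_branch`, `head_sha`, `event`, `created_at`, `name` and `conclusion` fields that the GitHub workflow-runs API returns, groups them by commit, and prints one entry per line. The function name is hypothetical.

```python
def format_report(runs):
    """Group failed runs by branch@commit, newest first, one entry per line."""
    by_commit = {}
    # ISO 8601 timestamps sort correctly as strings; sorted() is stable,
    # so runs with equal timestamps keep their input order.
    for run in sorted(runs, key=lambda r: r["created_at"], reverse=True):
        key = f'{run["head_branch"]}@{run["head_sha"][:7]}'
        by_commit.setdefault(key, []).append(run)

    lines = []
    for key, commit_runs in by_commit.items():
        lines.append(f"{key}: failed")
        for run in commit_runs:
            stamp = run["created_at"].rstrip("Z")
            lines.append(
                f' -> [{run["event"]}@{stamp}]: {run["name"]} ({run["conclusion"]})'
            )
    return "\n".join(lines)
```

Commits are ordered by their most recent failing run, so a commit that also failed a later scheduled run floats to the top of the report.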

github-actions[bot] commented 6 days ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.