Open phlax opened 1 month ago
cc @alyssawilk trying to figure out a good pattern for this
I'm thinking perhaps we run a scheduled job hourly that queries the GitHub API for jobs that failed but shouldn't have, a bit like pr_notifier but more frequent, and then perhaps also a weekly report.
What do the failures per day look like now? Can't we just keep posting to envoy-ci?
So, this is for posting to envoy-ci, but the azp bot we used no longer works.
The difference here (in GitHub) is that the azp bot worked per run: it posted at the end of each run if that run had failed.
In GitHub that is hard, unreliable, and inefficient to replicate, as it would require a workflow that waits on multiple other workflows.
So rather than posting at the end of every workflow (e.g. Envoy/Prechecks, mobile/ios-tests, etc.), which could easily swamp the channel, I'm suggesting we do it hourly with a report of anything that failed in that hour.
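For illustration, a minimal sketch of the hourly query, assuming the job uses the GitHub REST "list workflow runs" endpoint with its `branch`, `status`, and `created` filters. The function names, the one-hour window, and the reduced record shape are hypothetical, not anything that exists in the repo today.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

API = "https://api.github.com"


def failed_runs_url(repo: str, branch: str, now: datetime, window: timedelta) -> str:
    """Build a list-workflow-runs URL for failed runs created inside the window."""
    since = (now.astimezone(timezone.utc) - window).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = urlencode({
        "branch": branch,
        "status": "failure",        # the endpoint accepts conclusions here
        "created": f">={since}",    # GitHub search date syntax
        "per_page": 100,
    })
    return f"{API}/repos/{repo}/actions/runs?{query}"


def failed_runs(payload: dict) -> list[dict]:
    """Reduce an API response to the fields a report needs (hypothetical shape)."""
    return [
        {
            "sha": run["head_sha"][:7],
            "event": run["event"],
            "created_at": run["created_at"],
            "name": run["name"],
            "conclusion": run["conclusion"],
        }
        for run in payload.get("workflow_runs", [])
        if run.get("conclusion") == "failure"
    ]
```

The scheduled job would fetch that URL with an authenticated request and pass the decoded JSON to `failed_runs`. Pagination and retries are left out of the sketch.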
Gotcha. Yeah, if we can't do per-run, hourly seems OK. I had assumed part of the workflow script could include a Slack update so we wouldn't have to increase workflow complexity.
main@e486663: failed
-> [schedule@2024-10-07T06:38:23]: Envoy/Prechecks (failure)
-> [schedule@2024-10-07T06:38:23]: Envoy/Checks (failure)
-> [schedule@2024-10-07T06:38:23]: Envoy/Checks (failure)
-> [schedule@2024-10-06T06:37:00]: Envoy/Prechecks (failure)
-> [schedule@2024-10-06T06:37:00]: Envoy/Checks (failure)
-> [schedule@2024-10-06T06:37:00]: Envoy/Checks (failure)
-> [push@2024-10-05T23:27:40]: Envoy/Prechecks (failure)
-> [push@2024-10-05T23:27:40]: Envoy/Checks (failure)
main@33679e4: failed
-> [push@2024-10-05T13:49:28]: Envoy/Prechecks (failure)
-> [push@2024-10-05T13:49:28]: Envoy/Checks (failure)
main@16cd72e: failed
-> [push@2024-10-05T13:43:21]: Envoy/Prechecks (failure)
-> [push@2024-10-05T13:43:21]: Envoy/Checks (failure)
main@9244dc9: failed
-> [schedule@2024-10-05T06:37:05]: Envoy/Checks (failure)
-> [push@2024-10-04T23:59:47]: Mobile/iOS tests (failure)
main@1173629: failed
-> [push@2024-10-04T22:53:24]: Envoy/Prechecks (failure)
main@42068a5: failed
-> [push@2024-10-04T18:35:01]: Envoy/Prechecks (failure)
-> [push@2024-10-04T18:35:01]: Envoy/Checks (failure)
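A report of this shape could be produced by grouping failed runs by commit. The sketch below is only an illustration: `format_report` and the input record keys (`sha`, `event`, `created_at`, `name`, `conclusion`) are assumptions, not the script that produced the output above.

```python
def format_report(runs: list[dict], branch: str = "main") -> str:
    """Group failed runs by short commit SHA and render one block per commit."""
    by_sha: dict[str, list[dict]] = {}
    for run in runs:
        # dicts keep insertion order, so commits appear in the order first seen
        by_sha.setdefault(run["sha"], []).append(run)

    lines = []
    for sha, items in by_sha.items():
        lines.append(f"{branch}@{sha}: failed")
        for run in items:
            stamp = run["created_at"].rstrip("Z")  # drop the trailing UTC marker
            lines.append(
                f"-> [{run['event']}@{stamp}]: {run['name']} ({run['conclusion']})"
            )
    return "\n".join(lines)
```

An empty input yields an empty string, which a caller could use to skip posting when nothing failed in the window.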
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
Currently we monitor failed azp pipelines on Slack.
We are about to shift our remaining CI to GitHub, at which point we will stop receiving these notices.
We should add a Slack notifier from GitHub.
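As a rough sketch of what such a notifier might do, assuming a Slack incoming webhook (which accepts a JSON body with a `text` field): the function names are hypothetical, and the webhook URL would come from a repository secret rather than being hard-coded.

```python
import json
import urllib.request


def slack_request(webhook_url: str, report: str) -> urllib.request.Request:
    """Build the POST request for a Slack incoming webhook."""
    body = json.dumps({"text": report}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(webhook_url: str, report: str) -> None:
    """Post the report, skipping the call when there is nothing to say."""
    if not report:
        return
    with urllib.request.urlopen(slack_request(webhook_url, report), timeout=10) as resp:
        resp.read()
```

Keeping the request construction separate from the network call makes the payload easy to check without posting to a real channel.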