Azure / azure-sdk-tools

Tools repository leveraged by the Azure SDK team.
MIT License

Poll and report github rate limit API for check-enforcer and other webhooks #2680

Open benbp opened 2 years ago

benbp commented 2 years ago

Based on my own experience and the frequency of reports in chat, check-enforcer delays are happening more and more often. They show up in two ways: slow evaluation ("time to evaluate") and no evaluation at all ("failure to evaluate"). We need to start measuring the reliability of our GitHub automation so we know whether it's trending worse, how often it approaches failure, and whether the behavior is seasonal. Without clear data we have no way to improve the situation or to decide whether the current performance is acceptable. Webhook delivery is difficult to track, so polling the GitHub rate limit API seems like a decent first attempt at measurement. Calls to the rate limit API also do not count against our rate limits.
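A minimal sketch of what such polling could look like, assuming Python and a GitHub token supplied by the caller. The endpoint `GET https://api.github.com/rate_limit` and the `resources` object in its response (one bucket per resource such as `core`, `search`, `graphql`, each with `limit`, `used`, `remaining`, `reset`) are GitHub's documented REST API. Everything else is hypothetical illustration and not how check-enforcer actually works: the function names `fetch_rate_limit`, `summarize`, and `poll`, the 80% "near limit" threshold, and the polling cadence.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Iterator

# Documented GitHub REST endpoint; polling it does not consume REST rate limit.
RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def fetch_rate_limit(token: str) -> dict:
    """Call GitHub's rate limit endpoint and return the parsed JSON body."""
    request = urllib.request.Request(
        RATE_LIMIT_URL,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize(payload: dict, threshold: float = 0.8) -> Dict[str, dict]:
    """Reduce each rate limit bucket to a usage fraction and a near-limit flag.

    The 0.8 threshold is a hypothetical choice, not a GitHub or team value.
    """
    summary = {}
    for name, bucket in payload.get("resources", {}).items():
        limit = bucket.get("limit", 0)
        # Prefer the "used" field; fall back to limit - remaining if it is absent.
        used = bucket.get("used", limit - bucket.get("remaining", limit))
        fraction = used / limit if limit else 0.0
        summary[name] = {
            "used": used,
            "limit": limit,
            "fraction": fraction,
            "near_limit": fraction >= threshold,
            "reset": bucket.get("reset"),  # epoch seconds when the window resets
        }
    return summary


def poll(
    fetch: Callable[[], dict],
    interval_s: float,
    samples: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield one timestamped summary per poll; the caller decides where to export it."""
    for i in range(samples):
        yield {"timestamp": time.time(), "buckets": summarize(fetch())}
        if i < samples - 1:  # no need to wait after the final sample
            sleep(interval_s)
```

In practice this might be driven as `poll(lambda: fetch_rate_limit(os.environ["GITHUB_TOKEN"]), 60, samples)`, with each sample pushed to whatever metrics store the team already uses. Because `fetch` and `sleep` are injected rather than hard-coded, the summarizing and alerting logic can be exercised without network access. A time series of `fraction` and `near_limit` per bucket would directly answer "how often does it approach failure" and reveal any seasonality.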

API reference

richardpark-msft commented 2 years ago

I might be seeing the effects of this in my PRs. When I start the live tests with /azp run, they often collide with the normal PR CI run and get cancelled, because (I think) the PR CI run gets scheduled after them.

It's not a huge deal (I just restart the pipeline), but it usually means I have to wait and watch the PR for a while instead of queueing things up asynchronously. It can happen several times even on a single PR as I address reviewer feedback and need to retest, though that may be atypical.