github-vet / bots

Bots for running analysis on GitHub's public Go repositories and crowdsourcing their classification.
MIT License

Use conditional API #130

Open kalexmills opened 3 years ago

kalexmills commented 3 years ago

GitHub supports conditional requests, which can be used to improve TrackBot's performance. A conditional request counts against the rate limit only when a full response is returned; a 304 Not Modified reply does not. We can use this to avoid spending rate limit on issues whose reactions haven't changed.

API Usage Analysis from Steve Guntrip @ GitHub Developer Support

I've had a look at the last 30 days of API activity from @githubvet, covering 5,144,232 requests. I can see a few places where it might be possible to improve how you're using our API and get more out of your rate limit.

Here are the top paths you're hitting. These are internal paths, but they give us a good overview of which API endpoints you're accessing the most:

```
Path                                                          Requests    Share
/repositories/:repository_id                                 1,599,196  31.113%
/repositories/:repository_id/branches/*                      1,590,978  30.954%
/repositories/:repository_id/tarball/?*?                     1,585,395  30.845%
/repositories/:repository_id/issues                            174,844   3.402%
/repositories/:repository_id/issues/:issue_number/reactions    173,550   3.376%
/repositories/:repository_id/issues/:issue_number               11,156   0.217%
/repositories/:repository_id/issues/:issue_number/labels         3,536   0.069%
/repositories/:repository_id/issues/:issue_number/labels/*       1,228   0.024%
/repositories/:repository_id/issues/:issue_number/comments           1       0%
```

I can see that your requests to the top three paths span a large collection of repositories, but when I spot-check those individual requests, I can see a lot of repetition. For example, for the repository 0xef53/go-tuntap, you've hit repos/0xef53/go-tuntap/branches/master 16 times and repos/0xef53/go-tuntap 13 times. The repository appears to have last been updated in March, so those API responses were likely the same each time.

I'd suggest making use of conditional requests. Many API responses, including these, contain an ETag, which you can store and provide with subsequent requests. If nothing has changed, we return an HTTP 304 Not Modified, and the request does not count against your rate limit.
There's more information, and examples, in our documentation: https://docs.github.com/en/free-pro-team@latest/rest/overview/resources-in-the-rest-api#conditional-requests

Most of the /issues paths are related to your github-vet/rangeloop-pointer-findings repository. 40% of those requests list the repository's issues (/repos/github-vet/rangeloop-pointer-findings/issues), which would also be an ideal use for conditional requests. The remaining 60% check reactions on individual issues. As we discussed, polling is your best option here, since we don't currently offer a webhook event for reactions, but you could still make use of conditional requests. In the 30 days I checked, each issue was checked over 2,000 times, which consumes a good portion of your rate limit.

One final thing I noticed, which might make more sense to you, was a lot of duplicated requests. For example, at 04/01/2021 15:04:44.425 I see a request to /repos/github-vet/rangeloop-pointer-findings/issues/3892/reactions and another immediately after at 04/01/2021 15:04:44.684. Can you check your code to see where these duplicated requests are coming from? Fixing this should save you quite a few requests.

I hope this information has been useful! Can you try implementing some of these suggestions, such as making conditional requests, and let me know if you're still having trouble?

Best regards,
Steve