IgorPerikov / mighty-watcher

Open source made easier
https://itnext.io/open-source-made-easier-aab8aebd849f

track remaining x-rate-limit in github api client and degrade gracefully when limits are about to be hit #64

Closed: IgorPerikov closed this issue 4 years ago

IgorPerikov commented 5 years ago

contributions welcome for supporting the rate limit check: https://developer.github.com/v3/rate_limit/
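A minimal sketch of what that pre-launch check could look like, assuming Java 11's `java.net.http` client (the function name and token handling are illustrative, not part of mighty-watcher):

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Calls GET /rate_limit, which does not itself count against the rate limit,
// and returns the raw JSON body containing resources.core.{limit,remaining,reset}.
fun fetchRateLimit(token: String): String {
    val request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.github.com/rate_limit"))
        .header("Authorization", "token $token")
        .header("Accept", "application/vnd.github.v3+json")
        .GET()
        .build()
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
}
```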

JustPRV commented 4 years ago

According to the GitHub API documentation, there are different types of API scopes with different limits; RestGithubApiClient hits only one of them, which has a limit of 5000 requests per hour. I think it is possible to add an interceptor to the HttpClient which tracks X-RateLimit-Limit, X-RateLimit-Remaining and X-RateLimit-Reset, and when there are only 1000 requests left the interceptor adds a delay to calls, where delay is max(0, xRateLimitReset - now()) / xRateLimitRemaining. There is always the option to implement the same logic inside RestGithubApiClient. However, I think we should follow GitHub's advice ('If you exceed your rate limit using Basic Authentication or OAuth, you can likely fix the issue by caching API responses and using conditional requests.') and think about a more flexible search. @IgorPerikov what do you think?
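A minimal sketch of that interceptor, assuming OkHttp as the HTTP client (the class name and the 1000-request threshold come from the comment above; everything else is illustrative):

```kotlin
import java.time.Instant
import okhttp3.Interceptor
import okhttp3.Response

class RateLimitThrottlingInterceptor(
    private val threshold: Long = 1000 // start delaying when fewer requests remain
) : Interceptor {
    @Volatile
    private var delayMillis: Long = 0

    override fun intercept(chain: Interceptor.Chain): Response {
        // apply the delay computed from the previous response
        if (delayMillis > 0) Thread.sleep(delayMillis)

        val response = chain.proceed(chain.request())
        val remaining = response.header("X-RateLimit-Remaining")?.toLongOrNull()
        val reset = response.header("X-RateLimit-Reset")?.toLongOrNull() // epoch seconds

        delayMillis = if (remaining != null && reset != null &&
            remaining > 0 && remaining < threshold
        ) {
            // delay = max(0, xRateLimitReset - now()) / xRateLimitRemaining,
            // i.e. spread the remaining budget evenly over the rest of the window
            maxOf(0L, reset - Instant.now().epochSecond) * 1000 / remaining
        } else {
            0
        }
        return response
    }
}
```

It would be registered via `OkHttpClient.Builder().addInterceptor(...)`.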

IgorPerikov commented 4 years ago

I had two different ideas in mind:

  1. check x-rate-limits before launching the utility, decrease this number by 1 with every http call, and when it falls below 100 (50? or anything else), enable fallback mode and return empty lists on any subsequent call (see the sketch after this list)
  2. figure out the response which happens when limits are being hit, and enter degradation mode as in the first variant
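A minimal sketch of idea 1 as a decorator, with hypothetical stand-ins for the real client types (mighty-watcher's actual interfaces may differ):

```kotlin
import java.util.concurrent.atomic.AtomicLong

// Illustrative stand-ins, not the project's real types.
data class Issue(val title: String, val url: String)

interface GithubApiClient {
    fun getIssues(repository: String): List<Issue>
}

// Seeded with the pre-launch X-RateLimit-Remaining value; decrements on every
// call and returns empty lists once the budget falls below the threshold.
class DegradingGithubApiClient(
    private val delegate: GithubApiClient,
    initialRemaining: Long,
    private val threshold: Long = 100
) : GithubApiClient {
    private val remaining = AtomicLong(initialRemaining)

    override fun getIssues(repository: String): List<Issue> =
        if (remaining.decrementAndGet() < threshold) emptyList()
        else delegate.getIssues(repository)
}
```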

additionally, in both cases keep track of rate limits before launch, and then:

> interceptor will add delay for calls

Since limits reset every hour, it might take a very long time (up to an hour, which seems unreasonably long to wait)

> you can likely fix the issue by caching API responses and using conditional requests

for now we basically can't do that, because the process is re-created on every launch
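For reference, a conditional request would look like the sketch below, assuming ETags were persisted to disk between launches (the caching layer itself is omitted); a 304 Not Modified response does not count against the rate limit:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Sends If-None-Match with a previously stored ETag.
fun fetchIfChanged(url: String, cachedEtag: String?, token: String): HttpResponse<String> {
    val builder = HttpRequest.newBuilder()
        .uri(URI.create(url))
        .header("Authorization", "token $token")
    if (cachedEtag != null) builder.header("If-None-Match", cachedEtag)
    // statusCode() == 304 means "not modified": reuse the cached body and the
    // call was free; otherwise store headers().firstValue("ETag") for next time.
    return HttpClient.newHttpClient()
        .send(builder.build(), HttpResponse.BodyHandlers.ofString())
}
```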

> and think about more flexible search

have any ideas?

JustPRV commented 4 years ago

  1. Checking the rate before the first call is a good idea.
  2. Tracking the remaining number on the app side can lead to invalid results, for example if someone has another application which uses the same GitHub API, or runs 2 instances of this app.
  3. If the current implementation requires more than 5000 calls to get the result, then it cannot be done in less than an hour. However, I agree that there should be a notification when the app hits the limit and REST API calls are being slowed deliberately, so the user has a choice to kill the process and adjust the search criteria, or wait for hours.

> have any ideas?

Instead of making thousands of calls, make a single call to the GraphQL API; but I haven't checked whether it is possible to join all the necessary criteria.
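A sketch of what such a single call could look like against https://api.github.com/graphql (the search qualifiers are placeholders, not the tool's actual criteria, and the response is returned unparsed to keep the example dependency-free):

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// One GraphQL search instead of per-repository REST calls; also asks for the
// GraphQL API's own rate limit state in the same request.
fun searchIssuesViaGraphql(token: String): String {
    val query = """
        query {
          rateLimit { remaining resetAt }
          search(query: "is:issue is:open label:\"help wanted\"", type: ISSUE, first: 100) {
            nodes { ... on Issue { title url repository { nameWithOwner } } }
          }
        }
    """.trimIndent()
    // naive JSON escaping, to avoid pulling in a JSON library for the sketch
    val jsonBody = "{\"query\": \"" +
        query.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") +
        "\"}"
    val request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.github.com/graphql"))
        .header("Authorization", "bearer $token")
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build()
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
}
```

Note that the GraphQL API has its own, point-based rate limit.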

Please see related PR.

IgorPerikov commented 4 years ago

> Instead of making thousands of calls, make a single call to the GraphQL API

But it will have limits as well, so even if a GraphQL implementation were able to fetch issues for 120% as many repositories, the problem would still be here

> Tracking the remaining number on the app side can lead to invalid results, for example if someone has another application which uses the same GitHub API, or runs 2 instances of this app.

yes, that's a correct point, but I think it should be enforced as an administrative requirement: don't use anything else while executing mighty-watcher, because the limits can be screwed up anyway