chaoss / augur

Python library and web service for Open Source Software Health and Sustainability metrics & data collection. You can find our documentation and new contributor information easily here: https://oss-augur.readthedocs.io/en/main/ and learn more about Augur at our website https://augurlabs.io
https://oss-augur.readthedocs.io/en/main/
MIT License
585 stars 845 forks source link

Handle GitHub ratelimit responses based on Github API docs #2784

Closed ABrain7710 closed 4 months ago

ABrain7710 commented 5 months ago

Description

The github docs state this (docs link):

If you exceed your primary rate limit, you will receive a 403 or 429 response, and the x-ratelimit-remaining header will be 0. You should not retry your request until after the time specified by the x-ratelimit-reset header. If you exceed a secondary rate limit, you will receive a 403 or 429 response and an error message that indicates that you exceeded a secondary rate limit. If the retry-after response header is present, you should not retry your request until after that many seconds has elapsed. If the x-ratelimit-remaining header is 0, you should not retry your request until after the time, in UTC epoch seconds, specified by the x-ratelimit-reset header. Otherwise, wait for at least one minute before retrying. If your request continues to fail due to a secondary rate limit, wait for an exponentially increasing amount of time between retries, and throw an error after a specific number of retries.

How I accomplished this

So I added a check to see if the response code is 403 or 429. And if it is I check if the retry after header is present or the ratelimit reset header is 0. If the retry after header is present then I sleep for the designated time. If the ratelimit reset header is present then I sleep until it resets. Otherwise I sleep for 60 seconds.

The retry after and ratelimit reset header logic are things already implemented today, but we are interrogating the error string in the response body which is a bad idea because Github could change the response message at any time and break augur. So this change it meant to move away from interrogating the error string in the response body and instead use the response code and headers like the github docs recommend.

This change does not include the recommended exponential backoff because we do not have that today and I wanted to keep this change as small as possible. Also I did not remove the string interrogation because I want to use it as a fallback for other status codes that are not handled yet, as well as keep the change small. I added a log message in the existing string interrogation so we can find cases where it is still being used and address them.

I also changed repo not found handling to be based on the 404 status code rather than interrogating the error in the response body and looking for "Not Found"

Lastly I changed the yields that were in a loop to use yield from based on the linter's suggestion

Signed commits