Closed: eggplants closed this issue 2 years ago
So I suggest raising TooManyRequestsError when the returned status code is 429.
    def get_save_request_headers(self) -> None:
        """
        Creates a session and tries 'retries' number of times to
        retrieve the archive.
        If successful in getting the response, sets the headers, status_code
        and response_url attributes.
        The archive is usually in the headers but it can also be the response URL
        as the Wayback Machine redirects to the archive after a successful capture
        of the webpage.
        Wayback Machine's save API is known
        to be very unreliable thus if it fails first check opening
        the response URL yourself in the browser.
        """
        session = requests.Session()
        retries = Retry(
            total=self.total_save_retries,
            backoff_factor=self.backoff_factor,
            status_forcelist=self.status_forcelist,
        )
        session.mount("https://", HTTPAdapter(max_retries=retries))
        self.response = session.get(self.request_url, headers=self.request_headers)
        # requests.response.headers is requests.structures.CaseInsensitiveDict
        self.headers: CaseInsensitiveDict[str] = self.response.headers
        self.status_code = self.response.status_code
        self.response_url = self.response.url
        session.close()
        if self.status_code == 429:
            raise TooManyRequestsError("The error message here")
What should the error message be? Should it be parsed from the response every time, or should it be a string literal?
Example: Save Page Now receives up to 15 URLs per minute. Wait a moment and run again.
I think just checking the status code should be enough.
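For illustration, the string-literal option could look roughly like the sketch below. It assumes a hypothetical helper named raise_for_rate_limit and an assumed message text; the TooManyRequestsError class is defined here only so the example is self-contained, and none of this is confirmed to match what the library ends up shipping.

```python
class TooManyRequestsError(Exception):
    """Hypothetical exception for HTTP 429 responses (name taken from the suggestion in this thread)."""


# Assumed wording: the thread does not settle on a final message.
RATE_LIMIT_MESSAGE = (
    "Save Page Now returned HTTP 429 (Too Many Requests). "
    "Wait a moment and try again."
)


def raise_for_rate_limit(status_code: int) -> None:
    """Raise TooManyRequestsError when the status code is 429; otherwise do nothing."""
    if status_code == 429:
        raise TooManyRequestsError(RATE_LIMIT_MESSAGE)
```

Only the status code is inspected, so the check does not depend on the layout of the returned HTML body.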
@eggplants will you be working on this issue? Just asking so that we both don't end up creating two PRs.
I'll do it myself if you'd like.
Go ahead.
For future reference
See also https://github.com/akamhy/waybackpy/pull/142#issuecomment-1031850965
429 doesn't always imply that we have hit 15 archives per minute, at least on my IP. It could also imply that the URL we are trying to archive has reached its maximum limit.
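Because a 429 may not clear after a short wait, calling code probably should not retry forever. Below is a rough caller-side sketch, assuming a hypothetical wrapper named call_with_bounded_retries and an arbitrary 60-second default wait; the exception class is redefined only to keep the example self-contained.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyRequestsError(Exception):
    """Hypothetical exception raised when the server answers with HTTP 429."""


def call_with_bounded_retries(
    save: Callable[[], T],
    max_attempts: int = 3,
    wait_seconds: float = 60.0,  # assumed default, not taken from any documentation
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call save(); on TooManyRequestsError wait and retry, for at most max_attempts calls."""
    for attempt in range(1, max_attempts + 1):
        try:
            return save()
        except TooManyRequestsError:
            # Give up on the last attempt: the 429 may reflect a per-URL limit
            # that waiting will not lift.
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)
    raise ValueError("max_attempts must be at least 1")
```

The sleep function is passed in as a parameter so the wait can be replaced in tests; after the final attempt the original exception is re-raised for the caller to handle.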
When requests are sent too frequently, the Wayback Machine returns HTTP status code 429.
The returned HTML body is:
https://gist.github.com/eggplants/414bab0230f14358642faf364bc1f7ec
So I suggest raising TooManyRequestsError when the returned status code is 429.