IgnoredAmbience / yahoo-group-archiver

Scrapes and archives a Yahoo group's email archives, photo galleries and file contents using the non-public API
MIT License
93 stars, 46 forks

500 error on API calls (inconsistently -- API server flakiness) #62

Closed IgnoredAmbience closed 4 years ago

IgnoredAmbience commented 4 years ago

Testcase: https://groups.yahoo.com/api/v1/groups/ColorComputer/messages/812/raw

Ensure handled gracefully.

IgnoredAmbience commented 4 years ago

Confirmed handled gracefully now.

IgnoredAmbience commented 4 years ago

2019-10-25 23:06:59.699 BST INFO archive_message_content Fetching  raw message id: 812 (of 5091)
2019-10-25 23:06:59.892 BST ERROR archive_message_content Raw grab failed for message 812
Traceback (most recent call last):
  File "./yahoo.py", line 92, in archive_message_content
    raw_json = yga.messages(id, 'raw')
  File "/home/thomas/yahoo-group-archiver-1/yahoogroupsapi.py", line 110, in get_json
    raise e
  File "/home/thomas/yahoo-group-archiver-1/yahoogroupsapi.py", line 103, in get_json
    r.raise_for_status()
  File "/home/thomas/yahoo-group-archiver-1/env3/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://groups.yahoo.com/api/v1/groups/ColorComputer/messages/812/raw
2019-10-25 23:06:59.892 BST INFO archive_message_content Fetching html message id: 812 (of 5091)

IgnoredAmbience commented 4 years ago

I'm seeing 500 errors that are not persistent, which looks like API server flakiness. We definitely need a (deferred) retry mechanism here, and ideally tooling to pull failures out of already-captured archive logs.
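As a rough illustration of the log tooling idea, the sketch below scans archive log lines for the "Raw grab failed for message N" errors shown in the excerpt above and returns the affected message ids. The function name `failed_message_ids` and the regular expression are assumptions made for this example; they are not part of the project.

```python
import re

# Matches ERROR lines of the shape seen in the log excerpt in this thread, e.g.
# "... ERROR archive_message_content Raw grab failed for message 812".
# The exact log format is assumed from that single excerpt.
FAILED_RAW = re.compile(
    r"ERROR archive_message_content Raw grab failed for message (\d+)"
)


def failed_message_ids(log_lines):
    """Return the sorted, de-duplicated ids of messages whose raw grab failed."""
    ids = set()
    for line in log_lines:
        match = FAILED_RAW.search(line)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)
```

The returned ids could then be fed back into a later archive run, for example `failed_message_ids(open("archive.log"))`, where `archive.log` is a placeholder filename.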

IgnoredAmbience commented 4 years ago

For a transient error:

"ygError":{"hostname":"gapi5.grp.bf1.yahoo.com","httpStatus":500,"errorMessage":"Internal error: Error during message fetch","errorCode":1001,"sid":"SID:YHOO:groups.yahoo.com:ee5ae7585faace373012569deb353069:0"}}

IgnoredAmbience commented 4 years ago

Unfortunately, the raw message that always errors returns the same error:

"ygError":{"hostname":"gapi1.grp.bf1.yahoo.com","httpStatus":500,"errorMessage":"Internal error: Error during message fetch","errorCode":1001,"sid":"SID:YHOO:groups.yahoo.com:4470210ff4f93c3f0d8ca3aa5cc5129a:0"}}

IgnoredAmbience commented 4 years ago

Additional retries added in 50eaa48
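Because the transient and the persistent failures return the same ygError payload, the response body alone cannot tell them apart; only a later attempt can. The sketch below shows one possible shape for a deferred retry: failures are set aside and retried in later passes, and whatever still fails after the last pass is reported as persistent. It is not a description of the referenced commit. `archive_with_deferred_retry`, `fetch` and `passes` are hypothetical names chosen for this example.

```python
def archive_with_deferred_retry(ids, fetch, passes=3):
    """Fetch every id, deferring failures to later passes.

    `fetch(msg_id)` is any callable that returns the message data or raises.
    Returns (results, persistent_failures), where persistent_failures lists
    the ids that failed in every pass.
    """
    results = {}
    pending = list(ids)
    for _ in range(passes):
        still_failing = []
        for msg_id in pending:
            try:
                results[msg_id] = fetch(msg_id)
            except Exception:
                # Broad catch keeps the sketch generic; real code would
                # likely narrow this to the HTTP error type it expects.
                still_failing.append(msg_id)
        pending = still_failing
        if not pending:
            break
        # A real run would probably wait here before the next pass.
    return results, pending
```

With this shape, a message such as 812 that fails on every attempt ends up in the second return value instead of blocking the rest of the archive.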

d235j commented 4 years ago

This needs to be handled for any API call; I suggest moving the logic to the YahooGroupsAPI module.
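One way to apply retries to every API call is a decorator wrapped around the module's single request helper (the traceback above shows a `get_json` function in yahoogroupsapi.py, which would be a natural place). The following is a minimal sketch under that assumption, not the project's implementation; the `retry` decorator and its `times`, `delay`, `exceptions` and `sleep` parameters are invented for illustration.

```python
import functools
import time


def retry(times=3, delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped callable up to `times` attempts in total.

    Waits delay * attempt seconds between attempts (linear backoff) and
    re-raises the last exception once all attempts are used up.
    `sleep` is injectable so the wait can be skipped in tests.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise
                    sleep(delay * attempt)
        return wrapper
    return decorator
```

Decorating the one function that issues HTTP requests would give messages, files and photos the same behaviour without repeating the loop at each call site. Passing a narrower `exceptions` tuple, such as `(requests.exceptions.HTTPError,)`, would avoid retrying on unrelated bugs.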