d235j closed this 4 years ago
A PR to fix this would be welcome :)
Found the problem.
get_calendars() relies on a request that is expected to fail (returning 401 or 403) in order to obtain the correct wssid parameter.
Commit https://github.com/IgnoredAmbience/yahoo-group-archiver/commit/70cc682996e6206869e7192cb8590b557ff47746 replaced get_file() with download_file(), which doesn't handle the wssid properly (see https://github.com/IgnoredAmbience/yahoo-group-archiver/blob/d0644995977b808969e6e0c7b44a0fe780273bac/yahoo.py#L372). I can open a PR soon.
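For illustration, the "probe for the wssid" pattern might look roughly like the sketch below. It is a hypothetical sketch, not the archiver's code: `obtain_wssid` and the `Fetch` type are invented names, and the JSON shape of the error body (`{"calendarError": {"wssid": ...}}`) is an assumption. The point it shows is that the fetch function must return the 401/403 response instead of raising, which is presumably what broke when get_file() was swapped for download_file().

```python
import json
from typing import Callable, Tuple

# Hypothetical fetch signature: takes a URL and returns (status_code, body_text)
# WITHOUT raising on 4xx/5xx, i.e. an "error-friendly" fetch.
Fetch = Callable[[str], Tuple[int, str]]


def obtain_wssid(fetch: Fetch, probe_url: str) -> str:
    """Send a request that is expected to fail and read the wssid from the error body.

    Assumption (hypothetical): the error body is JSON shaped like
    {"calendarError": {"wssid": "..."}}.
    """
    status, body = fetch(probe_url)
    if status not in (401, 403):
        # The trick only works if the probe really fails with 401/403.
        raise RuntimeError(f"expected a 401 or 403 response, got {status}")
    return json.loads(body)["calendarError"]["wssid"]
```

With a fetch that raises on 4xx (as download_file() apparently does), `obtain_wssid` would never see the body, so the wssid could not be read.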
Ah, I hadn't realised what was going on there when I merged that change. Just reintroducing the error-friendly version of get_file() should do.
@IgnoredAmbience, what are your thoughts on making download_file() return the content even when an error persists after retrying, so that it can be stored? (Or is that unnecessary, since we're already storing it in the WARC?)
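A rough sketch of that idea: retry on error statuses, and if the error persists, hand back the last status and body instead of raising, so the caller can still store them. `download_keep_errors`, its `fetch` callable, and the `retries`/`delay` parameters are hypothetical names for illustration, not the archiver's actual download_file().

```python
import time
from typing import Callable, Tuple

# Hypothetical fetch signature: returns (status_code, body_bytes) without raising.
Fetch = Callable[[str], Tuple[int, bytes]]


def download_keep_errors(fetch: Fetch, url: str, retries: int = 3,
                         delay: float = 0.0) -> Tuple[int, bytes]:
    """Fetch url, retrying on HTTP error statuses (>= 400).

    If the error persists after all attempts, return the LAST (status, body)
    rather than raising, so the caller can decide whether to store it.
    """
    status, body = fetch(url)
    for _ in range(retries - 1):
        if status < 400:
            break  # success, no need to retry
        time.sleep(delay)  # back off before the next attempt
        status, body = fetch(url)
    return status, body
```

The caller then checks the returned status itself; whether persistent errors are worth storing separately depends on whether the WARC already captures them.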
The exception raised by requests should have the response and request objects attached, so they can be queried. When we expect a failure, it should be possible to do this from an exception handler.
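A minimal sketch of that exception-handler approach, assuming the requests library is installed. `fetch_expecting_failure` is a hypothetical helper, and `raise_for_status()` stands in for whatever raises inside the real download path. `requests.exceptions.HTTPError` exposes the failing `Response` as `e.response` (and the request as `e.request` when available), so the handler can return it instead of propagating the error.

```python
import requests


def fetch_expecting_failure(session, url: str) -> requests.Response:
    """GET url and return the Response even when it is an HTTP error.

    `session` is anything with a requests-style .get(url), e.g. requests.Session.
    """
    try:
        r = session.get(url)
        r.raise_for_status()  # raises HTTPError for 4xx/5xx
        return r
    except requests.exceptions.HTTPError as e:
        # requests attaches the failing Response to the exception, so the
        # error body (e.g. a wssid) is still available here.
        return e.response
```

Connection failures raise other `RequestException` subclasses that carry no response, so those still propagate to the caller.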
I'm having this happen with the Alps group (which is unfortunately private).