IgnoredAmbience / yahoo-group-archiver

Scrapes and archives a Yahoo group's email archives, photo galleries and file contents using the non-public API
MIT License

YahooGroupsAPI Unknown 401 error #124

Open thoni56 opened 4 years ago

thoni56 commented 4 years ago

I get

INFO archive_calendar Getting wssid. Expecting 401 or 403 response
ERROR YahooGroupsAPI Unknown 401 error for https://calendar.yahoo.com/ws/v3/users/@@groups_45d2c860-944c-4189-aca9-2360132a2a85/calendars/events/?format=json&dtstart=20000101dtend=20000201&wssid=Dummy, giving up on this download

Is this the error that is expected? The INFO line seems to indicate so, but in that case the ERROR on the following line should be caught rather than reported. As a user, I would like to see only ERRORs that are real errors affecting what I get in the download.

PS: Thanks for building this. Now that Yahoo is doing whatever they are doing, saving the history of many groups on the Internet is essential. DS.

ndevenish commented 4 years ago

This looks linked to a Python 2/3 unicode error:

2019-12-10 03:02:12.827 GMT INFO archive_calendar Getting wssid. Expecting 401 or 403 response.
2019-12-10 03:02:13.523 GMT ERROR YahooGroupsAPI Unknown 401 error for https://calendar.yahoo.com/ws/v3/users/@@groups_8b5d1f7e-a6fb-4702-909f-6069c4ef81b5/calendars/events/?format=json&dtstart=20000101dtend=20000201&wssid=Dummy, giving up on this download
2019-12-10 03:02:13.524 GMT INFO archive_calendar Trying to get events between 20010130 and 20031027
Traceback (most recent call last):
  File "./yahoo.py", line 1023, in <module>
    archive_calendar(yga)
  File "./yahoo.py", line 657, in archive_calendar
    calContent = json.loads(calContentRaw)
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'
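
For context, `json.loads` on Python 3.5 accepts only `str`; support for `bytes` input arrived in Python 3.6. A minimal sketch of a workaround follows, assuming the calendar payload is UTF-8 encoded. The helper name `load_json_payload` is hypothetical and not part of yahoo.py.

```python
import json


def load_json_payload(raw):
    """Parse a JSON payload that may arrive as bytes or as str.

    Hypothetical helper: decodes bytes first so that json.loads
    also works on Python 3.5. UTF-8 is an assumption about the payload.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)


# Stand-in for the raw calendar response body (made-up sample data).
cal_content = load_json_payload(b'{"events": []}')
```

Decoding before parsing keeps the call site unchanged on newer Python versions, where `json.loads` would accept the bytes directly.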

nsapa commented 4 years ago

Yes, the 401 error is expected; we get a required value (the wssid) from the error page. The refactoring done to make downloading more stable did not account for this. That is why it logs an error (one that really did happen) before continuing.
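
One way to keep the log quiet for responses the caller already anticipates is to pass the expected status codes down to the status check. This is only a sketch of the idea, not the project's actual code: `check_status` and its `expected` parameter are hypothetical names, and the logger name is copied from the log output above.

```python
import logging

logger = logging.getLogger("YahooGroupsAPI")


def check_status(status_code, url, expected=(200,)):
    """Return True when the status code is one the caller expects.

    Expected codes are logged at DEBUG level; anything else is
    reported as an ERROR, mirroring the message seen in the log.
    """
    if status_code in expected:
        logger.debug("Got expected status %d for %s", status_code, url)
        return True
    logger.error(
        "Unknown %d error for %s, giving up on this download",
        status_code,
        url,
    )
    return False


# The wssid request would declare 401 and 403 as expected outcomes.
ok = check_status(401, "https://calendar.yahoo.com/", expected=(401, 403))
```

With this shape, ordinary downloads keep the default and still report a 401 as an error, while the wssid request opts in to treating it as normal.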