IgnoredAmbience / yahoo-group-archiver

Scrapes and archives a Yahoo group's email archives, photo galleries and file contents using the non-public API
MIT License

Stops on 403 #3

Closed flecom closed 4 years ago

flecom commented 5 years ago

If the script hits a message whose attachment download returns a 403, the whole script aborts with an unhandled exception instead of skipping that attachment and continuing.

Example:

* Fetching raw message #19950 of 20852
* Fetching raw message #19949 of 20852
* Fetching raw message #19948 of 20852
** Fetching attachment 'GE 19D430272G1 Load Test 23 Mar 2008.pdf'
Traceback (most recent call last):
  File "./yahoo.py", line 192, in <module>
    archive_email(yga, reattach=(not args.no_reattach), save=(not args.no_save))
  File "./yahoo.py", line 49, in archive_email
    atts[attach['filename']] = yga.get_file(attach['link'])
  File "/home/frank/yahoo-group-archiver/yahoogroupsapi.py", line 49, in get_file
    r.raise_for_status()
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://xa.yimg.com/kq/groups/_ojE2dTpetZkyjs-/uOZzGEDleNXA_vtz_o4-/name/GE+19D430272G1+Load+Test+23+Mar+2008.pdf
user@box:/home/user/yahoo-group-archiver#
IgnoredAmbience commented 4 years ago

I think attachment download handling should now be robust enough in the current version of the code. Please reopen if this error happens again.
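
For anyone hitting this on an older checkout, here is a minimal sketch of the kind of handling that avoids the crash. It is not the project's actual fix. `fetch_attachments`, its `recoverable` parameter, and the logger name are hypothetical names made up for illustration. It assumes each attachment is a dict with `filename` and `link` keys, as in the traceback above, and it relies on the fact that `requests` exceptions (including `HTTPError`) derive from `IOError`.

```python
import logging

log = logging.getLogger("archiver")  # hypothetical logger name


def fetch_attachments(attachments, get_file, recoverable=(IOError,)):
    """Download every attachment, skipping failures instead of aborting.

    attachments -- iterable of dicts with 'filename' and 'link' keys
    get_file    -- callable taking a link and returning the file contents
    recoverable -- exception types treated as "skip this attachment";
                   requests.exceptions.HTTPError is a subclass of IOError
    Returns (fetched, failed): a dict of filename -> contents, and a list
    of filenames that could not be downloaded.
    """
    fetched = {}
    failed = []
    for attach in attachments:
        name = attach['filename']
        try:
            fetched[name] = get_file(attach['link'])
        except recoverable as err:
            # Record the failure and carry on with the next attachment.
            log.warning("Skipping attachment %r: %s", name, err)
            failed.append(name)
    return fetched, failed
```

In the loop from the traceback, the bare `yga.get_file(attach['link'])` call would become something like `atts, failed = fetch_attachments(attachment_list, yga.get_file)`, where `attachment_list` stands in for whatever list of attachments the message carries. Returning the failed names lets the caller report them at the end of the run rather than losing them in the log.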