IgnoredAmbience / yahoo-group-archiver

Scrapes and archives a Yahoo group's email archives, photo galleries and file contents using the non-public API
MIT License
93 stars, 45 forks

Attachment Download Failing With 400 #8

Closed: abney317 closed this 5 years ago

abney317 commented 5 years ago

Yahoo seems to expect all attachment requests to come from its own site. I see the Referer header is being set:

self.s.headers = {'Referer': self.BASE_URI}

But this doesn't seem to make it work. If I open the image in my browser while logged in and then run the script, the download works for that one file and then fails on the next one.

  File "C:\Python27\lib\site-packages\requests\models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://xa.yimg.com/kq/groups/V_xxxxxx./or/xxxxxxx/name/image.jpg

Any ideas on what we might be able to do here to allow the attachments to be pulled?

abney317 commented 5 years ago

Applying the same logic that @laanwj used in download_file to the get_file function (simply retrying the request when it fails) worked perfectly.
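A minimal sketch of the "keep trying if it fails" idea, in Python to match the project. `retry_call` and its parameters `max_attempts`, `delay` and `retry_on` are hypothetical names; this is not the actual code from `download_file`, `get_file`, or the pull request below. Unlike a loop that retries forever, it assumes a fixed attempt cap and re-raises the last error once the cap is reached.

```python
import time


def retry_call(func, max_attempts=5, delay=1.0, retry_on=(Exception,)):
    """Call func() until it succeeds or max_attempts is used up.

    Hypothetical helper sketching the retry approach described above;
    not the project's real implementation.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as err:
            # Remember the failure; only exceptions listed in retry_on are retried.
            last_error = err
            if attempt < max_attempts:
                # Pause briefly before the next attempt.
                time.sleep(delay)
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

In the archiver, `func` would wrap something like `session.get(url)` followed by `raise_for_status()`, with `retry_on=(requests.exceptions.HTTPError,)`, so that a 400 like the one in the traceback above triggers another attempt. Capping the attempts keeps a permanently broken URL from stalling the whole scrape.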

the-solipsist commented 5 years ago

I've submitted a pull request for this: https://github.com/IgnoredAmbience/yahoo-group-archiver/pull/10