IgnoredAmbience / yahoo-group-archiver

Scrapes and archives a Yahoo Group's email archive, photo galleries and file contents using the non-public API
MIT License

Number of retries and waiting time if a file is damaged on the server #125

Open aairfccha opened 4 years ago

aairfccha commented 4 years ago

I am in the process of downloading a group and the process effectively stops dead for a few minutes when the downloader tries again and again to load the same file (which is broken on the server already) with generous waiting time in between. It gives up after 15 attempts but this still costs quite a bit of time. Is it possible to either reduce waiting time or continue with the next file and repeat the "difficult" files at the end? This would greatly speed up extraction of the easy files.
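
Something like this sketch is what I have in mind (an illustration only; "fetch" and "targets" stand in for whatever the archiver actually uses to list and download individual files):

```python
import time

def download_all(targets, fetch, quick_retries=2, final_retries=15, delay=2):
    """First pass with few retries; park failures and retry them at the end."""
    deferred = []
    for item in targets:
        if not _try_download(item, fetch, quick_retries, delay):
            deferred.append(item)               # "difficult" file, come back to it later
    for item in deferred:                       # second, slower pass at the very end
        _try_download(item, fetch, final_retries, delay)

def _try_download(item, fetch, retries, delay):
    for attempt in range(retries):
        try:
            fetch(item)                         # placeholder for the real download call
            return True
        except Exception:
            time.sleep(delay * (attempt + 1))   # modest back-off between attempts
    return False
```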

Pablo2m commented 4 years ago

Hello. You need to edit the file "yahoogroupsapi.py": on line 87, "def __init__(self, group, cookie_jar=None, headers={}, min_delay=0, retries=7):", change the number of retries; personally I use 7.
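
For reference, after the edit that line would look roughly like this (a sketch only; the class name is inferred from the file name, the rest of the class is unchanged, and the exact line number may differ between versions):

```python
# yahoogroupsapi.py, around line 87: a smaller `retries` value makes the
# archiver give up on a broken file sooner; `min_delay` is the minimum
# wait between requests.
class YahooGroupsAPI:
    def __init__(self, group, cookie_jar=None, headers={}, min_delay=0, retries=7):
        ...
```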

To increase the download speed, I download several groups simultaneously.
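
For example, something along these lines launches one archiver process per group (the command is a placeholder; check the README for the real invocation and cookie arguments):

```python
# Sketch: archive several groups at the same time by running one
# archiver process per group. ARCHIVER_CMD is a placeholder invocation.
import subprocess

GROUPS = ["group-one", "group-two", "group-three"]
ARCHIVER_CMD = ["python", "yahoo.py"]   # placeholder, adjust to the real script and flags

procs = [subprocess.Popen(ARCHIVER_CMD + [g]) for g in GROUPS]
for p in procs:
    p.wait()                            # wait for every group to finish
```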

aairfccha commented 4 years ago

Thanks, fortunately only a handful of files were broken, so the download finished in a reasonable time anyway. I actually did run downloads of all my groups in parallel (a grand total of three), but since one took a lot longer than the others, there was only so much parallelisation possible.