IgnoredAmbience / yahoo-group-archiver

Scrapes and archives a Yahoo group's email archive, photo galleries, and file contents using the non-public API
MIT License

Aborting after read timeout #69

Closed: n4mwd closed this issue 4 years ago

n4mwd commented 4 years ago

The file name sanitizer seems to be working OK, but now I'm getting this error after about 2 hours of downloading.

2019-10-26 00:19:03.203 Eastern Daylight Time INFO archive_photos Fetching photo 'Wind shield' (100/249)
2019-10-26 00:19:03.483 Eastern Daylight Time DEBUG urllib3.connectionpool https://xa.yimg.com:443 "GET /kq/groups/.WmVSr7uctf4.5HK/hr/UIjX87DteNjG_NL 64.Ah/name/n_a HTTP/1.1" 200 87677
2019-10-26 00:19:03.812 Eastern Daylight Time DEBUG urllib3.connectionpool Resetting dropped connection: groups.yahoo.com
Traceback (most recent call last):
  File "C:\Python27\Scripts\yahoo.py", line 698, in <module>
    archive_photos(yga)
  File "C:\Python27\Scripts\yahoo.py", line 276, in archive_photos
    photos = yga.albums(a['albumId'], start=page*100, count=100)
  File "C:\Python27\Scripts\yahoogroupsapi.py", line 101, in get_json
    r = self.s.get(uri, params=opts, verify=VERIFY_HTTPS, allow_redirects=False, timeout=15)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 546, in get
    return self.request('GET', url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='groups.yahoo.com', port=443): Read timed out. (read timeout=15)

Seems like it shouldn't give up so easily. Maybe log the error and continue to the next item, and only abort if something like 5 files in a row fail.
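For illustration, here is a minimal sketch of that behaviour, assuming a hypothetical download loop where each per-item fetch can raise `requests.exceptions.ReadTimeout` as in the traceback above (the function and variable names are placeholders, not the archiver's actual code):

```python
import logging

import requests

MAX_CONSECUTIVE_FAILURES = 5  # abort only after several items fail back to back


def archive_items(items, fetch_item):
    """Fetch each item, logging timeouts and giving up only after
    MAX_CONSECUTIVE_FAILURES consecutive failures."""
    consecutive_failures = 0
    for item in items:
        try:
            fetch_item(item)              # e.g. one photo or file download
            consecutive_failures = 0      # a success resets the counter
        except requests.exceptions.ReadTimeout:
            consecutive_failures += 1
            logging.warning("Read timeout fetching %s (%d in a row), skipping",
                            item, consecutive_failures)
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                logging.error("Too many consecutive timeouts, aborting")
                raise
```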

Also, it would be nice to be able to skip downloading files that are already on the computer.
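A rough sketch of how that skip could work, assuming each download has a known local target path and (optionally) a known remote size; the helper name here is made up for illustration:

```python
import os


def should_download(target_path, expected_size=None):
    """Skip files that already exist locally; if the remote size is known,
    re-download only when the local copy looks incomplete."""
    if not os.path.exists(target_path):
        return True
    if expected_size is not None and os.path.getsize(target_path) != expected_size:
        return True  # partial file left over from an earlier, interrupted run
    return False
```

Checking `should_download()` before each fetch would let an interrupted run resume without re-downloading everything.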

IgnoredAmbience commented 4 years ago

This was fixed in #28.