Spark-NF / twitter_media_downloader

Twitter media downloader.
Apache License 2.0

ValueError: invalid literal for int() with base 16: b'' #17

Open God-damnit-all opened 3 years ago

God-damnit-all commented 3 years ago

I'm occasionally getting this error, and I'm not sure why. Perhaps it's a failed download, and there's no catch and retry logic?

notrealfilename.jpg: ok
placeholder.jpg: ok
anotherfakefile.mp4: ok
Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\urllib3\response.py", line 696, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\urllib3\response.py", line 436, in _error_catcher
    yield
  File "C:\Python38\lib\site-packages\urllib3\response.py", line 763, in read_chunked
    self._update_chunk_length()
  File "C:\Python38\lib\site-packages\urllib3\response.py", line 700, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python38\lib\site-packages\requests\models.py", line 751, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "C:\Python38\lib\site-packages\urllib3\response.py", line 571, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "C:\Python38\lib\site-packages\urllib3\response.py", line 792, in read_chunked
    self._original_response.close()
  File "C:\Python38\lib\contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Python38\lib\site-packages\urllib3\response.py", line 454, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "twitter_media_downloader.py", line 38, in <module>
    download(results, outputDir, False, True)
  File "D:\twitter_media_downloader\src\downloader.py", line 40, in download
    r = requests.get(url, stream=stream)
  File "C:\Python38\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python38\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python38\lib\site-packages\requests\sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python38\lib\site-packages\requests\sessions.py", line 685, in send
    r.content
  File "C:\Python38\lib\site-packages\requests\models.py", line 829, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "C:\Python38\lib\site-packages\requests\models.py", line 754, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

God-damnit-all commented 3 years ago

It could also be related to something being broken on Twitter's side. Notice how this tweet's media is all blank: https://twitter.com/alcopopstar/status/1288270106032660480

Edit: Disregard that. I have four 0-byte files for that tweet, so I guess that part is working correctly and isn't related to the above issue. It would be good to refuse to download those files at all, though, or to delete them once the script sees that they're empty, so that a later retry could yield working files if it's a temporary outage on one of their cloud servers.
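
For what it's worth, the "delete them if they're empty" part could look roughly like the sketch below. `remove_if_empty` is a made-up helper name, not something that exists in twitter_media_downloader, and where it would be called from in downloader.py is an assumption on my part.

```python
import os


def remove_if_empty(path):
    """Delete `path` if it is a zero-byte file.

    Hypothetical helper: returns True when the file was removed, so the
    caller can decide to retry the download later.
    """
    if os.path.isfile(path) and os.path.getsize(path) == 0:
        os.remove(path)
        return True
    return False
```

Calling this right after each file is written would leave no empty files behind, so a later run would not mistake them for finished downloads.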

God-damnit-all commented 3 years ago

I managed to figure out the tweet it was getting hung up on.

https://twitter.com/homosenpais/status/758860339119128576

Apparently this is the issue: https://github.com/psf/requests/issues/4248#issuecomment-510018853

There seem to be proposed fixes, but most of them involve editing the library itself...
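
A workaround that doesn't touch the library might be to catch the exception around the `requests.get` call and retry. This is only a rough sketch: `fetch_with_retry`, `attempts`, `delay` and the injectable `get` parameter are names I made up for illustration, and I haven't checked how this would fit into the real `download` function. It also won't help if the server sends a broken response every single time.

```python
import time

import requests


def fetch_with_retry(url, attempts=3, delay=2.0, get=requests.get):
    """Return the response body as bytes, retrying on a broken connection.

    Hypothetical helper, not part of twitter_media_downloader. `get` is
    injectable only so the function can be exercised without a network.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            response = get(url, timeout=30)
            response.raise_for_status()
            # With stream left at its default (False), the body has already
            # been read here, so a broken chunked response raises above.
            return response.content
        except (requests.exceptions.ChunkedEncodingError,
                requests.exceptions.ConnectionError) as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay)  # brief pause before the next attempt
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

If every attempt fails, the last exception is re-raised, so the caller could still log the URL, skip it, and carry on with the remaining files instead of crashing the whole run.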