8W9aG / scrapy-tor-downloader

Scrapy middleware with TOR support for more robust scrapers or anonymous scraping.
MIT License
6 stars, 1 fork

Error `gzip.BadGzipFile` #1

Open: NicolasMICAUX opened this issue 1 year ago

NicolasMICAUX commented 1 year ago

I'm using the library with a very simple spider and always get this error:

2022-11-17 11:58:57 [scrapy.core.scraper] ERROR: Error downloading <GET https://xxxxxxxxxxxxxxxxxxxxxxxxxx>
Traceback (most recent call last):
  File "/xxxxxxxxxxxxx/python3.10/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/xxxxxxxxxxxxx/venv/lib/python3.10/site-packages/scrapy/core/downloader/middleware.py", line 60, in process_response
    response = yield deferred_from_coro(method(request=request, response=response, spider=spider))
  File "/xxxxxxxxxxxxx/venv/lib/python3.10/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 62, in process_response
    decoded_body = self._decode(response.body, encoding.lower())
  File "/xxxxxxxxxxxxx/venv/lib/python3.10/site-packages/scrapy/downloadermiddlewares/httpcompression.py", line 82, in _decode
    body = gunzip(body)
  File "/xxxxxxxxxxxxx/venv/lib/python3.10/site-packages/scrapy/utils/gz.py", line 27, in gunzip
    chunk = f.read1(8196)
  File "/usr/lib/python3.10/gzip.py", line 314, in read1
    return self._buffer.read1(size)
  File "/usr/lib/python3.10/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.10/gzip.py", line 488, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.10/gzip.py", line 436, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'\n\n')
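The last frames show where this fails. The response was apparently labelled as gzip-encoded (which is why `HttpCompressionMiddleware` called `gunzip`), but Python's `gzip` module checks the first two bytes of the body for the gzip magic number `\x1f\x8b` and found `b'\n\n'` instead, so the body was not gzip data at all. The sketch below reproduces the same exception with only the standard library; `looks_gzipped` is a hypothetical helper for illustration, not part of Scrapy or of scrapy-tor-downloader.

```python
import gzip

# The two magic bytes every gzip stream starts with (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(body: bytes) -> bool:
    """Hypothetical helper: True if the body starts with the gzip magic bytes."""
    return body[:2] == GZIP_MAGIC


# A body like the one in the traceback: two newlines instead of gzip data.
bad_body = b"\n\n"
# A genuinely gzip-compressed body, for comparison.
good_body = gzip.compress(b"<html></html>")

try:
    gzip.decompress(bad_body)
except gzip.BadGzipFile as exc:
    # Same exception type and message as the last line of the traceback.
    print(f"{type(exc).__name__}: {exc}")

print(looks_gzipped(bad_body))     # False
print(looks_gzipped(good_body))    # True
print(gzip.decompress(good_body))  # b'<html></html>'
```

The first line printed is `BadGzipFile: Not a gzipped file (b'\n\n')`, matching the error in the report.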

NicolasMICAUX commented 1 year ago

I tried setting `COMPRESSION_ENABLED = False` and changing the priority of `scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware`, without success.
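For reference, the two approaches mentioned in this comment usually look like this in a project's `settings.py`. The comment does not say which priority value was tried, so the `None` below (which disables the middleware rather than moving it) is an assumption for illustration, not the reporter's exact configuration.

```python
# settings.py (sketch): the two approaches mentioned in the comment above.

# Turn off Scrapy's built-in response decompression entirely.
COMPRESSION_ENABLED = False

# Alternatively, configure HttpCompressionMiddleware explicitly.
# None disables it; an integer would instead move it to a different
# position in the downloader middleware chain. The value the reporter
# actually used is not given, so None here is an assumption.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": None,
}
```

Mapping a middleware to `None` in `DOWNLOADER_MIDDLEWARES` is Scrapy's documented way to disable it, while a numeric value only changes its order relative to other downloader middlewares.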