crawlbase / proxycrawl-python

ProxyCrawl Python library for scraping and crawling
https://proxycrawl.com
Apache License 2.0

OSError: Not a gzipped file (b'{"') with format='json' #5

Closed · aorzh closed this issue 5 years ago

aorzh commented 5 years ago

Hi there, I'm getting this error from time to time (with ONE url) when making a request:

```python
...
self.api = ProxyCrawlAPI({'token': api_token})
...

def get_response(self, url, page_wait=1000, resp_format='json'):
    response = self.api.get(url, options={
        'user_agent': self.get_ua(),
        'page_wait': page_wait,
        'format': resp_format
    })
    ...
```

Here is the traceback:

```
 products_json = self.shopify_parser.parse_result(self.normal_crawler.get_response(urls.get('shopify')))
  File "/Users/alex/Python-scripts/ec_scrape/parsing/parsers/base.py", line 31, in get_response
    'format': resp_format
  File "/Users/alex/Python-scripts/ec_scrape/venv/lib/python3.6/site-packages/proxycrawl/proxycrawl_api.py", line 44, in get
    return self.request(url, None, options)
  File "/Users/alex/Python-scripts/ec_scrape/venv/lib/python3.6/site-packages/proxycrawl/proxycrawl_api.py", line 67, in request
    self.response['body'] = self.decompressBody()
  File "/Users/alex/Python-scripts/ec_scrape/venv/lib/python3.6/site-packages/proxycrawl/proxycrawl_api.py", line 87, in decompressBody
    return body_gzip.read()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 276, in read
    return self._buffer.read(size)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'{"')
```

UPD: Just checked with the link from the example and got the same error: https://github.com/proxycrawl/proxycrawl-python#get-requests

UPD2: This happens only with `format='json'`.
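The error message itself hints at the cause: `gzip` expects every stream to begin with the magic bytes `\x1f\x8b`, but the body here starts with `b'{"'`, i.e. plain JSON, so it looks like the client gunzips the response unconditionally. A possible stop-gap until it is fixed would be to gunzip only when the magic bytes are present (a minimal sketch; `decompress_body` is a hypothetical helper, not the library's API):

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # every gzip stream begins with these two bytes

def decompress_body(raw: bytes) -> bytes:
    # With format='json' the body apparently arrives uncompressed and
    # starts with b'{"' -- exactly the "magic" the error reports.
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw  # plain bodies (e.g. JSON) pass through unchanged
```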

crawlbase commented 5 years ago

Thanks for reporting. This is now fixed in the latest release. Please update to 2.0.3.
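To pick up the fix, upgrade the package, e.g. `pip install --upgrade proxycrawl` (assuming the PyPI package name matches the import name, `proxycrawl`).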