Gerapy / GerapyPlaywright

Downloader Middleware to support Playwright in Scrapy & Gerapy

Error: <twisted.python.failure.Failure builtins.OSError: Not a gzipped file (b'<!')> #3

Closed Ian4869 closed 2 years ago

Ian4869 commented 2 years ago

@Germey Hello. When the middleware 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware' is enabled, using PlaywrightRequest raises the error shown in the title.
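For context, the message in the title matches what Python's standard `gzip` module raises when it is handed bytes that do not start with the gzip magic number. The sketch below uses only the standard library; the helper name `gunzip_error` and the sample HTML are illustrative assumptions, not code from Scrapy or Gerapy Playwright. It shows the kind of failure that would be expected when a response keeps a gzip `Content-Encoding` header while its body is already plain HTML.

```python
import gzip
from typing import Optional


def gunzip_error(body: bytes) -> Optional[str]:
    """Return the error message gzip raises for `body`, or None if it decompresses."""
    try:
        gzip.decompress(body)
    except OSError as exc:  # gzip.BadGzipFile subclasses OSError on Python 3.8+
        return str(exc)
    return None


# Stand-in for rendered page content that is already decoded (assumption).
html = b"<!DOCTYPE html><html></html>"

print(gunzip_error(html))                 # an error message: the body is not gzip data
print(gunzip_error(gzip.compress(html)))  # None: genuinely gzipped bytes decompress fine
```

The first two bytes of the plain HTML body are `<!`, which is the fragment quoted in the error.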

Germey commented 2 years ago

Hello, could you provide the following information:

Ian4869 commented 2 years ago

macOS Mojave 10.14.6, Python 3.7.10, Scrapy 2.5.1, Gerapy Playwright 0.2.0. Test URL: http://www.jggw.suzhou.gov.cn/


```python
meta_info = {
    "origin_url": url,
    "job_id": self.job_id,
    "source": "",
    "AJAX": self.ajax,
    "depth": 0,
    "internal": True,
    "source_tag": "",
    "proxy_strategy": self.proxy_strategy
}

yield PlaywrightRequest(url, callback=self.parse_page, errback=self.proc_error, meta=meta_info)
```

zhutuo commented 2 years ago

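Changing the code as follows fixes the problem:

```python
# Necessary to bypass the compression middleware (?)
# `content` is presumably the rendered page HTML, i.e. already decoded, so a
# leftover Content-Encoding header would make HttpCompressionMiddleware try to
# decompress plain HTML.
headers = response.headers
headers.pop('content-encoding', None)
headers.pop('Content-Encoding', None)
response = HtmlResponse(
    page.url,
    status=response.status,
    headers=headers,
    body=content,
    encoding='utf-8',
    request=request
)
return response
```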
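The two `pop` calls in the snippet cover only two spellings of the header name. As a rough sketch, assuming the headers are available as a plain `dict` (the helper `strip_content_encoding` is hypothetical and not part of Gerapy Playwright or Scrapy), a case-insensitive variant might look like this:

```python
from typing import Dict


def strip_content_encoding(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` without any Content-Encoding entry.

    Header names are compared case-insensitively, so 'content-encoding',
    'Content-Encoding' and 'CONTENT-ENCODING' are all dropped.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() != "content-encoding"
    }


# Example headers (made up for illustration).
original = {"Content-Type": "text/html", "CONTENT-ENCODING": "gzip"}
print(strip_content_encoding(original))
```

Returning a new dict rather than mutating the input leaves the original response headers untouched, which may or may not matter in the actual middleware.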

Germey commented 2 years ago

@zhutuo Thanks, I'll fix it.

Germey commented 2 years ago

Fixed in 0.2.3, please have a try.