Open superniao666 opened 2 years ago
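I just tested this and it runs stably on macOS. I suspect the problem comes from how one of Playwright's methods is implemented on Windows. I'll keep investigating.
I've added a temporary workaround: upgrade the package to version 0.2.1, then add this setting to settings.py:
GERAPY_CHECK_PLAYWRIGHT_INSTALLED = False
That should resolve the problem.
Alternatively, update to 0.2.2, which does not need the setting above.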
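For reference, a minimal settings.py sketch for the 0.2.1 workaround. It only combines the flag named in this comment with the middleware entry quoted in the original report. Whether anything else is needed depends on your project, so treat it as an assumption rather than a complete configuration:

```python
# settings.py (sketch for gerapy-playwright 0.2.1)

# Middleware entry as quoted in the original report.
DOWNLOADER_MIDDLEWARES = {
    "gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware": 543,
}

# Skip the start-up check that calls is_playwright_installed(), which is the
# call that fails in the traceback from the original report.
# Per the comment above, 0.2.2 should not need this line.
GERAPY_CHECK_PLAYWRIGHT_INSTALLED = False
```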
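Hello, how do I configure HTTPS verification with scrapy-playwright? In Playwright itself I set 'ignore_https_errors': True. How do I set that in this library? I read through the source and could not find a place to set it. Could you point me in the right direction? Thanks.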
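For context, this is roughly what the option looks like in plain Playwright for Python, where `ignore_https_errors` is passed when creating a browser context. The sketch does not show how to pass it through GerapyPlaywright (that part of the question is not answered in this thread), and `fetch_title` is a hypothetical helper name:

```python
import asyncio


async def fetch_title(url: str) -> str:
    """Open `url` with certificate errors ignored and return the page title."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        # ignore_https_errors is a browser-context option in Playwright.
        context = await browser.new_context(ignore_https_errors=True)
        page = await context.new_page()
        await page.goto(url)
        title = await page.title()
        await browser.close()
        return title
```

It could be called with `asyncio.run(fetch_title(url))` against a site that serves a self-signed certificate.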
------------------ Original message ------------------ From: "Gerapy/GerapyPlaywright"; Sent: Sunday, January 9, 2022, 3:21 PM; Subject: Re: [Gerapy/GerapyPlaywright] Hello, the program reports an error, please help me find the problem (Issue #1)
It looks like there is currently no solution to this problem. See scrapy-plugins/scrapy-playwright#7 (comment)
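Does it not run on Windows?
It does not run on Windows 10. According to the documentation this problem should only occur on Python 3.8+, but I tried Python 3.7 and got the same error.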
I ran the program completely unmodified. My Scrapy version is 2.5.
` 2021-12-29 14:10:14 [scrapy.utils.log] INFO: Scrapy 2.5.0 started (bot: example)
2021-12-29 14:10:14 [scrapy.utils.log] INFO: Versions: lxml 4.6.3.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k 25 Mar 2021), cryptography 3.4.7, Platform Windows-10-10.0.19041-SP0
2021-12-29 14:10:14 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2021-12-29 14:10:14 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.windows_events._WindowsSelectorEventLoop
2021-12-29 14:10:14 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'example', 'CONCURRENT_REQUESTS': 5, 'NEWSPIDER_MODULE': 'example.spiders', 'RETRY_HTTP_CODES': [403, 500, 502, 503, 504], 'SPIDER_MODULES': ['example.spiders']}
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.logstats.LogStats']
2021-12-29 14:10:14 [asyncio] ERROR: Task exception was never retrieved
future: <Task finished name='Task-2' coro=<Connection.run() done, defined at D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_connection.py:174> exception=NotImplementedError()>
Traceback (most recent call last):
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_connection.py", line 181, in run
await self._transport.connect()
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 132, in connect
raise exc
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
self._proc = await asyncio.create_subprocess_exec(
File "D:\ProgramFiles\Python38\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
transport, protocol = await loop.subprocess_exec(
File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 1630, in subprocess_exec
transport = await self._make_subprocess_transport(
File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 491, in _make_subprocess_transport
raise NotImplementedError
NotImplementedError
Unhandled error in Deferred:
2021-12-29 14:10:14 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 192, in crawl
return self._crawl(crawler, *args, **kwargs)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 196, in _crawl
d = crawler.crawl(*args, **kwargs)
File "D:\python_venv\venv_pachong_work\lib\site-packages\twisted\internet\defer.py", line 1656, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "D:\python_venv\venv_pachong_work\lib\site-packages\twisted\internet\defer.py", line 1571, in _cancellableInlineCallbacks
_inlineCallbacks(None, g, status)
--- <exception caught here> ---
File "D:\python_venv\venv_pachong_work\lib\site-packages\twisted\internet\defer.py", line 1445, in _inlineCallbacks
result = current_context.run(g.send, result)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 87, in crawl
self.engine = self._create_engine()
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 101, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
self.downloader = downloader_cls(crawler)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\core\downloader\__init__.py", line 83, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
mw = create_instance(mwcls, settings, crawler)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\utils\misc.py", line 166, in create_instance
instance = objcls.from_crawler(crawler, *args, **kwargs)
File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\downloadermiddlewares.py", line 94, in from_crawler
playwright_installed = is_playwright_installed()
File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\utils.py", line 41, in wrapper
return asyncio.get_event_loop().run_until_complete(res)
File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\utils.py", line 52, in is_playwright_installed
playwright = await async_playwright().start()
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\async_api\_context_manager.py", line 51, in start
return await self.__aenter__()
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\async_api\_context_manager.py", line 46, in __aenter__
playwright = AsyncPlaywright(next(iter(done)).result())
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_connection.py", line 181, in run
await self._transport.connect()
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 132, in connect
raise exc
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
self._proc = await asyncio.create_subprocess_exec(
File "D:\ProgramFiles\Python38\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
transport, protocol = await loop.subprocess_exec(
File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 1630, in subprocess_exec
transport = await self._make_subprocess_transport(
File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 491, in _make_subprocess_transport
raise NotImplementedError
builtins.NotImplementedError:
2021-12-29 14:10:14 [twisted] CRITICAL:
Traceback (most recent call last):
File "D:\python_venv\venv_pachong_work\lib\site-packages\twisted\internet\defer.py", line 1445, in _inlineCallbacks
result = current_context.run(g.send, result)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 87, in crawl
self.engine = self._create_engine()
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 101, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
self.downloader = downloader_cls(crawler)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\core\downloader\__init__.py", line 83, in __init__
self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
mw = create_instance(mwcls, settings, crawler)
File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\utils\misc.py", line 166, in create_instance
instance = objcls.from_crawler(crawler, *args, **kwargs)
File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\downloadermiddlewares.py", line 94, in from_crawler
playwright_installed = is_playwright_installed()
File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\utils.py", line 41, in wrapper
return asyncio.get_event_loop().run_until_complete(res)
File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\utils.py", line 52, in is_playwright_installed
playwright = await async_playwright().start()
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\async_api\_context_manager.py", line 51, in start
return await self.__aenter__()
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\async_api\_context_manager.py", line 46, in __aenter__
playwright = AsyncPlaywright(next(iter(done)).result())
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_connection.py", line 181, in run
await self._transport.connect()
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 132, in connect
raise exc
File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
self._proc = await asyncio.create_subprocess_exec(
File "D:\ProgramFiles\Python38\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
transport, protocol = await loop.subprocess_exec(
File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 1630, in subprocess_exec
transport = await self._make_subprocess_transport(
File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 491, in _make_subprocess_transport
raise NotImplementedError
NotImplementedError
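If I remove the middleware from settings.py, the error goes away, but the response no longer contains the content I need.
DOWNLOADER_MIDDLEWARES = {
    'gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware': 543,
}`
Could you help me figure out what is going wrong here? Thanks.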
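The traceback ends in `NotImplementedError` from `_make_subprocess_transport`, and the log shows Scrapy using `_WindowsSelectorEventLoop`. On Windows, asyncio's selector event loop does not support subprocesses, while the proactor loop does. Playwright starts its driver with `asyncio.create_subprocess_exec`, so the two look incompatible here. The standalone sketch below reproduces that difference without Scrapy or Playwright. `run_child` and `try_with_loop` are hypothetical helper names, and this is a diagnostic, not a fix for the middleware:

```python
import asyncio
import sys


async def run_child() -> str:
    """Spawn a trivial child process and return its stdout."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('ok')",
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.decode().strip()


def try_with_loop(loop: asyncio.AbstractEventLoop) -> str:
    """Run run_child() on the given loop and report the outcome as text."""
    try:
        return loop.run_until_complete(run_child())
    except NotImplementedError:
        # Raised by loops that cannot create subprocess transports.
        return "NotImplementedError"
    finally:
        loop.close()


if __name__ == "__main__":
    if sys.platform == "win32":
        # The selector loop is the kind shown in the log above.
        print("selector:", try_with_loop(asyncio.SelectorEventLoop()))
        print("proactor:", try_with_loop(asyncio.ProactorEventLoop()))
    else:
        print("default:", try_with_loop(asyncio.new_event_loop()))
```

If the selector loop reports `NotImplementedError` on your machine while the proactor loop succeeds, that matches the failure in the log and the discussion linked in scrapy-plugins/scrapy-playwright#7.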