Gerapy / GerapyPlaywright

Downloader Middleware to support Playwright in Scrapy & Gerapy

Hello, the program throws an error; please help me find what is wrong #1

Open superniao666 opened 2 years ago

superniao666 commented 2 years ago

I ran the program exactly as provided, without any changes. My Scrapy version is 2.5.

2021-12-29 14:10:14 [scrapy.utils.log] INFO: Scrapy 2.5.0 started (bot: example)
2021-12-29 14:10:14 [scrapy.utils.log] INFO: Versions: lxml 4.6.3.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k 25 Mar 2021), cryptography 3.4.7, Platform Windows-10-10.0.19041-SP0
2021-12-29 14:10:14 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2021-12-29 14:10:14 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.windows_events._WindowsSelectorEventLoop
2021-12-29 14:10:14 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'example', 'CONCURRENT_REQUESTS': 5, 'NEWSPIDER_MODULE': 'example.spiders', 'RETRY_HTTP_CODES': [403, 500, 502, 503, 504], 'SPIDER_MODULES': ['example.spiders']}
['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.logstats.LogStats']
2021-12-29 14:10:14 [asyncio] ERROR: Task exception was never retrieved
future: <Task finished name='Task-2' coro=<Connection.run() done, defined at D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_connection.py:174> exception=NotImplementedError()>
Traceback (most recent call last):
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_connection.py", line 181, in run
    await self._transport.connect()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 132, in connect
    raise exc
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
    self._proc = await asyncio.create_subprocess_exec(
  File "D:\ProgramFiles\Python38\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
  File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 1630, in subprocess_exec
    transport = await self._make_subprocess_transport(
  File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 491, in _make_subprocess_transport
    raise NotImplementedError
NotImplementedError
Unhandled error in Deferred:
2021-12-29 14:10:14 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\twisted\internet\defer.py", line 1656, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\twisted\internet\defer.py", line 1571, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "D:\python_venv\venv_pachong_work\lib\site-packages\twisted\internet\defer.py", line 1445, in _inlineCallbacks
    result = current_context.run(g.send, result)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\core\downloader\__init__.py", line 83, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\utils\misc.py", line 166, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\downloadermiddlewares.py", line 94, in from_crawler
    playwright_installed = is_playwright_installed()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\utils.py", line 41, in wrapper
    return asyncio.get_event_loop().run_until_complete(res)
  File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\utils.py", line 52, in is_playwright_installed
    playwright = await async_playwright().start()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\async_api\_context_manager.py", line 51, in start
    return await self.__aenter__()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\async_api\_context_manager.py", line 46, in __aenter__
    playwright = AsyncPlaywright(next(iter(done)).result())
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_connection.py", line 181, in run
    await self._transport.connect()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 132, in connect
    raise exc
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
    self._proc = await asyncio.create_subprocess_exec(
  File "D:\ProgramFiles\Python38\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
  File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 1630, in subprocess_exec
    transport = await self._make_subprocess_transport(
  File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 491, in _make_subprocess_transport
    raise NotImplementedError
builtins.NotImplementedError:

2021-12-29 14:10:14 [twisted] CRITICAL:
Traceback (most recent call last):
  File "D:\python_venv\venv_pachong_work\lib\site-packages\twisted\internet\defer.py", line 1445, in _inlineCallbacks
    result = current_context.run(g.send, result)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 87, in crawl
    self.engine = self._create_engine()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\crawler.py", line 101, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\core\engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\core\downloader\__init__.py", line 83, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\middleware.py", line 53, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\middleware.py", line 35, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\scrapy\utils\misc.py", line 166, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\downloadermiddlewares.py", line 94, in from_crawler
    playwright_installed = is_playwright_installed()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\utils.py", line 41, in wrapper
    return asyncio.get_event_loop().run_until_complete(res)
  File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\gerapy_playwright\utils.py", line 52, in is_playwright_installed
    playwright = await async_playwright().start()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\async_api\_context_manager.py", line 51, in start
    return await self.__aenter__()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\async_api\_context_manager.py", line 46, in __aenter__
    playwright = AsyncPlaywright(next(iter(done)).result())
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_connection.py", line 181, in run
    await self._transport.connect()
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 132, in connect
    raise exc
  File "D:\python_venv\venv_pachong_work\lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
    self._proc = await asyncio.create_subprocess_exec(
  File "D:\ProgramFiles\Python38\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
  File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 1630, in subprocess_exec
    transport = await self._make_subprocess_transport(
  File "D:\ProgramFiles\Python38\lib\asyncio\base_events.py", line 491, in _make_subprocess_transport
    raise NotImplementedError
NotImplementedError

If I remove the middleware from settings.py, the error goes away, but the required content is not returned.

DOWNLOADER_MIDDLEWARES = {
    'gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware': 543,
}

Could you help me figure out what is going wrong here? Thanks.

Germey commented 2 years ago

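I just tested it and it runs stably on macOS. I suspect the problem comes from how one of Playwright's methods is implemented on Windows. I'll investigate further.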

Germey commented 2 years ago

I've added a temporary workaround: upgrade the package to version 0.2.1, then add this setting to settings.py:

GERAPY_CHECK_PLAYWRIGHT_INSTALLED = False

That should resolve the problem.
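Combined with the middleware entry from the original report, the workaround might look like this in settings.py. This is only a sketch: the comment describing what the flag does is an assumption inferred from the setting's name and from the `is_playwright_installed()` frame in the traceback, not confirmed behavior.

```python
# settings.py (sketch; requires gerapy-playwright 0.2.1 or later per the comment above)

# Enable the Playwright downloader middleware, as in the original report.
DOWNLOADER_MIDDLEWARES = {
    "gerapy_playwright.downloadermiddlewares.PlaywrightMiddleware": 543,
}

# Assumption: this turns off the startup check that calls
# is_playwright_installed(), which is where the traceback originates.
GERAPY_CHECK_PLAYWRIGHT_INSTALLED = False
```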

Germey commented 2 years ago

Alternatively, update to 0.2.2, which does not need the setting above.

Germey commented 2 years ago

It looks like there is currently no solution to this problem. See https://github.com/scrapy-plugins/scrapy-playwright/issues/7#issuecomment-808824121
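One way to see the limitation behind the traceback: the log shows Scrapy running on `asyncio.windows_events._WindowsSelectorEventLoop`, and on Windows asyncio's selector event loop does not implement subprocess support, which Playwright needs in order to start its driver process. The following is a minimal sketch, not part of GerapyPlaywright; `loop_supports_subprocess` and `_spawn_noop` are hypothetical helper names. It tries to spawn a trivial child process on a given loop and reports whether that raises `NotImplementedError`.

```python
import asyncio
import sys


async def _spawn_noop() -> int:
    # Start a trivial child process: the current interpreter running "pass".
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", "pass")
    return await proc.wait()


def loop_supports_subprocess(loop: asyncio.AbstractEventLoop) -> bool:
    """Return True if `loop` can spawn a subprocess, False if it raises NotImplementedError."""
    try:
        loop.run_until_complete(_spawn_noop())
        return True
    except NotImplementedError:
        return False
    finally:
        loop.close()


if __name__ == "__main__":
    print("SelectorEventLoop:", loop_supports_subprocess(asyncio.SelectorEventLoop()))
    if sys.platform == "win32":
        # ProactorEventLoop only exists on Windows.
        print("ProactorEventLoop:", loop_supports_subprocess(asyncio.ProactorEventLoop()))
```

On Windows the selector loop is expected to report False and the proactor loop True. On Linux and macOS the selector loop supports subprocesses, which is consistent with the middleware running on macOS.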

superniao666 commented 2 years ago

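Hello, how do I configure HTTPS verification, as in scrapy-playwright? In Playwright this is set with 'ignore_https_errors': True. How can I set it in your library? I looked through the source code but could not find where to configure it. I would appreciate your guidance. Thanks.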
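For reference, in plain Playwright for Python the option mentioned above is a keyword argument of `browser.new_context()`. The sketch below shows only that; it does not show how GerapyPlaywright exposes the option, which this thread does not answer. `build_context_options` and `fetch_html` are hypothetical helper names.

```python
import asyncio


def build_context_options(ignore_https_errors: bool = True) -> dict:
    """Keyword arguments to pass to Playwright's browser.new_context()."""
    return {"ignore_https_errors": ignore_https_errors}


async def fetch_html(url: str) -> str:
    # Imported lazily so this module loads even where Playwright is not installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        # With ignore_https_errors=True the context does not fail on invalid certificates.
        context = await browser.new_context(**build_context_options())
        page = await context.new_page()
        await page.goto(url)
        html = await page.content()
        await browser.close()
        return html
```

Running `asyncio.run(fetch_html(url))` with a URL of your choice requires Playwright and its Chromium build to be installed.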

wAnFen1017 commented 2 years ago

Does it not run on Windows?

lcuiandlc commented 1 year ago

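It does not run on Windows 10. According to the documentation, this problem should only occur on Python 3.8+, but I tried Python 3.7 and got the same error.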