drunkdream / weread-exporter

Export books from WeRead (微信读书) to epub, pdf, mobi, and other formats

Chapters containing images hang and cannot be downloaded #52

Closed tofuvip closed 8 months ago

tofuvip commented 8 months ago

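bookid: c3032820813ab8038g014ada. Chapter 9-10 contains images, and loading it hangs. After the timeout the exporter retries, and the cycle repeats indefinitely.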

tofuvip commented 8 months ago

[2023-11-06 18:32:15,044][INFO][WeReadExporter] Check chapter 2/版权信息
[2023-11-06 18:32:15,045][INFO][WeReadExporter] Check chapter 3/内容提要
[2023-11-06 18:32:15,046][INFO][WeReadExporter] Check chapter 4/对本书第1版的赞誉
[2023-11-06 18:32:15,046][INFO][WeReadExporter] Check chapter 5/第1版读者评价
[2023-11-06 18:32:15,047][INFO][WeReadExporter] Check chapter 6/第1版序
[2023-11-06 18:32:15,047][INFO][WeReadExporter] Check chapter 7/前言
[2023-11-06 18:32:15,048][INFO][WeReadExporter] Check chapter 8/致谢
[2023-11-06 18:32:15,048][INFO][WeReadExporter] Check chapter 9/第1章 管理路口的彷徨
[2023-11-06 18:32:15,048][INFO][WeReadExporter] Check chapter 10/1.1 迷茫:工程师有哪些发展路径
[2023-11-06 18:32:15,050][INFO][WeReadExporter] File cache\c3032820813ab8038g014ada\chapters\9-10.md not exist
[2023-11-06 18:32:15,050][INFO][WeReadWebPage] Go to chapter 10
[2023-11-06 18:32:15,064][INFO][WeReadWebPage] Fetch url https://weread.qq.com/web/reader/c3032820813ab8038g014adakd3d322001ad3d9446802347
[2023-11-06 18:32:15,531][INFO][WeReadWebPage] Fetch url https://cdn.weread.qq.com/web/wpa.js
[2023-11-06 18:32:15,540][INFO][WeReadWebPage] Fetch url https://res.mail.qq.com/node/wr/wrpage/style/images/independent/widget/common/avatar/Default.svg
[2023-11-06 18:32:15,541][INFO][WeReadWebPage] Fetch url https://cdn.weread.qq.com/weread/cover/40/cpplatform_igsj3trmxnpzjt6apfnsnj/t6_cpplatform_igsj3trmxnpzjt6apfnsnj1690877148.jpg
[2023-11-06 18:32:15,542][INFO][WeReadWebPage] Fetch url https://midas.gtimg.cn/midas/minipay_v2/jsapi/cashier.js
[2023-11-06 18:32:15,543][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/css/app.02ecef75.css
[2023-11-06 18:32:15,544][INFO][WeReadWebPage] Fetch url https://weread-1258476243.file.myqcloud.com/web/wrwebnjlogic/js/app.d129fdf7.js
[2023-11-06 18:32:45,071][WARNING]Load chapter failed, close browser and retry
[2023-11-06 18:32:45,071][INFO]terminate chrome process...
[2023-11-06 18:32:45,071][ERROR]connection unexpectedly closed
[2023-11-06 18:32:45,071][ERROR]Task exception was never retrieved
future: <Task finished name='Task-286' coro=<Connection._async_send() done, defined at D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\websockets\legacy\protocol.py", line 979, in transfer_data
    await asyncio.shield(self._put_message_waiter)
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\connection.py", line 73, in _async_send
    await self.connection.send(msg)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\websockets\legacy\protocol.py", line 635, in send
    await self.ensure_open()
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\websockets\legacy\protocol.py", line 944, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: sent 1000 (OK); no close frame received

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\connection.py", line 79, in _async_send
    await self.dispose()
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\connection.py", line 170, in dispose
    await self._on_close()
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\connection.py", line 151, in _on_close
    cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
[2023-11-06 18:32:45,231][INFO][WeReadWebPage] Launch url https://weread.qq.com/web/bookDetail/c3032820813ab8038g014ada
[2023-11-06 18:32:45,796][INFO]Browser listening on: ws://127.0.0.1:45893/devtools/browser/b02db451-b73f-47cd-97fc-8d325170082e
[2023-11-06 18:32:46,063][INFO][WeReadWebPage] Current login user is xxx
[2023-11-06 18:32:46,063][INFO][WeReadWebPage] Inject cookie wr_fp=xxx
[2023-11-06 18:32:46,066][INFO][WeReadWebPage] Inject cookie wr_gid=xxx
[2023-11-06 18:32:46,068][INFO][WeReadWebPage] Inject cookie wr_vid=xxx
[2023-11-06 18:32:46,069][INFO][WeReadWebPage] Inject cookie wr_skey=xxx
[2023-11-06 18:32:46,070][INFO][WeReadWebPage] Inject cookie wr_pf=xxx
[2023-11-06 18:32:46,070][INFO][WeReadWebPage] Inject cookie wr_rt=xxx
[2023-11-06 18:32:46,071][INFO][WeReadWebPage] Inject cookie wr_localvid=xxx
[2023-11-06 18:32:46,072][INFO][WeReadWebPage] Inject cookie wr_name=xxx
[2023-11-06 18:32:46,073][INFO][WeReadWebPage] Inject cookie wr_avatar=xxx
[2023-11-06 18:32:46,074][INFO][WeReadWebPage] Inject cookie wr_gender=xxx
Traceback (most recent call last):
  File "C:\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\python-dev\weread-exporter-main\weread_exporter\__main__.py", line 147, in <module>
    main()
  File "D:\python-dev\weread-exporter-main\weread_exporter\__main__.py", line 143, in main
    loop.run_until_complete(async_main())
  File "C:\Programs\Python\Python310\lib\asyncio\base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "C:\Programs\Python\Python310\lib\asyncio\windows_events.py", line 321, in run_forever
    super().run_forever()
  File "C:\Programs\Python\Python310\lib\asyncio\base_events.py", line 603, in run_forever
    self._run_once()
  File "C:\Programs\Python\Python310\lib\asyncio\base_events.py", line 1871, in _run_once
    event_list = self._selector.select(timeout)
  File "C:\Programs\Python\Python310\lib\asyncio\windows_events.py", line 444, in select
    self._poll(timeout)
  File "C:\Programs\Python\Python310\lib\asyncio\windows_events.py", line 797, in _poll
    status = _overlapped.GetQueuedCompletionStatus(self._iocp, ms)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\launcher.py", line 153, in _close_process
    self._loop.run_until_complete(self.killChrome())
  File "C:\Programs\Python\Python310\lib\asyncio\base_events.py", line 625, in run_until_complete
    self._check_running()
  File "C:\Programs\Python\Python310\lib\asyncio\base_events.py", line 584, in _check_running
    raise RuntimeError('This event loop is already running')
RuntimeError: This event loop is already running
[2023-11-06 18:32:53,383][INFO]terminate chrome process...
[2023-11-06 18:32:53,560][ERROR]Task exception was never retrieved
future: <Task finished name='Task-4' coro=<Connection._recv_loop() done, defined at D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\connection.py:53> exception=UnicodeEncodeError('gbk', '[https://weread.qq.com/web/reader/c3032820813ab8038g014adakd3d322001ad3d9446802347] fillText ▶ 0 1412.5 JSHandle@array\r\n', 93, 94, 'illegal multibyte sequence')>
Traceback (most recent call last):
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\connection.py", line 61, in _recv_loop
    await self._on_message(resp)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\connection.py", line 143, in _on_message
    self._on_query(msg)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\connection.py", line 123, in _on_query
    session._on_message(params.get('message'))
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\connection.py", line 276, in _on_message
    self.emit(obj.get('method'), obj.get('params'))
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyee\_base.py", line 115, in emit
    handled = self._call_handlers(event, args, kwargs)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyee\_base.py", line 98, in _call_handlers
    self._emit_run(f, args, kwargs)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyee\_base.py", line 83, in _emit_run
    f(*args, **kwargs)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\page.py", line 184, in <lambda>
    client.on('Runtime.consoleAPICalled', lambda event: self._onConsoleAPI(event))
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\page.py", line 692, in _onConsoleAPI
    self._addConsoleMessage(event['type'], values)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyppeteer\page.py", line 729, in _addConsoleMessage
    self.emit(Page.Events.Console, message)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyee\_base.py", line 115, in emit
    handled = self._call_handlers(event, args, kwargs)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyee\_base.py", line 98, in _call_handlers
    self._emit_run(f, args, kwargs)
  File "D:\python-dev\weread-exporter-main\venv\lib\site-packages\pyee\_base.py", line 83, in _emit_run
    f(*args, **kwargs)
  File "D:\python-dev\weread-exporter-main\weread_exporter\webpage.py", line 252, in handle_log
    fp.write("[%s] %s\n" % (self._url, message.text))
UnicodeEncodeError: 'gbk' codec can't encode character '\u25b6' in position 93: illegal multibyte sequence
[2023-11-06 18:32:53,562][ERROR]Task was destroyed but it is pending!
task: <Task pending name='Task-190' coro=<WeReadWebPage._handle_request() running at D:\python-dev\weread-exporter-main\weread_exporter\webpage.py:355> wait_for=>
sys:1: RuntimeWarning: coroutine 'Launcher.killChrome' was never awaited