After crawling for a while, the disconnect error still occurs.
done : https://wizardforcel.gitbooks.io/python-quant-uqer/content/81.html
Traceback (most recent call last):
  File "gitbook.py", line 5, in <module>
    Gitbook2PDF(url).run()
  File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 202, in run
    loop.run_until_complete(self.crawl_main_content(content_urls))
  File "d:\ProgramData\Anaconda3\envs\python36\lib\asyncio\base_events.py", line 468, in run_until_complete
    return future.result()
  File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 224, in crawl_main_content
    await asyncio.gather(*tasks)
  File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 246, in gettext
    metatext = await request(url, self.headers)
  File "E:\code\pythonCode\thirdparty\gitbook2pdf-master\gitbook2pdf\gitbook2pdf.py", line 21, in request
    async with session.get(url, headers=headers, timeout=timeout) as resp:
  File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\client.py", line 1005, in __aenter__
    self._resp = await self._coro
  File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\client.py", line 497, in _request
    await resp.start(conn)
  File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\client_reqrep.py", line 844, in start
  File "d:\ProgramData\Anaconda3\envs\python36\lib\site-packages\aiohttp\streams.py", line 588, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: None
Python version: 3.6.5
Command used: python gitbook.py https://wizardforcel.gitbooks.io/python-quant-uqer/content/
Based on the crawl log, I located the relevant code and optimized one place by adding a sleep in gettext:
async def gettext(self, index, url, level, title):
    ''' return path's html '''
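A possible next step beyond a fixed sleep, offered only as a sketch: the traceback suggests that `asyncio.gather(*tasks)` starts all page requests together, so limiting how many run at once and retrying dropped connections with a growing delay might reduce the disconnects. The names `fetch_with_retry` and `crawl_all`, and the parameters `fetch`, `retry_on`, `max_attempts`, `base_delay` and `limit`, are hypothetical and not part of gitbook2pdf. The sketch uses only the standard library, so `retry_on` defaults to the built-in `ConnectionError`.

```python
import asyncio
import random


async def fetch_with_retry(fetch, url, semaphore, retry_on=(ConnectionError,),
                           max_attempts=4, base_delay=1.0):
    """Call `await fetch(url)`, retrying with exponential backoff.

    All names here are illustrative; none come from gitbook2pdf.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # The semaphore caps how many requests are in flight at once.
            async with semaphore:
                return await fetch(url)
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 1, 2, 4, ... plus a little random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay + random.uniform(0, delay / 10))


async def crawl_all(fetch, urls, limit=5, **retry_options):
    """Fetch every URL with at most `limit` concurrent requests."""
    semaphore = asyncio.Semaphore(limit)
    tasks = [fetch_with_retry(fetch, u, semaphore, **retry_options)
             for u in urls]
    # Results come back in the same order as `urls`.
    return await asyncio.gather(*tasks)
```

To try this against the crawler, `fetch` could be a small coroutine wrapping the existing `request(url, self.headers)` call, with `retry_on=(aiohttp.client_exceptions.ServerDisconnectedError,)` passed explicitly, since that aiohttp exception does not derive from the built-in `ConnectionError`. Whether a lower `limit` is enough depends on how the server throttles clients, which the log does not show.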