Aqua-Dream / Tieba_Spider

Baidu Tieba crawler (based on Scrapy and MySQL)
389 stars, 116 forks

Usage on Windows 10? #22

Closed barnett2010 closed 4 years ago

barnett2010 commented 4 years ago

I have Python 3.7 and MySQL 5.7 installed.

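It turns out the mistake was mine: I had installed the dependencies incorrectly. It is running now. If I run into problems, I will come back and ask.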

I like single-player games too.

barnett2010 commented 4 years ago

Crawling page 2023...
Crawling page 2024...
2020-03-11 21:32:41 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://tieba.baidu.com/p/3564343563?pn=5098> (failed 3 times): User timeout caused connection failure: Getting https://tieba.baidu.com/p/3564343563?pn=5098 took longer than 180.0 seconds..
2020-03-11 21:32:42 [scrapy.core.scraper] ERROR: Error downloading <GET https://tieba.baidu.com/p/3564343563?pn=5098>
Traceback (most recent call last):
  File "c:\python37\lib\site-packages\twisted\internet\defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "c:\python37\lib\site-packages\twisted\python\failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "c:\python37\lib\site-packages\scrapy\core\downloader\middleware.py", line 42, in process_request
    defer.returnValue((yield downloadfunc(request=request, spider=spider)))
  File "c:\python37\lib\site-packages\twisted\internet\defer.py", line 654, in runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "c:\python37\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 377, in _cb_timeout
    raise TimeoutError("Getting %s took longer than %s seconds." % (url, timeout))
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://tieba.baidu.com/p/3564343563?pn=5098 took longer than 180.0 seconds..

barnett2010 commented 4 years ago

This message appears, and then the script stops running.

Aqua-Dream commented 4 years ago

Your network connection probably dropped. The error message says the network did not respond for 180 seconds.
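
For reference, the 180 seconds and the "failed 3 times" in the log above match Scrapy's default `DOWNLOAD_TIMEOUT` (180) and `RETRY_TIMES` (2 retries after the first attempt). Below is a rough sketch of how these could be overridden in a Scrapy project's `settings.py`. The values are illustrative assumptions, not this project's configuration, and they will not help if the connection is actually down.

```python
# Sketch of overrides for a Scrapy project's settings.py.
# The setting names are standard Scrapy settings; the values are
# assumptions chosen for illustration only.

RETRY_ENABLED = True    # keep the built-in retry middleware active
DOWNLOAD_TIMEOUT = 60   # seconds per attempt (Scrapy's default is 180)
RETRY_TIMES = 5         # retries after the first attempt (default is 2)

# Worst-case time spent on one unreachable page before Scrapy gives up:
# one initial attempt plus RETRY_TIMES retries, each waiting for the timeout.
WORST_CASE_SECONDS = DOWNLOAD_TIMEOUT * (1 + RETRY_TIMES)
```

With the defaults, the same arithmetic gives 180 × 3 = 540 seconds per unreachable page, which is consistent with the three failed attempts reported in the log.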

barnett2010 commented 4 years ago

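@Aqua-Dream
Thanks for the reply. I tested several more forums, each with more than 10,000 pages in total, and almost all of them hit the message above after downloading a little over 2,000 pages.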

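One possibility I can think of: Tieba now has restrictions, and posts from before 2017 can no longer be viewed. Could that be what is causing the error?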


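One more thing: I would like a feature like this. If I want to back up several dozen forums, could they all be written into the config at once and then downloaded in a single run? Alternatively, could there be a forum list.txt that the config references, with one entry per line in the form: forum name, database name.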

Aqua-Dream commented 4 years ago

Just write a batch script and run it in the background. That would do it.
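
A minimal sketch of such a batch runner, written in Python rather than as a Windows .bat file so that it can read the list.txt format requested above (forum name and database name, one pair per line). The function names are hypothetical, and the base command `scrapy run <forum> <database>` is an assumption about how the crawler is invoked; adjust it to whatever command you normally run.

```python
import subprocess
from pathlib import Path


def parse_list(text):
    """Parse 'forum_name database_name' pairs, one per line.

    Blank lines and lines starting with '#' are skipped.
    """
    jobs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(
                f"line {lineno}: expected 'forum_name database_name', got {raw!r}"
            )
        jobs.append((parts[0], parts[1]))
    return jobs


def build_command(forum, database, base=("scrapy", "run")):
    # ASSUMPTION: the crawler is started as `scrapy run <forum> <database>`.
    # Pass a different `base` if your invocation differs.
    return [*base, forum, database]


def run_all(jobs, runner=subprocess.call):
    """Run the crawler once per job, sequentially; return forums that failed."""
    failed = []
    for forum, database in jobs:
        cmd = build_command(forum, database)
        print("Running:", " ".join(cmd))
        if runner(cmd) != 0:  # a non-zero exit code is treated as a failure
            failed.append(forum)
    return failed


def main(list_path="list.txt"):
    """Read the list file and crawl every forum in it."""
    jobs = parse_list(Path(list_path).read_text(encoding="utf-8"))
    failed = run_all(jobs)
    if failed:
        print("Failed:", ", ".join(failed))
    return failed
```

To use it, save the code in the project directory, create a UTF-8 list.txt there, and call `main("list.txt")`. The forums are crawled one after another, so one failure does not stop the rest of the list.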