dataabc / weibo-search

Fetch Weibo search results. A search can be either a Weibo keyword search or a Weibo topic search.

Crawler fails with errors after about four hours #143

Open Fanniesiu opened 2 years ago

Fanniesiu commented 2 years ago

Hi, I crawl one day's worth of data per run. The first run went for about 7 hours without problems, but every run since then fails after roughly four hours with errors like these:

2022-01-15 14:09:36 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://s.weibo.com/weibo?q=%E8%82%96%E6%88%98&typeall=1&suball=1&timescope=custom:2020-02-28-13:2020-02-28-14&page=28> (failed 3 times): []
2022-01-15 14:09:36 [scrapy.core.scraper] ERROR: Error downloading <GET https://s.weibo.com/weibo?q=%E8%82%96%E6%88%98&typeall=1&suball=1&timescope=custom:2020-02-28-13:2020-02-28-14&page=28>
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/twisted/internet/defer.py", line 47, in run
    return f(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    return (yield download_func(request=request, spider=spider))
twisted.web._newclient.ResponseNeverReceived: []
2022-01-15 14:09:55 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://s.weibo.com/weibo?q=%E8%82%96%E6%88%98&typeall=1&suball=1&timescope=custom:2020-02-28-12:2020-02-28-13&page=28> (failed 3 times): []
2022-01-15 14:09:55 [scrapy.core.scraper] ERROR: Error downloading <GET https://s.weibo.com/weibo?q=%E8%82%96%E6%88%98&typeall=1&suball=1&timescope=custom:2020-02-28-12:2020-02-28-13&page=28>
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/twisted/internet/defer.py", line 47, in run
    return f(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 44, in process_request
    return (yield download_func(request=request, spider=spider))
twisted.web._newclient.ResponseNeverReceived: []

Several requests fail at around the same time; the log above shows just one of them. What might be causing this?

dataabc commented 2 years ago

It's probably Weibo rate-limiting you. Try increasing DOWNLOAD_DELAY in settings.py; a larger value may help.
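A minimal sketch of what that change could look like in the project's settings.py. The value 30 is only illustrative (it is the value the reporter ended up using). RANDOMIZE_DOWNLOAD_DELAY is a standard Scrapy setting shown here at its default; the maintainer did not mention it, and it is included only to explain why the real wait varies around DOWNLOAD_DELAY.

```python
# settings.py (excerpt): a sketch, not the project's shipped defaults.

# Seconds Scrapy waits between consecutive requests to the same website.
# A larger value means fewer requests per hour to s.weibo.com.
DOWNLOAD_DELAY = 30

# Scrapy default: each actual wait is drawn at random from
# 0.5x to 1.5x DOWNLOAD_DELAY, so the average stays at DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True
```

Restart the crawl after editing the file so Scrapy picks up the new value.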

Fanniesiu commented 2 years ago

I raised DOWNLOAD_DELAY from 15 to 30 and it has now run for six hours without any errors. Thanks a lot!
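Rough arithmetic for why doubling the delay can help: if the delay dominates each request cycle (response time is ignored here, an assumption), the maximum request rate to one site is about 3600 / DOWNLOAD_DELAY per hour. The helper name `requests_per_hour` is hypothetical, and whether Weibo's limit actually works on a per-hour request budget is also an assumption; the thread does not say.

```python
def requests_per_hour(download_delay: float) -> float:
    """Approximate upper bound on requests per hour to a single site,
    assuming an average wait of `download_delay` seconds between requests
    and ignoring how long each response takes to arrive."""
    return 3600 / download_delay


for delay in (15, 30):
    print(delay, requests_per_hour(delay))
# 15 240.0
# 30 120.0
```

Halving the request rate also means a full day of data takes roughly twice as long to crawl, which is the trade-off for staying under the limit.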