Henryhaohao / Wenshu_Spider

:rainbow: Wenshu_Spider - a Scrapy-based crawler for case data from China Judgments Online (latest version as of 2019-01-09)
http://wenshu.court.gov.cn/
MIT License

The crawl stalls after a certain number of pages every time; has anyone else run into this? #9

Open xs14331309 opened 5 years ago

xs14331309 commented 5 years ago

2019-01-02 15:08:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:09:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:10:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:11:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:12:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:13:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:14:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:15:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:16:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:17:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:18:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:19:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:20:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:21:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:22:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:23:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:24:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:25:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
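The repeated "(at 0 pages/min)" lines mean Scrapy still believes requests are in flight, but nothing is completing. A minimal sketch (not part of the spider itself, names are my own) for spotting such a stall by parsing the logstats lines above:

```python
import re

# Matches Scrapy's periodic logstats line, e.g.
# "... INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)"
LOGSTATS = re.compile(
    r"Crawled (\d+) pages \(at (\d+) pages/min\), "
    r"scraped (\d+) items \(at (\d+) items/min\)"
)

def is_stalled(log_lines, threshold=5):
    """Return True if the last `threshold` logstats lines all report 0 pages/min."""
    rates = []
    for line in log_lines:
        m = LOGSTATS.search(line)
        if m:
            rates.append(int(m.group(2)))  # the pages/min figure
    return len(rates) >= threshold and all(r == 0 for r in rates[-threshold:])
```

A watchdog like this could restart the run once the crawl has been flat for several minutes, rather than leaving it hung overnight.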

fjqwonders commented 5 years ago

Mine also stopped after crawling about 200.

fjqwonders commented 5 years ago

2019-01-02 16:42:32 [scrapy.core.scraper] ERROR: Spider error processing <GET http://wenshu.court.gov.cn/CreateContentJS/CreateContentJS.aspx?DocID=ddb5d9fb-2022-472e-aa17-b4f91e537da8> (referer: http://wenshu.court.gov.cn/List/ListContent)
Traceback (most recent call last):
  File "d:\programdata\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "d:\programdata\anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 30, in process_spider_output
    for x in result:
  File "d:\programdata\anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "d:\programdata\anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "d:\programdata\anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "D:\ProgramData\Github\Wenshu_Spider-master\Wenshu_Project\Wenshu\spiders\wenshu.py", line 108, in get_detail
    content_1 = json.loads(re.search(r'JSON.stringify\((.*?)\);\$\(document', html).group(1))  # detail content dict 1
AttributeError: 'NoneType' object has no attribute 'group'
2019-01-02 16:42:39 [scrapy.core.engine] INFO: Closing spider (finished)
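The traceback shows `re.search(...)` returning `None`: the detail page no longer contains the expected `JSON.stringify(...)` snippet (typically because the server served an anti-crawler or error page instead of the document), so calling `.group(1)` on `None` raises the `AttributeError`. A hedged sketch of a defensive version of that parsing step (the function name and return-`None` behaviour are my own, not the repo's code):

```python
import json
import re

def extract_content(html):
    """Return the detail dict embedded in the page, or None when the
    expected JSON.stringify(...) snippet is missing (e.g. an
    anti-crawler page was served instead of the judgment)."""
    match = re.search(r"JSON\.stringify\((.*?)\);\$\(document", html)
    if match is None:
        return None  # caller can log the URL and re-schedule instead of crashing
    return json.loads(match.group(1))
```

In the spider's `get_detail`, checking for `None` and re-yielding the request (with `dont_filter=True`) would let the crawl survive these pages instead of erroring out.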

kingshrimp commented 5 years ago

I ran into the same problem. After searching on Baidu, I wonder if this is the cause: some download threads never execute their callback, so the program keeps thinking the download hasn't finished yet. Reference: https://my.oschina.net/airship/blog/628765

xs14331309 commented 5 years ago

I ran into the same problem. After searching on Baidu, I wonder if this is the cause: some download threads never execute their callback, so the program keeps thinking the download hasn't finished yet. Reference: https://my.oschina.net/airship/blog/628765

Did you try adding that, and did it solve the problem?

kingshrimp commented 5 years ago

I gave it a try. The two files mentioned on Baidu wouldn't download, so I opened the links inside myself and changed the timeout as described, but the problem is still the same. Let's add each other on WeChat; maybe we can work it out together: 15868194743
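For the timeout change mentioned above, these are the Scrapy settings usually tuned for this kind of stall. A sketch for `settings.py`; the values are illustrative assumptions, not the repo's actual configuration:

```python
# settings.py -- illustrative values, adjust for your own runs
DOWNLOAD_TIMEOUT = 30        # fail hung requests after 30s instead of waiting forever
RETRY_ENABLED = True
RETRY_TIMES = 3              # re-schedule failed or timed-out requests a few times
DOWNLOAD_DELAY = 1           # slow down to reduce the chance of being blocked
AUTOTHROTTLE_ENABLED = True  # let Scrapy adapt the delay to server responses
```

With `DOWNLOAD_TIMEOUT` set, a hung request errors out and gets retried rather than keeping the engine waiting, which matches the "threads never run their callback" theory discussed above.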