Aqua-Dream / Tieba_Spider

Baidu Tieba crawler (based on Scrapy and MySQL)

Running Tieba_Spider raises the following error, which I haven't been able to resolve. Could you help me find the cause? Thanks! #2

Closed kitty7angela closed 7 years ago

kitty7angela commented 7 years ago

```
2017-07-21 21:37:53 [scrapy.core.scraper] ERROR: Spider error processing <GET https://tieba.baidu.com/f?kw=%E4%BB%99%E5%89%915&pn=0> (referer: None)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/root/Tieba_Spider/tieba/spiders/tieba_spider.py", line 41, in parse
    yield self.make_requests_from_url(next_page.extract_first())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/__init__.py", line 87, in make_requests_from_url
    return Request(url, dont_filter=True)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 25, in __init__
    self._set_url(url)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/http/request/__init__.py", line 58, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: //tieba.baidu.com/f?kw=%E4%BB%99%E5%89%915&ie=utf-8&pn=50
```

kitty7angela commented 7 years ago

Figured it out. The problem is here; I just hadn't noticed it: `//tieba.baidu.com/f?kw=%E4%BB%99%E5%89%915&ie=utf-8&pn=50`

Changing line 41 of tieba_spider.py to `yield self.make_requests_from_url('http:' + next_page.extract_first())` fixes it.

Aqua-Dream commented 7 years ago

Thanks for pointing this out. Tieba changed the format of its next-page links; Scrapy apparently cannot handle protocol-relative links (ones that omit the scheme), so the scheme has to be added manually.
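For reference, the same fix can be written without hard-coding `http:` by joining the extracted link against the URL of the page that was fetched. This is only a sketch, not code from the repo; the URLs are the ones from the traceback above, and it uses the standard library's `urljoin` (in Python 2, the same function lives in `urlparse`):

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# URL of the page being parsed (from the GET line in the traceback)
response_url = "https://tieba.baidu.com/f?kw=%E4%BB%99%E5%89%915&pn=0"

# Protocol-relative "next page" link extracted from the page --
# this is the value that made scrapy.Request raise "Missing scheme"
next_page = "//tieba.baidu.com/f?kw=%E4%BB%99%E5%89%915&ie=utf-8&pn=50"

# urljoin copies the scheme from the page URL, producing an absolute
# URL that Request() will accept; unlike prepending 'http:', this
# keeps https when the page was fetched over https
absolute = urljoin(response_url, next_page)
print(absolute)
# -> https://tieba.baidu.com/f?kw=%E4%BB%99%E5%89%915&ie=utf-8&pn=50
```

In more recent Scrapy versions, `response.urljoin(next_page)` performs the same resolution relative to the response URL, so the spider does not need to know whether the site serves http or https.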