Aqua-Dream / Tieba_Spider

Baidu Tieba crawler (based on Scrapy and MySQL)
Error during a run: ERROR: Spider error processing #19

Closed iseesaw closed 4 years ago

iseesaw commented 4 years ago

The run started at 16:34.

2019-11-08 16:47:09 [scrapy.core.scraper] ERROR: Spider error processing <GET https://tieba.baidu.com/p/totalComment?tid=5256877623&fid=1&pn=1&red_tag=2829333245> (referer: http://tieba.baidu.com/p/5256877623)
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/root/anaconda3/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "/root/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/root/anaconda3/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "/root/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/root/anaconda3/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "/root/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/root/anaconda3/lib/python3.6/site-packages/scrapy/core/spidermw.py", line 84, in evaluate_iterable
    for r in iterable:
  File "/root/anaconda3/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/root/Tieba_Spider/tieba/spiders/tieba_spider.py", line 82, in parse_comment
    for value in comment_list.values():
AttributeError: 'list' object has no attribute 'values'

Aqua-Dream commented 4 years ago

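I could not reproduce this. comment_list should be a dict, and when I crawled the thread you mentioned it was indeed a dict. It may have been parsed as a list because a network glitch caused the content to be read incorrectly, so this is probably an occasional event.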

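If you can reproduce it reliably on your side, could you help debug what comment_list actually contains? In your "/root/Tieba_Spider/tieba/spiders/tieba_spider.py", add a few lines before line 82 (that is, before for value in comment_list.values():), so that this part looks like this: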

    def parse_comment(self, response):
        comment_list = json.loads(response.body.decode('utf8'))['data']['comment_list']
        # Temporary debugging: dump the raw response when comment_list is a list instead of the expected dict
        if isinstance(comment_list, list):
            print("Response Body: ")
            print(response.body)
            print("Comment List: ")
            print(comment_list)
        for value in comment_list.values():
...

This way, the actual contents of the variable will be printed just before the error message.
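
Until the cause is known, the spider could also tolerate both shapes instead of crashing. The following is only a minimal sketch under assumptions: iter_comment_values is a hypothetical helper that does not exist in the repository, and it assumes that when comment_list arrives as a list, its elements can be treated the same way as the dict's values. That has not been confirmed in this thread.

```python
import json


def iter_comment_values(body):
    """Yield the entries of 'comment_list' whether it is a dict or a list.

    Hypothetical helper for illustration; not part of Tieba_Spider.
    `body` is the raw response body as bytes (e.g. response.body in Scrapy).
    """
    comment_list = json.loads(body.decode('utf8'))['data']['comment_list']
    if isinstance(comment_list, dict):
        # Expected shape: iterate over the dict's values.
        yield from comment_list.values()
    elif isinstance(comment_list, list):
        # Unexpected shape seen in the traceback: iterate the list directly
        # (assumption: its elements look like the dict's values).
        yield from comment_list
    else:
        raise TypeError(
            "unexpected comment_list type: %s" % type(comment_list).__name__
        )
```

Inside parse_comment, the loop would then read for value in iter_comment_values(response.body): in place of the direct .values() call.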

iseesaw commented 4 years ago

OK, I'll give it a try. I have crawled about ten tiebas (forums), and two or three of them ran into this problem.

iseesaw commented 4 years ago

I can't reproduce it reliably.