dataabc / weibo-search

Fetches Weibo search results; the search can be either a Weibo keyword search or a Weibo topic search
1.71k stars 374 forks

The program fails right at the start time and cannot run #372

Open tianshanzhilong opened 1 year ago

tianshanzhilong commented 1 year ago

image I'm not sure whether this is a Python version problem; I'm currently using version 3.11.

dataabc commented 1 year ago

Is the above the complete error message?

tianshanzhilong commented 1 year ago

I can't upload the screenshot, so I'm pasting the error output directly:

D:\爬虫\微博\weibo-search\weibo\spiders>scrapy crawl search
2023-05-12 07:45:54 [scrapy.core.scraper] ERROR: Spider error processing <GET https://s.weibo.com/weibo?q=%E5%91%A8%E6%9D%B0%E4%BC%A6%20%E6%BC%94%E5%94%B1&typeall=1&suball=1&timescope=custom:2023-04-28-0:2023-05-11-0> (referer: None)
Traceback (most recent call last):
  File "E:\Program Files\python11\Lib\site-packages\scrapy\utils\defer.py", line 257, in iter_errback
    yield next(it)
          ^^^^^^^^
  File "E:\Program Files\python11\Lib\site-packages\scrapy\utils\python.py", line 312, in __next__
    return next(self.data)
           ^^^^^^^^^^^^^^^
  File "E:\Program Files\python11\Lib\site-packages\scrapy\utils\python.py", line 312, in __next__
    return next(self.data)
           ^^^^^^^^^^^^^^^
  File "E:\Program Files\python11\Lib\site-packages\scrapy\core\spidermw.py", line 104, in process_sync
    for r in iterable:
  File "E:\Program Files\python11\Lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 28, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Program Files\python11\Lib\site-packages\scrapy\core\spidermw.py", line 104, in process_sync
    for r in iterable:
  File "E:\Program Files\python11\Lib\site-packages\scrapy\spidermiddlewares\referer.py", line 353, in <genexpr>
    return (self._set_referer(r, response) for r in result or ())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Program Files\python11\Lib\site-packages\scrapy\core\spidermw.py", line 104, in process_sync
    for r in iterable:
  File "E:\Program Files\python11\Lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 27, in <genexpr>
    return (r for r in result or () if self._filter(r, spider))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Program Files\python11\Lib\site-packages\scrapy\core\spidermw.py", line 104, in process_sync
    for r in iterable:
  File "E:\Program Files\python11\Lib\site-packages\scrapy\spidermiddlewares\depth.py", line 31, in <genexpr>
    return (r for r in result or () if self._filter(r, response, spider))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Program Files\python11\Lib\site-packages\scrapy\core\spidermw.py", line 104, in process_sync
    for r in iterable:
  File "D:\爬虫\微博\weibo-search\weibo\spiders\search.py", line 106, in parse
    for weibo in self.parse_weibo(response):
  File "D:\爬虫\微博\weibo-search\weibo\spiders\search.py", line 419, in parse_weibo
    comments_count = re.findall(r'\d+.*', comments_count)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Program Files\python11\Lib\re\__init__.py", line 216, in findall
    return _compile(pattern, flags).findall(string)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'NoneType'
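
The last two frames of the traceback can be reproduced outside Scrapy. The sketch below assumes that `comments_count` was `None` when it reached `re.findall`, which is what the `TypeError` reports; why the value was `None` (for example, a selector that matched nothing on the page) is not shown in the traceback and is only a guess here.

```python
import re

# Hypothetical stand-in for the value extracted in parse_weibo;
# None models an extraction step that found nothing.
comments_count = None

try:
    # Same pattern as search.py line 419 in the traceback.
    re.findall(r'\d+.*', comments_count)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Since `re.findall` only accepts a string or bytes-like object, any `None` that reaches this line raises before a single item is yielded, which would explain why the crawl stops on the first results page.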

dataabc commented 1 year ago

This seems to fail for some users but not for others. You could try an older version and see whether it runs.

tianshanzhilong commented 1 year ago

OK, thanks.

tianshanzhilong commented 1 year ago

Found the cause: the error comes from a problem in the parsing code. After disabling these 2 fields, the data can be scraped. image

nia717 commented 8 months ago

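I tried simply disabling them and found that not only are the comment count and like count no longer scraped, but every field after them is also misaligned: the values shift forward by 2 columns, so field names and values no longer match. So I changed the red code to the green block, and now it scrapes normally. image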
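
The replacement code itself is only in the screenshot, so it is not reproduced here. As a rough sketch of one way to keep both fields without crashing, the helper below (`extract_count` is a hypothetical name, not something from the repository) treats a missing value as `'0'`, so each item still has a value for every column.

```python
import re


def extract_count(raw):
    """Return the numeric part of `raw` as a string, or '0' if there is none.

    `raw` is whatever the selector returned: a string, or None when
    nothing matched.
    """
    if raw is None:
        # Guard against the TypeError from the traceback above.
        return '0'
    # Same pattern as in the traceback: digits followed by the rest of the text.
    matches = re.findall(r'\d+.*', raw)
    return matches[0] if matches else '0'


# Hypothetical sample values standing in for extracted text.
print(extract_count(None))
print(extract_count('评论'))
print(extract_count(' 评论 34'))
```

Returning a placeholder instead of dropping the field is the design choice here: the field stays in the item, so the columns written afterwards should stay aligned with their headers.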