dataabc / weibo-crawler

Sina Weibo crawler: crawls Sina Weibo data with Python and downloads Weibo images and videos
3.35k stars, 748 forks

Error raised, but the script still runs #293

Open zhaibin opened 2 years ago

zhaibin commented 2 years ago

```
Extra data: line 128 column 2 (char 4697)
Traceback (most recent call last):
  File "weibo.py", line 766, in get_one_weibo
    weibo = self.get_long_weibo(weibo_id)
  File "weibo.py", line 351, in get_long_weibo
    js = json.loads(html, strict=False)
  File "/usr/lib/python3.7/json/__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 128 column 2 (char 4697)
```

tuling-xiaofeng commented 2 years ago


Same problem here. It ran fine until yesterday and started failing sometime in the evening; I thought I had changed something myself.

ffffuturexu commented 2 years ago

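This bug happens because the fetched html can no longer be parsed as a single JSON object, while json.loads() only accepts a single JSON object; as a result, long Weibo posts cannot be fetched. The HTML structure of the Weibo page has probably changed.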
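A minimal sketch of the failure described above, using a made-up string rather than a real Weibo response: `json.loads` parses exactly one JSON value and raises `JSONDecodeError` with the message "Extra data" when non-whitespace text follows it. The `err.pos` attribute gives the index where the extra text starts, which is the same kind of position the traceback reports as "char 4697". The variable names here are illustrative only.

```python
import json

# Hypothetical text: one JSON object followed by leftover page script.
body = '{"status": {"id": "123"}}][0] || {};'

try:
    json.loads(body, strict=False)
except json.JSONDecodeError as err:
    # json.loads accepts one JSON value; anything after it is "Extra data".
    error_msg = err.msg
    error_pos = err.pos  # index of the first unexpected character

# Cutting the text at the reported position leaves a single object,
# which json.loads accepts.
status = json.loads(body[:error_pos], strict=False)
print(error_msg, status["status"]["id"])
```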

The error occurs here: https://github.com/dataabc/weibo-crawler/blob/0fbc03d80f84d3728993d3693c06462d4bf85d8a/weibo.py#L349-L351

Change it to:

```python
html = html[:html.rfind(',')]
html = html[:html.rfind('][')]  # added
html = '{' + html  # modified
js = json.loads(html, strict=False)
```
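The patch works by trimming the page text with `rfind()` until what remains is one JSON object, so it depends on the exact characters the page happens to contain. An alternative (not the project's code) is `json.JSONDecoder.raw_decode`, which decodes the first complete JSON value in a string and returns where it ended, ignoring whatever follows. The sketch below assumes the data sits in an object whose first key is `"status":`; the helper `parse_status_object` and the sample page layout are hypothetical.

```python
import json


def parse_status_object(html):
    """Hypothetical helper: decode the object that starts at '"status":'.

    Instead of trimming the text until it happens to be valid JSON,
    re-open the object with '{' and let raw_decode stop at the end of
    the first complete JSON value.
    """
    start = html.find('"status":')
    if start == -1:
        raise ValueError('"status": not found in page')
    obj, _end = json.JSONDecoder(strict=False).raw_decode('{' + html[start:])
    return obj


# Assumed page layout, for illustration only.
page = ('var $render_data = [{"status": {"id": "123", "text": "long text"}, '
        '"call": "x"}][0] || {};')
print(parse_status_object(page)["status"]["text"])
```

Because `raw_decode` stops at the closing brace of the object, the trailing `][0] || {};` never reaches the parser. It will still fail if the embedded object contains anything that is not strict JSON.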

tuling-xiaofeng commented 2 years ago


Thanks a lot, that fixed it.

mobyw commented 2 years ago

The HTML structure of long Weibo posts has changed; in the get_long_weibo method, simply replace "hotScheme" with "call".
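Why renaming the key matters: `str.rfind` returns -1 when the substring is missing, and slicing with `html[:-1]` then silently drops only the last character instead of failing, so the later `json.loads` call receives malformed text. The sketch below tries the old key and then the new one, assuming the method cuts the text before that key and then before the preceding comma before wrapping it in braces (roughly what the earlier patch implies). The helper `build_status_json` and the sample fragments are hypothetical.

```python
import json

# Keys that may follow the "status" object: "hotScheme" in the older page
# layout, "call" according to this thread. Both are assumptions taken from
# the discussion, not from any documented Weibo API.
TRAILING_KEYS = ('"hotScheme"', '"call"')


def build_status_json(fragment):
    """Hypothetical helper: rebuild {"status": ...} from a page fragment.

    `fragment` is assumed to start at '"status":' and to contain one of
    TRAILING_KEYS right after the status object.
    """
    for key in TRAILING_KEYS:
        idx = fragment.rfind(key)
        if idx != -1:
            break
    else:
        # rfind returned -1 for every key; slicing with [:-1] would only drop
        # the last character, so fail loudly instead.
        raise ValueError("no known trailing key found")
    head = fragment[:idx]          # '"status": {...}, '
    head = head[:head.rfind(',')]  # drop the comma before the trailing key
    return json.loads('{' + head + '}', strict=False)


frag = '"status": {"id": "123", "text": "long text"}, "call": "x"}][0] || {};'
print(build_status_json(frag))
```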