dataabc / weibo-crawler

Sina Weibo crawler: crawls Sina Weibo data with Python and downloads Weibo images and videos

How to handle json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)? #25

Open 469698742 opened 4 years ago

469698742 commented 4 years ago

    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Progress:   0%|          | 0/156 [00:00<?, ?it/s]
    第1页
    Progress:   0%|          | 0/156 [00:00<?, ?it/s]
    微博爬取完成,共爬取0条微博
    信息抓取完毕


    Error: Expecting value: line 1 column 1 (char 0)
    Traceback (most recent call last):
      File "/Users/cc/Downloads/weibo-crawler-master/weibo.py", line 854, in start
        self.get_pages()
      File "/Users/cc/Downloads/weibo-crawler-master/weibo.py", line 803, in get_pages
        self.get_user_info()
      File "/Users/cc/Downloads/weibo-crawler-master/weibo.py", line 174, in get_user_info
        js = self.get_json(params)
      File "/Users/cc/Downloads/weibo-crawler-master/weibo.py", line 113, in get_json
        return r.json()
      File "/Users/cc/anaconda3/lib/python3.7/site-packages/requests/models.py", line 897, in json
        return complexjson.loads(self.text, **kwargs)
      File "/Users/cc/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
        return _default_decoder.decode(s)
      File "/Users/cc/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/Users/cc/anaconda3/lib/python3.7/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
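
The exception itself only means the response body was not valid JSON at all, typically an empty or HTML page returned when the requests are being throttled. Below is a minimal, hypothetical sketch (not the project's actual code; the function name, arguments, and retry values are illustrative) of catching the decode error and backing off before retrying:

    import random
    import time

    import requests

    def get_json_with_retry(url, params=None, headers=None, max_retries=3):
        """Fetch url and decode JSON, pausing and retrying when the body
        is not valid JSON (e.g. an empty or HTML rate-limit page)."""
        for _ in range(max_retries):
            r = requests.get(url, params=params, headers=headers)
            try:
                return r.json()
            except ValueError:
                # json.decoder.JSONDecodeError is a subclass of ValueError;
                # a non-JSON body usually means we are being rate-limited.
                time.sleep(random.randint(10, 15))
        raise RuntimeError('no JSON response after %d attempts; '
                           'probably rate-limited or the account is restricted'
                           % max_retries)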

dataabc commented 4 years ago

Thanks for the report. Could you provide the user_id that triggers the error, so it's easier to debug? Thanks.

Nathern001 commented 4 years ago

> Thanks for the report. Could you provide the user_id that triggers the error, so it's easier to debug? Thanks.

I ran into this problem too. The user_id is 1877809031.

Nathern001 commented 4 years ago

There is also this user_id: 2214855827. The problem starts appearing once the crawl reaches page 7.

Nathern001 commented 4 years ago

One more: 1650713582.

dataabc commented 4 years ago

@469698742 @Nathern001 I tested these IDs: all but the last one can be crawled, so it looks like the requests were sent too fast and got rate-limited. The last account shows several thousand weibo, yet its homepage displays none, so the author has probably restricted it themselves. There are roughly two ways to deal with the rate-limit problem: add a cookie, or slow down the crawl. If the error still occurs after adding a cookie, you have most likely been rate-limited; the restriction is lifted automatically after a while. To slow down, modify the following code in the get_pages method:

                if page - page1 == random_pages and page < page_count:
                    sleep(random.randint(6, 10))
                    page1 = page
                    random_pages = random.randint(1, 5)

This code means that after every 1 to 5 pages (chosen at random) the program pauses for 6 to 10 seconds; these are the defaults. To slow down, you can either pause more often, e.g. every 1 to 3 pages, or pause longer, e.g. 10 to 15 seconds each time. Tune it to your needs: the slower you crawl, the lower the chance of being rate-limited, but the crawl also takes longer, so weigh the trade-off yourself.
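
For example (these numbers are only an illustration, not a recommendation from the project), pausing for 10 to 15 seconds after every 1 to 3 pages would look like this:

                if page - page1 == random_pages and page < page_count:
                    sleep(random.randint(10, 15))  # pause longer: 10-15 s instead of 6-10 s
                    page1 = page
                    random_pages = random.randint(1, 3)  # pause more often: every 1-3 pages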

469698742 commented 4 years ago

My own need is only to crawl user info. I had set it to 0 days and found that, for users who had not posted any weibo, the crawl ran too fast, so I commented out self.get_pages() and added a pause, and I have not been rate-limited since:

    def start(self):
        """运行爬虫"""
        try:
            for user_id in self.user_id_list:
                self.initialize_info(user_id)
                self.get_user_info()
                # self.get_pages()  # commented out: only user info is needed
                print(u'信息抓取完毕')
                print('*' * 100)
                sleep(random.randint(6, 10))  # pause added between users

dataabc commented 4 years ago

@469698742 Thanks for the feedback; very useful as a reference.