dataabc / weiboSpider

Sina Weibo crawler: scrape Sina Weibo data with Python

Error when running #308

Closed. gudaocode closed this issue 3 years ago.

gudaocode commented 3 years ago

To help us solve the problem, please answer the questions below carefully. Once the problem is solved, please close this issue promptly.

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 449, in send timeout=timeout File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 727, in urlopen method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2] File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\retry.py", line 446, in increment raise MaxRetryError(_pool, url, error or ResponseError(cause)) urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /3941045903?page=12 (Caused by SSLError(SSLError(1, '[SSL: BAD_SIGNATURE] bad signature (_ssl.c:1091)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "\weibo_spider\parser\util.py", line 24, in handle_html resp = requests.get(url, headers=headers) File "C:\Users\\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 76, in get return request('get', url, params=params, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 61, in request return session.request(method=method, url=url, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 530, in request resp = self.send(prep, send_kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 643, in send r = adapter.send(request, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 514, in send raise SSLError(e, request=request) requests.exceptions.SSLError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /3941045903?page=12 (Caused by SSLError(SSLError(1, '[SSL: BAD_SIGNATURE] bad signature (_ssl.c:1091)'))) 'NoneType' object has no attribute 'xpath' Traceback (most recent call last): File "\weibo_spider\parser\page_parser.py", line 44, in get_one_page info = self.selector.xpath("//div[@class='c']") AttributeError: 'NoneType' object has no attribute 'xpath' Progress: 6%|███▉ | 11/198 [01:59<33:46, 10.84s/it] cannot unpack non-iterable NoneType object Traceback (most recent call last): File "\weibo_spider\spider.py", line 170, in get_weibo_info self.weibo_id_list) # 获取第page页的全部微博 TypeError: cannot unpack non-iterable NoneType object
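The traceback above shows how a single failed request cascades. handle_html catches the SSL error and evidently hands back None, so `self.selector.xpath` raises AttributeError. get_one_page then returns None, and unpacking that None in get_weibo_info raises TypeError, which can stop the crawl early.

Below is a minimal sketch of a page loop that skips a failed page instead of crashing. `fetch_page` and `crawl_pages` are hypothetical names, not weiboSpider's API. The sketch assumes a fetch returns `None` on failure.

```python
from typing import Callable, List, Optional, Tuple


def crawl_pages(
    fetch_page: Callable[[int], Optional[List[str]]],
    page_count: int,
) -> Tuple[List[str], List[int]]:
    """Crawl pages 1..page_count.

    fetch_page returns the posts on a page, or None if the request failed.
    Failed pages are recorded and skipped instead of aborting the whole crawl.
    """
    posts: List[str] = []
    failed: List[int] = []
    for page in range(1, page_count + 1):
        result = fetch_page(page)
        if result is None:  # e.g. an SSLError was caught further down
            failed.append(page)
            continue
        posts.extend(result)
    return posts, failed
```

The returned `failed` list can be retried in a second pass rather than losing every page after the first error.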

dataabc commented 3 years ago

Thanks for the report.

In the parser folder, edit the handle_html method in util.py: change resp = requests.get(url, headers=headers) to resp = requests.get(url, headers=headers, verify=False), and see whether that helps.
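Below is a minimal sketch of the suggested edit. It assumes that handle_html fetches a URL with requests and yields None when the request fails, which is consistent with the traceback above. The real function also parses the HTML; that step is omitted here. The `http_get` parameter is hypothetical and exists only so the sketch runs without network access.

Keep in mind that `verify=False` only disables certificate verification. The later tracebacks in this thread show the same BAD_SIGNATURE handshake error still occurring with `verify=False` set.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def handle_html(url: str, headers: dict, verify: bool = True,
                http_get: Optional[Callable] = None) -> Optional[bytes]:
    """Fetch url and return the raw response body, or None on failure.

    verify is passed straight through to requests.get; verify=False turns
    off certificate verification. http_get defaults to requests.get and is
    a parameter only so the sketch can be exercised offline.
    """
    if http_get is None:
        import requests  # third-party dependency, imported lazily
        http_get = requests.get
    try:
        resp = http_get(url, headers=headers, verify=verify)
        return resp.content
    except Exception:
        # Log the full traceback and signal failure to the caller with None
        logger.exception("request to %s failed", url)
        return None
```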

gudaocode commented 3 years ago

C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, 用户昵称: 孙艺珍吧官博 用户id: 3941045903 微博数: 1980 关注数: 56 粉丝数: 57622



C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, Progress: 0%| | 0/198 [00:00<?, ?it/s]C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 677, in urlopen chunked=chunked, File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 426, in _make_request six.raise_from(e, None) File "", line 3, in raise_from File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 421, in _make_request httplib_response = conn.getresponse() File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1369, in getresponse response.begin() File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 310, in begin version, status, reason = self._read_status() File 
"C:\Users*\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 279, in _read_status raise RemoteDisconnected("Remote end closed connection without" http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 449, in send timeout=timeout File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 727, in urlopen method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2] File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\retry.py", line 410, in increment raise six.reraise(type(error), error, _stacktrace) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\packages\six.py", line 734, in reraise raise value.with_traceback(tb) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 677, in urlopen chunked=chunked, File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 426, in _make_request six.raise_from(e, None) File "", line 3, in raise_from File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 421, in _make_request httplib_response = conn.getresponse() File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1369, in getresponse response.begin() File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 310, in begin version, status, reason = self._read_status() File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 279, in _read_status raise RemoteDisconnected("Remote end closed connection without" urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "\weibo_spider\parser\util.py", line 24, in handle_html resp = requests.get(url, headers=headers, verify=False) File "C:\Users\\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 76, in get return request('get', url, params=params, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 61, in request return session.request(method=method, url=url, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 530, in request resp = self.send(prep, send_kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 643, in send r = adapter.send(request, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 498, in send raise ConnectionError(err, request=request) requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, [置顶]#孙艺珍[超话]#【招新公告】百度孙艺珍吧翻译制作组招新公告各位来自五湖四海的朋友大家好,很高兴和大家相聚在这一方属于孙艺珍中饭的自留地。因现任翻译组成员工作较多,短时间内无法参与文字及视频的制作,故特此向大家发出新一期的招新公示。招新岗位:①韩语笔译。文字表达能力强,可自主翻...全文 原图  微博发布位置:无 发布时间:2020-03-07 20:15 发布工具:孙艺珍超话 点赞数:546 转发数:67 评论数:211 url:https://weibo.cn/comment/Ixxuj2Ivg


孙艺珍[超话]#【CF视频】20210302 油管官号MSteam Official更新花絮视频。[孙艺珍] Behind Clip : Aphrodite再世!!2021年3月1 日发布女神再世的孙艺珍广告拍摄现场花絮http://t.cn/A6t9KHS3 孙艺珍吧官博的微博视频  

微博发布位置:无 发布时间:2021-03-02 15:44 发布工具:孙艺珍超话 点赞数:132 转发数:15 评论数:10 url:https://weibo.cn/comment/K4kEzE5HD


C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning,

孙艺珍[超话]#【Instagram更新】20210301 yejinhand更新这么晚了 大家都在睡觉吗?😅下雪了..😅🙊下了一整天的雨,却变成了雪...明天重新开始周一一样的周二加油!💕我被邀请参加Valentino Act Collection的FW 21.22时装秀了^^+valentinocollezionemilano +Romanstud +ad翻译:柯柯http://t.cn/A6t9f8k6   原图  

微博发布位置:无 发布时间:2021-03-02 12:11 发布工具:孙艺珍超话 点赞数:165 转发数:9 评论数:17 url:https://weibo.cn/comment/K4jfQDhDj


C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning,

孙艺珍[超话]#【Instagram更新】20210228 yejinhand更新3月1日 韩国时间晚10点一起欣赏华伦天奴的时装秀吧!Looking forward to the @maisonvalentino show on 1st March 💕+valentinoActCollection +adhttp://t.cn/A6tSgdKp  [组图共2张] 原图 

微博发布位置:无 发布时间:2021-02-28 15:54 发布工具:孙艺珍超话 点赞数:155 转发数:15 评论数:26 url:https://weibo.cn/comment/K41Rj0H98


孙艺珍[超话]#【Instagram更新】20210227 yejinhand二更,认证孔姐礼物。你从哪儿弄来的这么可爱的像自己的东西?!Thank you🥰

话说回来,今天天气太好了...🤦♀️祝大家度过幸福的星期六💕http://t.cn/A6ta3vGF  原图  微博发布位置:无 发布时间:2021-02-27 16:39 发布工具:孙艺珍超话 点赞数:251 转发数:20 评论数:32 url:https://weibo.cn/comment/K3SJa64oO


孙艺珍[超话]#【Instagram更新】20210227 yejinhand更新👚👗🧥💕+Crocodilelady +Hyungji时装加油🙋♀️附上镜面自拍中标签字样为:Crocodilelady为美丽孙艺珍演员的2021年加油助威。http://t.cn/A6tauuWF  原图 

微博发布位置:无 发布时间:2021-02-27 16:35 发布工具:孙艺珍超话 点赞数:287 转发数:14 评论数:34 url:https://weibo.cn/comment/K3SHqkemt


C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, 【中字视频】20210210 油管官号종근당건강(钟根堂健康)更新CF视频——Lacto Biome TVC品牌篇30"2021年2月10日发布什么能改变我们的健康?肠内微生物。如果改变肠内微生物的生态系统,健康就会发生变化。Lacto Biome,改变肠内微生物,从根本上整顿健康。开始 吧,用Lacto Biome。以肠内微生物技术为基础的高端专业生物技术,肠内微生物护理的开始,Lacto Biome。翻译:柯柯制作:孙艺珍吧http://t.cn/A6tJ8ae7 孙艺珍吧官博的微博视频   微博发布位置:无 发布时间:2021-02-24 01:34 发布工具:孙艺珍超话 点赞数:186 转发数:10 评论数:14 url:https://weibo.cn/comment/K3kwujy9b


C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, 【CF视频】20210214-20210222 Twitter、Instagram官号 LiveSmart、油管官号Smart Communications更新CF视频及花絮,ABS-CBN官网 更新给粉丝的祝福。Believe that anything is possible with Smart5G at the palm of your hands. The future is inevitable, it's time to break barriers. More on the link in our bio.Share what makes you happy with "Free Stories for All". Enjoy with 14GB for FB, IG, Twitter, Tiktok and more! 相信有了Smart5G在手,一切皆有可能。未来是不可避免的,是时候打破障碍了。 更 多链接在livesmart官网。与"Free Stories for All"分享你快乐的当下。 享受14GB的FB,IG,Twitter,Tiktok等资源!http://t.cn/A6tbKAi7http://t.cn/A6tMucD7http://t.cn/A6tJTqRKhttp://t.cn/A6tMFEbi 孙艺珍吧官博的微博视频   微博发布位置:无 发布时间:2021-02-24 01:33 发布工具:孙艺珍超话 点赞数:348 转发数:24 评论数:44 url:https://weibo.cn/comment/K3kwczEUg


【中字视频】20210219 油管官号MSteam Office 更新“MSteam演员们的2021愿望视讯”2021年2月19日发布我们听完了MSteam演员们今年的愿望&目标!希望2021年,所有愿望都能实现。孙艺珍cut中字制作:孙艺珍吧http://t.cn/A6tVlz0g 孙艺珍吧官博的微博视频   微博发布位置:无 发布时间:2021-02-21 17:03 发布工具:孙艺珍超话 点赞数:194 转发数:19 评论数:22 url:https://weibo.cn/comment/K2YjVkhQP


孙艺珍[超话]#【Instagram更新】20210220 yejinhand更新🥰各位~~度过幸福的周末吧..💕http://t.cn/A6tf8vP6  原图 

微博发布位置:无 发布时间:2021-02-20 19:57 发布工具:孙艺珍超话 点赞数:563 转发数:44 评论数:71 url:https://weibo.cn/comment/K2Q2ezDdE


C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning, 【CF美图】20210218 Instagram 官号 crocodileladies_kr 更新视频同时还更新了2张美图(p1、2)。另外20210205官宣孙艺珍为新代 言人时也更新了3张美图(p3-5)。http://t.cn/A6tV9ngs  [组图共5张] 原图  微博发布位置:无 发布时间:2021-02-19 19:29 发布工具:孙艺珍超话 点赞数:189 转发数:11 评论数:11 url:https://weibo.cn/comment/K2Gqtdkb9


------------------------------已获取孙艺珍吧官博(3941045903)的第1页微博------------------------------ 11条微博写入csv文件完毕,保存路径:D:\Desktop\Downloads\RW_New{孙艺珍}\Weibo\3941045903\3941045903.csv 即将进行图片下载 Download progress: 100%|████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 11035.00it/s] | 0/11 [00:00<?, ?it/s] 图片下载完毕,保存路径: D:\Desktop\Downloads\RW_New{孙艺珍}\Weibo\3941045903\img Progress: 1%|▎ | 1/198 [00:28<1:32:55, 28.30s/it]C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py:988: InsecureRequestWarning: Unverified HTTPS request is being made to host 'weibo.cn'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings InsecureRequestWarning,

dataabc commented 3 years ago

This should only be a warning, not an error. Add the following to the .py file:

import warnings

warnings.filterwarnings('ignore')

and the warnings should go away.
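`warnings.filterwarnings('ignore')` hides every warning in the process, including unrelated ones. A narrower variant, sketched here with the standard library only, ignores just the "Unverified HTTPS request ..." messages seen in the log above. urllib3 also offers `urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)` for the same purpose. The function name below is hypothetical.

```python
import warnings


def silence_insecure_request_warnings() -> None:
    """Ignore only the "Unverified HTTPS request ..." warnings.

    The message argument is a regular expression matched against the start
    of the warning text, so unrelated warnings are still shown.
    """
    warnings.filterwarnings("ignore", message=r"Unverified HTTPS request")
```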

gudaocode commented 3 years ago

Thanks! I'll try again.

gudaocode commented 3 years ago

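In the end this message appeared. The end of the output says "共爬取699条微博 信息抓取完毕" (699 weibos crawled in total; scraping finished), but the id list was not updated. I also checked this id's page, and it says there are 1980 weibos: https://weibo.com/u/3941045903?is_all=1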

Progress: 35%|█████████████████████████ | 70/198 [12:20<32:30, 15.24s/it]HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /3941045903?page=71 (Caused by SSLError(SSLError(1, '[SSL: BAD_SIGNATURE] bad signature (_ssl.c:1091)'))) Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 677, in urlopen chunked=chunked, File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request self._validate_conn(conn) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 978, in _validate_conn conn.connect() File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connection.py", line 371, in connect sslcontext=context, File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\ssl.py", line 386, in ssl_wrap_socket return context.wrap_socket(sock, server_hostname=server_hostname) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 423, in wrap_socket session=session File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 870, in _create self.do_handshake() File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\ssl.py", line 1139, in do_handshake self._sslobj.do_handshake() ssl.SSLError: [SSL: BAD_SIGNATURE] bad signature (_ssl.c:1091)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 449, in send timeout=timeout File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 727, in urlopen method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2] File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\retry.py", line 446, in increment raise MaxRetryError(_pool, url, error or ResponseError(cause)) urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /3941045903?page=71 (Caused by SSLError(SSLError(1, '[SSL: BAD_SIGNATURE] bad signature (_ssl.c:1091)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "\weibo_spider\parser\util.py", line 27, in handle_html resp = requests.get(url, headers=headers, verify=False) File "C:\Users\\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 76, in get return request('get', url, params=params, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 61, in request return session.request(method=method, url=url, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 530, in request resp = self.send(prep, send_kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 643, in send r = adapter.send(request, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 514, in send raise SSLError(e, request=request) requests.exceptions.SSLError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /3941045903?page=71 (Caused by SSLError(SSLError(1, '[SSL: BAD_SIGNATURE] bad signature (_ssl.c:1091)'))) 'NoneType' object has no attribute 'xpath' Traceback (most recent call last): File "\weibo_spider\parser\page_parser.py", line 44, in get_one_page info = self.selector.xpath("//div[@class='c']") AttributeError: 'NoneType' object has no attribute 'xpath' Progress: 35%|█████████████████████████ | 70/198 [12:23<22:39, 10.62s/it] cannot unpack non-iterable NoneType object Traceback (most recent call last): File "\weibo_spider\spider.py", line 170, in get_weibo_info self.weibo_id_list) # 获取第page页的全部微博 TypeError: cannot unpack non-iterable NoneType object 共爬取699条微博 信息抓取完毕
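This run stopped at page 71 of 198 after one failed request. That is consistent with only 699 of the 1980 weibos being saved.

One option is to retry a failed page fetch a few times with a growing delay before giving up. weiboSpider is not confirmed to do this. Below is a minimal standard-library sketch. `with_retries` is a hypothetical helper, and whether retries help against this particular handshake error has not been tested here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T], attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call fetch until it succeeds or the attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    (exponential backoff). Returns None if every attempt raised.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                return None  # out of attempts
            sleep(base_delay * 2 ** attempt)
    return None
```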

dataabc commented 3 years ago

Take a look at https://blog.csdn.net/qq_31077649/article/details/79013199.

gudaocode commented 3 years ago

I read those articles. This time I installed the three libraries and did not add any of the following:

resp = requests.get(url, headers=headers, verify=False)
import warnings
warnings.filterwarnings('ignore')

Running it produced this error:

Progress: 33%|███████████████████████ | 41/126 [06:42<10:24, 7.35s/it]HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /1669879400?page=42 (Caused by SSLError(SSLError("bad handshake: Error([('rsa routines', 'int_rsa_verify', 'wrong signature length'), ('SSL routines', 'tls_process_key_exchange', 'bad signature')])"))) Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\contrib\pyopenssl.py", line 488, in wrap_socket cnx.do_handshake() File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\OpenSSL\SSL.py", line 1828, in do_handshake self._raise_ssl_error(self._ssl, result) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\OpenSSL\SSL.py", line 1566, in _raise_ssl_error _raise_current_error() File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\OpenSSL_util.py", line 57, in exception_from_error_queue raise exception_type(errors) OpenSSL.SSL.Error: [('rsa routines', 'int_rsa_verify', 'wrong signature length'), ('SSL routines', 'tls_process_key_exchange', 'bad signature')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 670, in urlopen httplib_response = self._make_request( File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request self._validate_conn(conn) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 978, in _validate_conn conn.connect() File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connection.py", line 362, in connect self.sock = ssl_wrapsocket( File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\ssl.py", line 386, in ssl_wrap_socket return context.wrap_socket(sock, server_hostname=server_hostname) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\contrib\pyopenssl.py", line 494, in wrap_socket raise ssl.SSLError("bad handshake: %r" % e) ssl.SSLError: ("bad handshake: Error([('rsa routines', 'int_rsa_verify', 'wrong signature length'), ('SSL routines', 'tls_process_key_exchange', 'bad signature')])",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 439, in send resp = conn.urlopen( File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 726, in urlopen retries = retries.increment( File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\retry.py", line 446, in increment raise MaxRetryError(_pool, url, error or ResponseError(cause)) urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /1669879400?page=42 (Caused by SSLError(SSLError("bad handshake: Error([('rsa routines', 'int_rsa_verify', 'wrong signature length'), ('SSL routines', 'tls_process_key_exchange', 'bad signature')])")))

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "\weibo_spider\parser\util.py", line 27, in handle_html resp = requests.get(url, headers=headers) #, verify=False) File "C:\Users\\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 76, in get return request('get', url, params=params, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 61, in request return session.request(method=method, url=url, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 530, in request resp = self.send(prep, send_kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 643, in send r = adapter.send(request, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 514, in send raise SSLError(e, request=request) requests.exceptions.SSLError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /1669879400?page=42 (Caused by SSLError(SSLError("bad handshake: Error([('rsa routines', 'int_rsa_verify', 'wrong signature length'), ('SSL routines', 'tls_process_key_exchange', 'bad signature')])"))) 'NoneType' object has no attribute 'xpath' Traceback (most recent call last): File "\weibo_spider\parser\page_parser.py", line 44, in get_one_page info = self.selector.xpath("//div[@class='c']") AttributeError: 'NoneType' object has no attribute 'xpath' Progress: 33%|███████████████████████ | 41/126 [06:45<14:01, 9.89s/it] cannot unpack non-iterable NoneType object Traceback (most recent call last): File "\weibo_spider\spider.py", line 167, in get_weibo_info weibos, self.weibo_id_list, to_continue = PageParser( TypeError: cannot unpack non-iterable NoneType object 共爬取410条微博 信息抓取完毕

After that, crawling of this id stopped.

gudaocode commented 3 years ago

I ran another test, this time with the three libraries installed and with these changes made:

resp = requests.get(url, headers=headers, verify=False)
import warnings
warnings.filterwarnings('ignore')

The error was:

HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /6269329742 (Caused by SSLError(SSLError("bad handshake: Error([('rsa routines', 'int_rsa_verify', 'wrong signature length'), ('SSL routines', 'tls_process_key_exchange', 'bad signature')])"))) Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\contrib\pyopenssl.py", line 488, in wrap_socket cnx.do_handshake() File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\OpenSSL\SSL.py", line 1828, in do_handshake self._raise_ssl_error(self._ssl, result) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\OpenSSL\SSL.py", line 1566, in _raise_ssl_error _raise_current_error() File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\OpenSSL_util.py", line 57, in exception_from_error_queue raise exception_type(errors) OpenSSL.SSL.Error: [('rsa routines', 'int_rsa_verify', 'wrong signature length'), ('SSL routines', 'tls_process_key_exchange', 'bad signature')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 670, in urlopen httplib_response = self._make_request( File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request self._validate_conn(conn) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 978, in _validate_conn conn.connect() File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connection.py", line 362, in connect self.sock = ssl_wrapsocket( File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\ssl.py", line 386, in ssl_wrap_socket return context.wrap_socket(sock, server_hostname=server_hostname) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\contrib\pyopenssl.py", line 494, in wrap_socket raise ssl.SSLError("bad handshake: %r" % e) ssl.SSLError: ("bad handshake: Error([('rsa routines', 'int_rsa_verify', 'wrong signature length'), ('SSL routines', 'tls_process_key_exchange', 'bad signature')])",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 439, in send resp = conn.urlopen( File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\connectionpool.py", line 726, in urlopen retries = retries.increment( File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\urllib3\util\retry.py", line 446, in increment raise MaxRetryError(_pool, url, error or ResponseError(cause)) urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /6269329742 (Caused by SSLError(SSLError("bad handshake: Error([('rsa routines', 'int_rsa_verify', 'wrong signature length'), ('SSL routines', 'tls_process_key_exchange', 'bad signature')])")))

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "\weibo_spider\parser\util.py", line 27, in handle_html resp = requests.get(url, headers=headers, verify=False) File "C:\Users\\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 76, in get return request('get', url, params=params, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\api.py", line 61, in request return session.request(method=method, url=url, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 530, in request resp = self.send(prep, send_kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 643, in send r = adapter.send(request, kwargs) File "C:\Users*\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\adapters.py", line 514, in send raise SSLError(e, request=request) requests.exceptions.SSLError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /6269329742 (Caused by SSLError(SSLError("bad handshake: Error([('rsa routines', 'int_rsa_verify', 'wrong signature length'), ('SSL routines', 'tls_process_key_exchange', 'bad signature')])"))) 'NoneType' object has no attribute 'xpath' Traceback (most recent call last): File "\weibo_spider\parser\index_parser.py", line 49, in get_page_num if self.selector.xpath("//input[@name='mp']") == []: AttributeError: 'NoneType' object has no attribute 'xpath' unsupported operand type(s) for +: 'int' and 'NoneType' Traceback (most recent call last): File "\weibo_spider\spider.py", line 154, in get_weibo_info if self.page_count > 2 and (self.page_count + TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

gudaocode commented 3 years ago

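I tested both cases on two different computers and two different networks (one is the company network, which shouldn't suffer intermittent outages). The two errors look the same to me, don't they? In particular, after installing those three libraries, do I need to import anything, or write or change any code? Otherwise, what should I do?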
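The two tests ran on different interpreters: the paths show Python37 in one and Python39 in the other. The second test also went through urllib3's pyOpenSSL backend (`urllib3\contrib\pyopenssl.py`), so the TLS stack differed between them.

Below is a small standard-library sketch for recording each machine's Python and OpenSSL versions so the environments can be compared. It reports the OpenSSL linked into Python's `ssl` module, not the copy used by pyOpenSSL. The function name is hypothetical.

```python
import platform
import ssl


def tls_environment() -> dict:
    """Collect the versions that matter when comparing TLS behaviour across machines."""
    return {
        "python": platform.python_version(),
        "openssl": ssl.OPENSSL_VERSION,  # library the stdlib ssl module is linked against
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    for key, value in tls_environment().items():
        print(f"{key}: {value}")
```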


dataabc commented 3 years ago

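Did you test with the same account? On my own computer it runs fine without verify=False. This error should be an SSL problem, and adding verify=False should fix it. Could it also be related to the account? I'm not sure. If convenient, try a different account on another computer; I suggest not downloading files for now.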

gudaocode commented 3 years ago

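I only started hitting this problem in the last few days, and only when crawling all the content of a single id. Crawling many ids in a row, each limited to, say, the last two weeks, doesn't trigger it. So I've been wondering whether Weibo added some restriction and I got banned. To be more precise: this kind of error message appears no matter when I run the crawler. However, when I crawl multiple ids over a short time span each, the error doesn't stop the crawler.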

gudaocode commented 3 years ago

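Quoting dataabc: "Did you test with the same account? On my own computer it runs fine without verify=False. This error should be an SSL problem, and adding verify=False should fix it. Could it also be related to the account? I'm not sure. If convenient, try a different account on another computer; I suggest not downloading files for now."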

Trying it now.

gudaocode commented 3 years ago

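Whoa, what's going on!! I refreshed the cookie. I don't have a second Weibo account: I wanted to register one, but registration requires a phone number and I don't have a second phone. It worked at first but stopped halfway through, and at the end it said the cookie had expired! Has Weibo really added some new restriction on crawling per account?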

========== Error output

10条微博写入csv文件完毕,保存路径:D:\Desktop\Downloads***** 即将进行图片下载

Download progress: 30%|███████████████████▏ | 3/10 [00:0 Download progress: 50%|████████████████████████████████ Download progress: 100%|███████████████████████████████████████████████████████████████| 10/10 [00:17<00:00, 1.74s/it] 图片下载完毕,保存路径: D:\Desktop\Downloads*\img Progress: 8%|█████▋ | 15/183 [06:00<1:07:23, 24.07s/it]list index out of range Traceback (most recent call last): File "**\weibo_spider\parser\page_parser.py", line 45, in get_one_page is_exist = info[0].xpath("div/span[@class='ctt']") IndexError: list index out of range Progress: 8%|█████▋ | 15/183 [06:00<1:07:18, 24.04s/it] cannot unpack non-iterable NoneType object Traceback (most recent call last): File "***\weibo_spider\spider.py", line 167, in get_weibo_info weibos, self.weibo_id_list, to_continue = PageParser( TypeError: cannot unpack non-iterable NoneType object 共爬取150条微博 信息抓取完毕


cookie错误或已过期,请按照README中方法重新获取
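This failure has a different shape from the earlier ones. `info[0]` raised IndexError because the XPath query for post nodes returned an empty list, and the run then ended with the cookie-expired message.

Below is a hypothetical sketch that checks for both failure shapes before indexing, so the two cases can be reported separately:

- `None` stands for a failed fetch.
- An empty list stands for a page with no post nodes.

`PageStatus` and `page_status` are illustrative names. Reading an empty result as a possible cookie or login problem is an assumption drawn from this log.

```python
from enum import Enum
from typing import Optional, Sequence


class PageStatus(Enum):
    FETCH_FAILED = "fetch failed"  # no HTML at all, e.g. an SSLError upstream
    NO_POSTS = "no post nodes"     # HTML came back but held no post <div>s
    OK = "ok"


def page_status(nodes: Optional[Sequence]) -> PageStatus:
    """Classify the result of selecting the post nodes before indexing nodes[0]."""
    if nodes is None:
        return PageStatus.FETCH_FAILED
    if len(nodes) == 0:
        return PageStatus.NO_POSTS  # possibly an expired cookie (assumption)
    return PageStatus.OK
```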

dataabc commented 3 years ago

Hmm, it may be related to the account. Try again after a while; I recommend using an account you don't use regularly.

gudaocode commented 3 years ago

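Quoting dataabc: "Hmm, it may be related to the account. Try again after a while; I recommend using an account you don't use regularly."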

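Yes, I just realized this is quite risky; it would be a real hassle if the account got banned. One last question. Based on everything above, and apart from switching accounts, is the fix simply to install those three libraries, with no need to add these three lines?

resp = requests.get(url, headers=headers, verify=False)
import warnings
warnings.filterwarnings('ignore')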

dataabc commented 3 years ago

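You should use the first statement; that is the one that fixes the SSL issue. The second and third only suppress the warnings, so it doesn't matter whether you include them. You don't necessarily need to install those three libraries. Some people online say installing them solves this problem, but I haven't tried it, so I'm not sure.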

gudaocode commented 3 years ago

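Quoting dataabc: "You should use the first statement; that is the one that fixes the SSL issue. The second and third only suppress the warnings, so it doesn't matter whether you include them. You don't necessarily need to install those three libraries. Some people online say installing them solves this problem, but I haven't tried it, so I'm not sure."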

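From what I read online, once those three libraries are installed you don't need to modify the code. I'll go with that for now; otherwise I'd have to edit the file by hand every time I update the crawler. Thanks a lot!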

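Also, I registered a new account. At least for now, with verify=False added, not a single error has appeared. So this problem does seem to be related to Weibo's settings; maybe they added some new rule.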

dataabc commented 3 years ago

Thanks for the feedback; it will be a useful reference for other users.

gudaocode commented 3 years ago

(handshake)