dataabc / weiboSpider

Sina Weibo crawler: scrape Sina Weibo data with Python
8.37k stars, 1.98k forks

Runtime error #188

Closed gudaocode closed 4 years ago

gudaocode commented 4 years ago

To help resolve the problem, please answer the questions below carefully. Once the problem is resolved, please close this issue promptly.

-----------------------------Fetched page 1 of weibo posts for 张萌 (1222062284)------------------------------
Progress: 0%| | 0/350 [00:00<?, ?it/s]
Crawled 0 weibo posts in total
Finished fetching information


Error: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /1229385395/info (Caused by SSLError(SSLError(1, '[SSL: BAD_SIGNATURE] bad signature (_ssl.c:1108)')))
Traceback (most recent call last):
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 976, in _validate_conn
    conn.connect()
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connection.py", line 361, in connect
    self.sock = ssl_wrap_socket(
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\ssl_.py", line 377, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1040, in _create
    self.do_handshake()
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: BAD_SIGNATURE] bad signature (_ssl.c:1108)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\adapters.py", line 439, in send
    resp = conn.urlopen(
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 724, in urlopen
    retries = retries.increment(
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\retry.py", line 439, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /1229385395/info (Caused by SSLError(SSLError(1, '[SSL: BAD_SIGNATURE] bad signature (_ssl.c:1108)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "X:\Program Files\Pic_Follow_Download_WeiBo\weibo_spider\parser\util.py", line 21, in handle_html
    resp = requests.get(url, cookies=cookie)
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\jinqi\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /1229385395/info (Caused by SSLError(SSLError(1, '[SSL: BAD_SIGNATURE] bad signature (_ssl.c:1108)')))
Error: 'NoneType' object has no attribute 'xpath'
Traceback (most recent call last):
  File "X:\Program Files\Pic_Follow_Download_WeiBo\weibo_spider\parser\info_parser.py", line 20, in extract_user_info
    nickname = self.selector.xpath("//title/text()")[0]
AttributeError: 'NoneType' object has no attribute 'xpath'
Error: 'NoneType' object has no attribute 'id'
Traceback (most recent call last):
  File "X:\Program Files\Pic_Follow_Download_WeiBo\weibo_spider\parser\index_parser.py", line 34, in get_user
    self.user.id = user_id
AttributeError: 'NoneType' object has no attribute 'id'
None


Error: 'NoneType' object has no attribute 'nickname'
Traceback (most recent call last):
  File "X:\Program Files\Pic_Follow_Download_WeiBo\weibo_spider\spider.py", line 148, in _get_filepath
    file_dir = FLAGS.output_dir + os.sep + self.user.nickname
AttributeError: 'NoneType' object has no attribute 'nickname'
Error: expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "X:\Program Files\Pic_Follow_Download_WeiBo\weibo_spider\writer\csv_writer.py", line 23, in __init__
    with open(self.file_path, "a", encoding="utf-8-sig",
TypeError: expected str, bytes or os.PathLike object, not NoneType
Error: 'NoneType' object has no attribute 'nickname'
Traceback (most recent call last):
  File "X:\Program Files\Pic_Follow_Download_WeiBo\weibo_spider\spider.py", line 148, in _get_filepath
    file_dir = FLAGS.output_dir + os.sep + self.user.nickname
AttributeError: 'NoneType' object has no attribute 'nickname'


Progress: 0%| | 0/191 [00:00<?, ?it/s]
Error: 'NoneType' object has no attribute 'nickname'
Traceback (most recent call last):
  File "X:\Program Files\Pic_Follow_Download_WeiBo\weibo_spider\spider.py", line 123, in get_weibo_info
    self.user.nickname,
AttributeError: 'NoneType' object has no attribute 'nickname'
Crawled 0 weibo posts in total
Finished fetching information


Error: 'NoneType' object has no attribute 'nickname'
Traceback (most recent call last):
  File "X:\Program Files\Pic_Follow_Download_WeiBo\weibo_spider\spider.py", line 235, in start
    self.user.nickname,
AttributeError: 'NoneType' object has no attribute 'nickname'
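
Editor's note: reading the tracebacks above, the later `'NoneType' object has no attribute ...` errors appear to follow from the first failure: the request to `https://weibo.cn/1229385395/info` raised an `SSLError`, so no page was parsed and later code received `None`. The sketch below shows one way to stop at the first failure instead of cascading. It is not weiboSpider's actual code; `extract_nickname`, `crawl_user`, and the `fetch_selector` callback are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Optional


def extract_nickname(selector: Optional[Any]) -> Optional[str]:
    """Return the page title text, or None when no page was fetched."""
    if selector is None:
        # The request failed upstream; avoid calling .xpath on None.
        return None
    titles = selector.xpath("//title/text()")
    return titles[0] if titles else None


def crawl_user(
    fetch_selector: Callable[[str], Optional[Any]], user_id: str
) -> Optional[dict]:
    """Fetch one user's info page; return None so the caller can skip the user."""
    # fetch_selector is assumed to return a parsed page or None on failure.
    selector = fetch_selector(f"https://weibo.cn/{user_id}/info")
    nickname = extract_nickname(selector)
    if nickname is None:
        print(f"Skipping user {user_id}: info page unavailable")
        return None
    return {"id": user_id, "nickname": nickname}
```

With a guard like this, a failed request produces one clear message per user rather than a chain of `AttributeError` and `TypeError` reports from the file path and writer code.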

dataabc commented 4 years ago

Thanks for the feedback.

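This may be caused by a network problem or by crawling too fast; the latter is more likely. Weibo limits crawl speed, and requests that come too quickly produce errors. There is no especially good workaround; just slow the crawl down as much as you can.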
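
Editor's note: one common way to "slow down" is to retry a failed request only after an increasing pause. The sketch below is a generic exponential backoff with jitter, not a feature of weiboSpider; `fetch_with_backoff`, its parameters, and the 15-second base delay are assumptions chosen for illustration, and there is no guarantee this avoids Weibo's limits.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch(); after each failure wait longer before trying again."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception as exc:  # in real use, narrow this to network errors
            if attempt == max_attempts - 1:
                print(f"Giving up after {max_attempts} attempts: {exc}")
                return None
            # Double the wait on every failure and add random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            print(f"Attempt {attempt + 1} failed ({exc}); sleeping {delay:.1f}s")
            sleep(delay)
    return None
```

A call might look like `fetch_with_backoff(lambda: requests.get(url, cookies=cookie, timeout=10))`, assuming `requests`, `url`, and `cookie` are defined as in the traceback's `handle_html`. The `sleep` parameter exists only so the waiting can be replaced in tests.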

gudaocode commented 4 years ago

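All of the accounts belong to entertainment celebrities, and I have already set the wait interval to 15-20 and the sleep frequency to every 1-2 pages. Also, if I was blocked for crawling too frequently, would something like changing the cookie help? Or how long should I wait before trying again?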

dataabc commented 4 years ago

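Some users have reported that switching accounts works. If you keep using the same account, I don't know exactly how long you would need to wait.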

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

Closing as stale, please reopen if you'd like to work on this further.