dataabc / weiboSpider

A Sina Weibo crawler: scrape Sina Weibo data with Python
8.37k stars 1.98k forks

Error partway through a run; download cannot continue #349

Closed ZhuningS closed 3 years ago

ZhuningS commented 3 years ago

To help resolve the problem, please answer the questions below carefully. Once the problem is resolved, please close this issue promptly.

A: The GitHub version. My Python version is 3.8.5.

A: Yes

A: Yes

A:
Progress: 6%|█████████████▎ | 27/454 [02:40<40:50, 5.74s/it]
HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /5030944946/profile?starttime=20140210&endtime=20201107&advancedfilter=1&page=28 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000000003C2BB50>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
Traceback (most recent call last):
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connection.py", line 169, in _new_conn
    conn = connection.create_connection(
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\connection.py", line 73, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connection.py", line 353, in connect
    conn = self._new_conn()
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connection.py", line 181, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x0000000003C2BB50>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\adapters.py", line 439, in send
    resp = conn.urlopen(
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\urllib3\util\retry.py", line 573, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /5030944946/profile?starttime=20140210&endtime=20201107&advancedfilter=1&page=28 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000000003C2BB50>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\weiboSpider-master\weibo_spider\parser\util.py", line 25, in handle_html
    resp = requests.get(url, headers=headers)
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Intel\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='weibo.cn', port=443): Max retries exceeded with url: /5030944946/profile?starttime=20140210&endtime=20201107&advancedfilter=1&page=28 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000000003C2BB50>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
Progress: 6%|█████████████▎ | 27/454 [02:51<45:08, 6.34s/it]
'NoneType' object has no attribute 'xpath'
Traceback (most recent call last):
  File "D:\weiboSpider-master\weibo_spider\spider.py", line 167, in get_weibo_info
    weibos, self.weibo_id_list, to_continue = PageParser(
  File "D:\weiboSpider-master\weibo_spider\parser\page_parser.py", line 45, in __init__
    info = self.selector.xpath("//div[@class='c']")
AttributeError: 'NoneType' object has no attribute 'xpath'
Crawled 139 original Weibo posts in total
Information scraping finished


dataabc commented 3 years ago

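Thanks for the feedback.

You have probably been temporarily rate-limited. I suggest lowering the crawl speed; see question 2 in the FAQ.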
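
One way to slow down and ride out a temporary block is to retry a failed request after a growing pause. The sketch below is only an illustration under stated assumptions, not how weiboSpider itself handles this: `fetch_with_backoff` and its parameters are hypothetical names, and `fetch` stands for any callable that downloads a URL. It catches OSError because the exceptions raised by requests, including `requests.exceptions.ConnectionError`, derive from it.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[str], T],
    url: str,
    max_attempts: int = 4,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fetch(url), pausing longer after each network failure.

    Returns the result of fetch, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except OSError as exc:  # requests' exceptions derive from OSError
            if attempt == max_attempts - 1:
                break  # out of attempts; fall through and return None
            # Exponential backoff plus up to 1 s of random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"{exc!r}; retrying in {delay:.1f}s")
            sleep(delay)
    return None
```

With requests installed, `fetch` could be something like `lambda u: requests.get(u, headers=headers, timeout=10)`. The `sleep` parameter exists only so the waiting can be replaced in tests. Because the function returns None when all attempts fail, the caller still has to check the result before parsing it.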

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

Closing as stale, please reopen if you'd like to work on this further.