Closed ul8ksgdmy closed 5 years ago
Crawling progress: 18262 / 21998
Crawling progress: 18263 / 21998
Error raised in requests.exceptions.ChunkedEncodingError
("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
Reconnecting at the page after the error
Handling the page where the error occurred
Crawling progress: 18264 / 21998
Crawling progress: 18265 / 21998
Error log
... (omitted)
Crawling progress: 21726 / 21998
Crawling progress: 21727 / 21998
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 601, in _update_chunk_length
self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 360, in _error_catcher
yield
File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 666, in read_chunked
self._update_chunk_length()
File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 605, in _update_chunk_length
raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/requests/models.py", line 750, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 490, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 694, in read_chunked
self._original_response.close()
File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 378, in _error_catcher
raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "ruri_main.py", line 49, in <module>
cd.insertone(cr.crawling('ilbe', 1000)) # the target collection must be changed in the ini file
File "/home/centos/tmp/ruri_service.py", line 29, in crawling
result = wc.crawlingposts(lastpage, ctargetdata) # run the crawl and store the result in a variable
File "/home/centos/tmp/ruri_crawler.py", line 245, in crawlingposts
contents_part_list = self.cr_lowerpages(headers, upper_page_list, keykeys, keyvalues)
File "/home/centos/tmp/ruri_crawler.py", line 146, in cr_lowerpages
inner_res = requests.get(innerlink, headers=headers)
File "/usr/lib/python3.6/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 686, in send
r.content
File "/usr/lib/python3.6/site-packages/requests/models.py", line 828, in content
self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
File "/usr/lib/python3.6/site-packages/requests/models.py", line 753, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
For a possible solution, see the link below (fix in progress): https://stackoverflow.com/questions/44509423/python-requests-chunkedencodingerrore-requests-iter-lines
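
One possible way to keep the crawl from aborting at the `requests.get(innerlink, headers=headers)` call in `cr_lowerpages` is to retry the request when the connection breaks mid-response. The following is only a minimal sketch, not the fix actually applied in this repository: the helper name `get_with_retry`, its `retries`, `backoff`, `timeout`, and `fetch` parameters, and the retry counts are all assumptions made for illustration.

```python
import time

import requests


def get_with_retry(url, headers=None, retries=3, backoff=1.0,
                   timeout=10, fetch=requests.get):
    """Hypothetical helper: fetch `url`, retrying on broken connections.

    `fetch` defaults to requests.get and is injectable so the retry logic
    can be exercised without network access.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url, headers=headers, timeout=timeout)
        except (requests.exceptions.ChunkedEncodingError,
                requests.exceptions.ConnectionError) as err:
            # The server closed the connection before the body was complete
            # (IncompleteRead / connection reset). Remember the error and
            # wait a little longer before each new attempt.
            last_error = err
            if attempt < retries:
                time.sleep(backoff * attempt)
    # Every attempt failed: re-raise so the caller can log or skip the page.
    raise last_error
```

Under these assumptions, the call in `cr_lowerpages` could become `inner_res = get_with_retry(innerlink, headers=headers)`, with the caller still catching the final exception to skip a page that fails on every attempt.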