Makeshiftshelter01 / Mater


ChunkedEncodingError(e) caused by a server problem during crawling #9

Closed ul8ksgdmy closed 5 years ago

ul8ksgdmy commented 5 years ago

error log
... (omitted)
Crawling progress: 21726 / 21998
Crawling progress: 21727 / 21998
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 601, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 360, in _error_catcher
    yield
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 666, in read_chunked
    self._update_chunk_length()
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 605, in _update_chunk_length
    raise httplib.IncompleteRead(line)
http.client.IncompleteRead: IncompleteRead(0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 750, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 490, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 694, in read_chunked
    self._original_response.close()
  File "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python3.6/site-packages/urllib3/response.py", line 378, in _error_catcher
    raise ProtocolError('Connection broken: %r' % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ruri_main.py", line 49, in <module>
    cd.insertone(cr.crawling('ilbe', 1000)) # the target collection must be changed in the ini file
  File "/home/centos/tmp/ruri_service.py", line 29, in crawling
    result = wc.crawlingposts(lastpage, ctargetdata) # run the crawl and store the result in a variable
  File "/home/centos/tmp/ruri_crawler.py", line 245, in crawlingposts
    contents_part_list = self.cr_lowerpages(headers, upper_page_list, keykeys, keyvalues)
  File "/home/centos/tmp/ruri_crawler.py", line 146, in cr_lowerpages
    inner_res = requests.get(innerlink, headers=headers)
  File "/usr/lib/python3.6/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 686, in send
    r.content
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 828, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 753, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

For a possible solution, see the link below (fix in progress): https://stackoverflow.com/questions/44509423/python-requests-chunkedencodingerrore-requests-iter-lines

ul8ksgdmy commented 5 years ago

Resolved by adding exception handling for requests.exceptions.ChunkedEncodingError.

Crawling progress: 18262 / 21998
Crawling progress: 18263 / 21998
Error raised in requests.exceptions.ChunkedEncodingError ("Connection broken: ConnectionResetError(104, 'Connection reset by peer')", ConnectionResetError(104, 'Connection reset by peer'))
Reconnecting at the page after the error
Handling the page where the error occurred
Crawling progress: 18264 / 21998
Crawling progress: 18265 / 21998
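
The actual patch is not shown in this thread, so the following is only a minimal sketch of what such exception handling might look like, not the code that was committed. The names `fetch_page` and `crawl_lower_pages`, the retry count, the delay, and the `timeout=10` value are all hypothetical; only `requests.get` and `requests.exceptions.ChunkedEncodingError` come from the traceback above. The idea is to catch the error around the per-page request, retry a few times, and record the page as failed instead of letting one broken connection abort the whole crawl.

```python
import time

import requests
from requests.exceptions import ChunkedEncodingError


def fetch_page(url, headers=None, retries=3, delay=1.0, get=requests.get):
    """Fetch one page, retrying when the server breaks the chunked response.

    Hypothetical helper: returns the response, or None if every attempt
    failed, so the caller can skip the page and keep crawling.
    """
    for attempt in range(1, retries + 1):
        try:
            # timeout=10 is an assumption; the original call passed no timeout.
            return get(url, headers=headers, timeout=10)
        except ChunkedEncodingError as err:
            print(f"ChunkedEncodingError on {url} "
                  f"(attempt {attempt}/{retries}): {err}")
            if attempt < retries:
                time.sleep(delay)  # brief pause before reconnecting
    return None


def crawl_lower_pages(links, headers=None, **fetch_kwargs):
    """Fetch every link, collecting failures instead of aborting the crawl."""
    pages, failed = [], []
    for i, link in enumerate(links, start=1):
        print(f"Crawling progress: {i} / {len(links)}")
        res = fetch_page(link, headers=headers, **fetch_kwargs)
        if res is None:
            failed.append(link)  # keep the link so it can be revisited later
        else:
            pages.append(res.text)
    return pages, failed
```

A caller could then run `pages, failed = crawl_lower_pages(links, headers=headers)` and revisit the links in `failed` after the main pass. Returning the failed links rather than dropping them silently is a design choice of this sketch, made so that skipped pages remain visible.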