Open katelynstenger opened 5 years ago
Here is the full error message:
BadStatusLine                             Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    600                                                   body=body, headers=headers,
--> 601                                                   chunked=chunked)
    602

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    386             # otherwise it looks like a programming error was the cause.
--> 387             six.raise_from(e, None)
    388         except (SocketTimeout, BaseSSLError, SocketError) as e:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\packages\six.py in raise_from(value, from_value)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    382         try:
--> 383             httplib_response = conn.getresponse()
    384         except Exception as e:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in getresponse(self)
   1330         try:
-> 1331             response.begin()
   1332         except ConnectionError:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in begin(self)
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in _read_status(self)
    278             self._close_conn()
--> 279             raise BadStatusLine(line)
    280

BadStatusLine: Error #2000
During handling of the above exception, another exception occurred:
ProtocolError                             Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    439                     retries=self.max_retries,
--> 440                     timeout=timeout
    441                 )

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    638             retries = retries.increment(method, url, error=e, _pool=self,
--> 639                                         _stacktrace=sys.exc_info()[2])
    640             retries.sleep()

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    356             if read is False or not self._is_method_retryable(method):
--> 357                 raise six.reraise(type(error), error, _stacktrace)
    358             elif read is not None:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\packages\six.py in reraise(tp, value, tb)
    684             if value.__traceback__ is not tb:
--> 685                 raise value.with_traceback(tb)
    686             raise value

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    600                                                   body=body, headers=headers,
--> 601                                                   chunked=chunked)
    602

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    386             # otherwise it looks like a programming error was the cause.
--> 387             six.raise_from(e, None)
    388         except (SocketTimeout, BaseSSLError, SocketError) as e:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\packages\six.py in raise_from(value, from_value)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    382         try:
--> 383             httplib_response = conn.getresponse()
    384         except Exception as e:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in getresponse(self)
   1330         try:
-> 1331             response.begin()
   1332         except ConnectionError:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in begin(self)
    296         while True:
--> 297             version, status, reason = self._read_status()
    298             if status != CONTINUE:

~\AppData\Local\Continuum\anaconda3\lib\http\client.py in _read_status(self)
    278             self._close_conn()
--> 279             raise BadStatusLine(line)
    280

ProtocolError: ('Connection aborted.', BadStatusLine('Error #2000\n',))
During handling of the above exception, another exception occurred:
ConnectionError Traceback (most recent call last)
@katelynstenger Thanks for submitting this issue; I get connection errors too. My guess is they've introduced rate limiting on the site. I'll take a look, introduce time.sleep(), and troubleshoot from there.
@daneads @katelynstenger I am wondering whether time.sleep() was ever introduced as part of pypatent. When I got started with this library, I ran into issues when attempting to retrieve large amounts of data. I did not dig too deep, but I suspect that rate limiting might still be an issue.
I first noticed that using Selenium really helped, but then I found this page and found your idea interesting.
I tested adding sleep(0.5) on line 328, right after patents.append(p) in get_patents_from_results_url, along with from time import sleep on line 8.
Adding sleep() gives promising results; however, I'm not sure this is the best place to call it. There is an obvious time tradeoff: the search runs longer, but it seems to complete, apparently because it is easier on the server.
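The idea of pausing after each fetched patent can be sketched without touching pypatent's internals. This is only an illustration, not the library's code: `fetch_all`, `fetch_one`, `delay`, and the injectable `sleep` argument are names I made up for the example, and the pause goes between requests rather than after the last one.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch_all(
    items: Iterable[T],
    fetch_one: Callable[[T], R],
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call fetch_one on each item, pausing `delay` seconds between calls.

    Hypothetical helper: fetch_one stands in for whatever function
    downloads a single patent page.
    """
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            # Pause before every request except the first, so consecutive
            # requests are spaced at least `delay` seconds apart.
            sleep(delay)
        results.append(fetch_one(item))
    return results
```

Passing `sleep` as a parameter makes the helper easy to test without actually waiting; in normal use the default `time.sleep` applies.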
Testing time.sleep() performance:
Without time.sleep(), run pypatent.Search('crispr', results_limit=test, get_patent_details=True, web_connection=conn) at varying results_limit values (test = 500, 200, and 5).
With the edits that introduce time.sleep(0.5), run the same searches.
From Mac, Chrome, Jupyter Notebook, Python 3.7.3
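For anyone who wants to repeat the comparison, a small timing harness along these lines might help. It is a sketch under assumptions: `time_runs` and its `run` argument are hypothetical names, and it measures wall-clock time only, so network variance will affect the numbers.

```python
import time
from typing import Callable, Dict, Iterable


def time_runs(run: Callable[[int], object], limits: Iterable[int]) -> Dict[int, float]:
    """Return the wall-clock seconds taken by run(limit) for each limit."""
    timings: Dict[int, float] = {}
    for limit in limits:
        start = time.perf_counter()
        run(limit)  # the result is discarded; only the duration is kept
        timings[limit] = time.perf_counter() - start
    return timings
```

In the notebook, `run` could be something like `lambda n: pypatent.Search('crispr', results_limit=n, get_patent_details=True, web_connection=conn)` with limits `[500, 200, 5]`, executed once against the unmodified library and once with the sleep(0.5) edit.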
btw, thank you so much for this library!
My script iterates through a list of patents I want to collect information on. I initially received this error: Exception is: ('Connection aborted.', error(10054, '')). I added a time.sleep(2) between calls to the pypatent.Search function, which resolved this error.
On the 5th call to pypatent.Search(), I received this error: ConnectionError: ('Connection aborted.', BadStatusLine('Error #2000\n',))
Any suggestions on remediating this error? Thank you for your help in advance!
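One thing that may be worth trying is retrying the failed call with an increasing pause instead of a fixed one. The sketch below is a generic retry wrapper, not part of pypatent; `call_with_retries` and its parameters are hypothetical names, and whether the server behind 'Error #2000' recovers after a wait is untested. Note that the error in the traceback is raised through requests, and `requests.exceptions.ConnectionError` is not a subclass of Python's built-in `ConnectionError`, so it would have to be passed explicitly via `exceptions`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

R = TypeVar("R")


def call_with_retries(
    func: Callable[[], R],
    exceptions: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # With the defaults, wait 2 s, then 4 s, then 8 s before retrying.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Usage would look roughly like `call_with_retries(lambda: pypatent.Search(...), exceptions=(requests.exceptions.ConnectionError,))`, with the search arguments filled in as in your script.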