Closed by tsgit 7 years ago.
I note that legacy prod is running harvestingkit 0.6.7 (from PyPI), while the GitHub version is 0.6.8. Is that intended?
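A User-Agent header is necessary for some publishers, e.g. Elsevier. Without one, the request to sciencedirect.com times out; with one, it succeeds: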
```
In [2]: s=requests.session()

In [3]: s.get('http://www.sciencedirect.com/science/article/pii', timeout=60)
---------------------------------------------------------------------------
ReadTimeout                               Traceback (most recent call last)
<ipython-input-3-16a8a01c12d0> in <module>()
----> 1 s.get('http://www.sciencedirect.com/science/article/pii', timeout=60)

/usr/lib/python2.6/site-packages/requests/sessions.pyc in get(self, url, **kwargs)
    485
    486         kwargs.setdefault('allow_redirects', True)
--> 487         return self.request('GET', url, **kwargs)
    488
    489     def options(self, url, **kwargs):

/usr/lib/python2.6/site-packages/requests/sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    473         }
    474         send_kwargs.update(settings)
--> 475         resp = self.send(prep, **send_kwargs)
    476
    477         return resp

/usr/lib/python2.6/site-packages/requests/sessions.pyc in send(self, request, **kwargs)
    583
    584         # Send the request
--> 585         r = adapter.send(request, **kwargs)
    586
    587         # Total elapsed time of the request (approximately)

/usr/lib/python2.6/site-packages/requests/adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
    477                 raise SSLError(e, request=request)
    478             elif isinstance(e, ReadTimeoutError):
--> 479                 raise ReadTimeout(e, request=request)
    480             else:
    481                 raise

ReadTimeout: HTTPConnectionPool(host='www.sciencedirect.com', port=80): Read timed out. (read timeout=60)

In [5]: s.get('http://www.sciencedirect.com/science/article/pii', headers={'user-agent': 'HarvestingKit/0.6.7'}, timeout=60)
Out[5]: <Response [200]>
```
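Rather than passing `headers=` on every call, the User-Agent can be set once on the session, since session-level headers are merged into each request. A minimal sketch, assuming the `requests` library used in the session above; `make_session` is a hypothetical helper name, and the default UA string is just the one from that session (in HarvestingKit it would presumably come from `make_user_agent()`).

```python
import requests

# Assumed value, copied from the IPython session; not read from package metadata.
DEFAULT_USER_AGENT = 'HarvestingKit/0.6.7'


def make_session(user_agent=DEFAULT_USER_AGENT):
    """Return a requests.Session that sends `user_agent` on every request.

    Hypothetical helper: session headers are merged into each request,
    so callers no longer need headers={...} on every .get() call.
    """
    session = requests.Session()
    # Session.headers is a case-insensitive dict of default headers.
    session.headers.update({'User-Agent': user_agent})
    return session
```

With this, `s = make_session()` followed by `s.get(url, timeout=60)` carries the header automatically; a `headers=` argument passed to an individual call is merged over the session defaults and wins on conflicts.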
Added basic tests for `harvestingkit.utils.make_user_agent()`, which obtains its package information via `pkg_resources`.
Signed-off-by: Thorsten Schwander <thorsten.schwander@gmail.com>
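For illustration only, here is a hypothetical sketch of a function in the spirit of `make_user_agent()` plus two basic tests. It is not the actual HarvestingKit implementation: the `Product/version` format is assumed from the `HarvestingKit/0.6.7` string above, and it uses the stdlib `importlib.metadata` (Python 3.8+) in place of `pkg_resources`. On the Python 2.6 legacy system, the equivalent lookup would be `pkg_resources.get_distribution(name).version`, which raises `pkg_resources.DistributionNotFound` for a missing package.

```python
import unittest
from importlib.metadata import PackageNotFoundError, version


def make_user_agent(dist_name='harvestingkit', product='HarvestingKit'):
    """Build a 'Product/version' User-Agent string from installed metadata.

    Hypothetical sketch; the real function reads package info via pkg_resources.
    """
    try:
        ver = version(dist_name)
    except PackageNotFoundError:
        # Package not installed (e.g. running from a source checkout).
        ver = 'unknown'
    return '{0}/{1}'.format(product, ver)


class MakeUserAgentTest(unittest.TestCase):
    def test_format(self):
        # Expect exactly one '/' separating product name and version.
        product, sep, ver = make_user_agent().partition('/')
        self.assertEqual(product, 'HarvestingKit')
        self.assertEqual(sep, '/')
        self.assertTrue(ver)

    def test_missing_distribution(self):
        self.assertEqual(make_user_agent('no-such-dist-xyz'),
                         'HarvestingKit/unknown')
```

Falling back to `unknown` keeps the header well-formed even when package metadata is unavailable, which a test suite run from a checkout may rely on.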