digitalmethodsinitiative / dmi-tcat

Digital Methods Initiative - Twitter Capture and Analysis Toolset
Apache License 2.0

Read operation times out for urlexpand - problem? #148

Closed by mhockenhull 8 years ago

mhockenhull commented 8 years ago

Hi again,

I have installed the Python packages needed to run the urlexpand.py script. As far as I can see from the data, the script is working, since I'm getting expanded URLs. I just wanted to make sure that this isn't something I should be worried about:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gevent-1.1rc3-py2.7-linux-x86_64.egg/gevent/greenlet.py", line 522, in run
    result = self._run(*self.args, **self.kwargs)
  File "urlexpand.py", line 122, in job
    resp = requests.get(url, headers=request_headers, timeout=socket_timeout, verify=False)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 455, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 578, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 178, in resolve_redirects
    allow_redirects=False,
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 558, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 394, in send
    r.content
  File "/usr/lib/python2.7/dist-packages/requests/models.py", line 679, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/usr/lib/python2.7/dist-packages/requests/models.py", line 616, in generate
    decode_content=True):
  File "/usr/lib/python2.7/dist-packages/urllib3/response.py", line 225, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/usr/lib/python2.7/dist-packages/urllib3/response.py", line 174, in read
    data = self._fp.read(amt)
  File "/usr/lib/python2.7/httplib.py", line 549, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.7/httplib.py", line 618, in _read_chunked
    value.append(self._safe_read(chunk_left))
  File "/usr/lib/python2.7/httplib.py", line 664, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/local/lib/python2.7/dist-packages/gevent-1.1rc3-py2.7-linux-x86_64.egg/gevent/_ssl2.py", line 222, in recv
    return self.read(buflen)
  File "/usr/local/lib/python2.7/dist-packages/gevent-1.1rc3-py2.7-linux-x86_64.egg/gevent/_ssl2.py", line 124, in read
    self._wait(self._read_event, timeout_exc=_SSLErrorReadTimeout)
  File "/usr/local/lib/python2.7/dist-packages/gevent-1.1rc3-py2.7-linux-x86_64.egg/gevent/_socket2.py", line 171, in _wait
    self.hub.wait(watcher)
  File "/usr/local/lib/python2.7/dist-packages/gevent-1.1rc3-py2.7-linux-x86_64.egg/gevent/hub.py", line 606, in wait
    result = waiter.get()
  File "/usr/local/lib/python2.7/dist-packages/gevent-1.1rc3-py2.7-linux-x86_64.egg/gevent/hub.py", line 854, in get
    return self.hub.switch()
  File "/usr/local/lib/python2.7/dist-packages/gevent-1.1rc3-py2.7-linux-x86_64.egg/gevent/hub.py", line 585, in switch
    return greenlet.switch(self)
SSLError: The read operation timed out
<Greenlet at 0x7fcdccb41690: job('http://fb.me/3GZAxrihj', 'test_urls')> failed with SSLError

Sorry to bother you if this is something obvious...

Best, Michael

ErikBorra commented 8 years ago

Hi Michael,

This is related to issue #82, for which we unfortunately have not yet found a satisfactory solution. To make sure that the script never hangs for too long, the [script which is called from crontab](https://github.com/digitalmethodsinitiative/dmi-tcat/blob/master/helpers/urlexpand.sh) restarts URL expansion about every hour.

Conclusion: nothing to worry about, although we are looking for a cleaner solution.

Best,

Erik
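
For anyone who lands here with the same traceback: the greenlet fails because the `SSLError` raised during the body read is not caught inside `job`. Below is a minimal sketch of one way to contain such errors per URL. It is not the TCAT implementation. `expand_safely`, `fetch`, and `failures` are hypothetical names introduced for illustration, and `fetch` stands in for whatever performs the HTTP request. If `fetch` wraps `requests.get`, you would probably also want to catch `requests.exceptions.RequestException`.

```python
import socket
import ssl


def expand_safely(url, fetch, failures):
    """Return the expanded URL, or None if the lookup timed out or failed at the TLS layer.

    `fetch` is a hypothetical callable: it takes a URL and returns the final
    URL after redirects. `failures` is a list that collects (url, error type)
    tuples so that problem URLs can be inspected or retried later.
    """
    try:
        return fetch(url)
    except (socket.timeout, ssl.SSLError) as exc:
        # "The read operation timed out" arrives as an SSLError on HTTPS
        # connections and as socket.timeout on plain sockets.
        failures.append((url, type(exc).__name__))
        return None


if __name__ == "__main__":
    def always_times_out(url):
        # Stand-in for a slow server; raises the same error type as in the traceback.
        raise ssl.SSLError("The read operation timed out")

    failed = []
    print(expand_safely("http://fb.me/3GZAxrihj", always_times_out, failed))
    print(failed)
```

With this pattern a slow host costs one skipped URL rather than a failed greenlet and a traceback in the log. It does not address a worker that hangs outright, which is what the hourly restart is for.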