basho / riak-python-client

The Riak client for Python.
Apache License 2.0

Leaking exceptions when closing TCP connections [JIRA: CLIENTS-837] #461

Closed by Tinche 8 years ago

Tinche commented 8 years ago

Hi,

I'm the original author of #399. I'm trying out the newest version (2.5.1) to see if we can drop our internal workaround. When the issue from #399 occurs, the client will leak an exception to the surrounding code:

Traceback (most recent call last):
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/pool.py", line 158, in transaction
    yield resource.object
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/client/transport.py", line 121, in _with_retries
    return fn(transport)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/client/transport.py", line 177, in thunk
    return fn(self, transport, *args, **kwargs)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/client/operations.py", line 712, in get
    notfound_ok=notfound_ok)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/tcp/transport.py", line 135, in get
    resp_code, resp = self._request(msg, codec)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/tcp/transport.py", line 537, in _request
    resp_code, data = self._send_recv(msg_code, data)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/tcp/connection.py", line 39, in _send_recv
    return self._recv_msg()
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/tcp/connection.py", line 164, in _recv_msg
    msgbuf = self._recv_pkt()
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/tcp/connection.py", line 178, in _recv_pkt
    msglen_buf = self._recv(4)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/tcp/connection.py", line 195, in _recv
    raise BadResource('recv_into returned zero bytes unexpectedly')
riak.transports.pool.BadResource: recv_into returned zero bytes unexpectedly

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/ngs_worker/base_processor.py", line 252, in process_game
    for response in responses:
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/ngs_worker/processor.py", line 175, in first_comm
    user.migrate()
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/ngs_worker/middleware/ngs_user.py", line 52, in migrate
    self._savegame.check_migrate(self.version)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/ngs_worker/middleware/ngs_savegame.py", line 74, in check_migrate
    sg_user_dict = self.main_riak.savegame_user
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/ngs_worker/middleware/ngs_main_riak.py", line 56, in savegame_user
    return self._get_and_cache(self.USER_BUCKET)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/ngs_worker/middleware/ngs_main_riak.py", line 42, in _get_and_cache
    robj = rbucket.get(self.username)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/bucket.py", line 234, in get
    notfound_ok=notfound_ok)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/riak_object.py", line 299, in reload
    self.client.get(self, r=r, pr=pr, timeout=timeout)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/client/transport.py", line 179, in wrapper
    return self._with_retries(pool, thunk)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/client/transport.py", line 128, in _with_retries
    raise
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/pool.py", line 160, in transaction
    self.delete_resource(resource)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/pool.py", line 176, in delete_resource
    self.destroy_resource(resource.object)
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/tcp/__init__.py", line 24, in destroy_resource
    tcp.close()
  File "/Users/ttvrtkovic/workspace/ngs-worker-ct2/.venv/lib/python3.5/site-packages/riak/transports/tcp/connection.py", line 230, in close
    self._socket.shutdown(socket.SHUT_RDWR)
OSError: [Errno 57] Socket is not connected

I'm guessing it's an error to shut down a socket in this (half-closed) state. In any case, I'd expect these errors not to propagate to the caller, but to be handled transparently by the pool.
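For reference, errno 57 on OS X is ENOTCONN. The sketch below is not the client's code; it only shows that `socket.shutdown()` can raise this error class. It uses a never-connected TCP socket as a stand-in, which is an assumption: the socket in the traceback had been connected and then lost its peer, and whether `shutdown()` fails in that state appears to depend on the platform. `shutdown_errno` is a hypothetical helper name.

```python
import errno
import socket


def shutdown_errno(sock):
    """Return the errno raised by shutdown(), or None if it succeeds."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError as exc:
        return exc.errno
    return None


# A never-connected TCP socket stands in for one whose connection is gone.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
result = shutdown_errno(sock)
sock.close()

# Compare against the symbolic constant; the numeric value differs per OS.
print(result == errno.ENOTCONN)
```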

lukebakken commented 8 years ago

@Tinche - could you please comment out the call to shutdown() in the Python client code in your environment to see if that resolves the issue? The BadResource exception should cause the operation to be retried.
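A minimal sketch of the retry behaviour described here, assuming a simplified model rather than the client's actual pool code. `BadResource` below is a local stand-in for `riak.transports.pool.BadResource`, and `with_retries`, `make_resource`, and `destroy_resource` are hypothetical names. The point is that an error raised while discarding a bad connection should not replace the `BadResource` that triggers the retry.

```python
class BadResource(Exception):
    """Local stand-in for riak.transports.pool.BadResource."""


def with_retries(make_resource, destroy_resource, fn, retries=3):
    """Call fn(resource), retrying with a fresh resource on BadResource."""
    last_error = None
    for _ in range(retries):
        resource = make_resource()
        try:
            return fn(resource)
        except BadResource as exc:
            last_error = exc
            try:
                destroy_resource(resource)
            except EnvironmentError:
                # Cleanup failed (e.g. ENOTCONN on shutdown); the resource
                # is being thrown away anyway, so keep retrying.
                pass
    # Every attempt hit BadResource: surface the last one to the caller.
    raise last_error
```

With this shape, the caller either gets a result or a `BadResource` after the retries are exhausted, never the `OSError` from the cleanup step.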

Tinche commented 8 years ago

Will do, I'll play around with it.

lukebakken commented 8 years ago

Thank you, I appreciate it. I'm going to try to reproduce it here as well.

Tinche commented 8 years ago

I've left work for the day, but something just occurred to me worth mentioning: I've been testing out this code all morning on our Linux servers without this issue popping up, but when I started the application in question locally on my work Mac it manifested very quickly. Maybe it's a platform-specific thing?

lukebakken commented 8 years ago

It could be. I'm planning on addressing it by catching EnvironmentError and logging it.
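A sketch of what catching and logging could look like, assuming a plain `socket` object; this is not the fix that was eventually committed, and `close_quietly` is a hypothetical name. On Python 3.3+ `EnvironmentError` is an alias of `OSError`, and on Python 2.6+ `socket.error` is a subclass of it, so the same `except` clause covers both.

```python
import logging
import socket

logger = logging.getLogger(__name__)


def close_quietly(sock):
    """Shut down and close sock, logging OS-level errors instead of raising."""
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except EnvironmentError as exc:
        # e.g. [Errno 57] Socket is not connected on OS X
        logger.debug("ignoring error during socket shutdown: %s", exc)
    finally:
        # Always release the file descriptor, even if shutdown failed.
        sock.close()


# Usage: closing a socket that was never connected no longer raises.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
close_quietly(sock)
```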

Tinche commented 8 years ago

Commenting out the shutdown removes the exceptions on my OS X workstation. I'll try catching EnvironmentError instead and see what happens.