getsentry / rb

Routing and connection management for Redis in Python
http://rb.readthedocs.org/en/latest/
Apache License 2.0
301 stars 30 forks source link

When a redis server drops an already established connection, it fails loudly #16

Open mattrobenolt opened 8 years ago

mattrobenolt commented 8 years ago

This can be tested by setting CONFIG SET timeout 1 in Redis, then running:

from time import sleep
from rb import Cluster

cluster = Cluster({0:{'host': '127.0.0.1', 'port': 6379}})

with cluster.map() as c:
    c.get('foo')

sleep(2)

with cluster.map() as c:
    c.get('foo')

Yells with redis.exceptions.ConnectionError: Error while trying to write requests to redis..

The problem is the first poll on the socket is a close event before any data has been written to the socket.

We should detect this and attempt to reconnect.

mattrobenolt commented 8 years ago

On Linux, this manifests slightly differently and fails in a different place:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/site-packages/rb/clients.py", line 577, in __exit__
    self.mapping_client.join(timeout=timeout)
  File "/usr/local/lib/python2.7/site-packages/rb/clients.py", line 366, in join
    command_buffer.wait_for_responses(self)
  File "/usr/local/lib/python2.7/site-packages/rb/clients.py", line 208, in wait_for_responses
    self.connection, command_name)
  File "/usr/local/lib/python2.7/site-packages/redis/client.py", line 585, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 577, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 238, in read_response
    response = self._buffer.readline()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 168, in readline
    self._read_from_socket()
  File "/usr/local/lib/python2.7/site-packages/redis/connection.py", line 143, in _read_from_socket
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)
benh57 commented 7 years ago

In our case this redis connection drop apparently causes a 500 error posting one of the events to the API. Any idea why the connection drops?

Jan 20 11:02:29 xxx sentry: 127.0.0.1 - - [20/Jan/2017:19:02:29 +0000] "POST /api/22/store/ HTTP/1.1" 200 366 "-" "raven-python/5.4.0.dev0" Jan 20 11:02:30 xxx sentry: Traceback (most recent call last): Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/sentry/web/api.py", line 102, in dispatch Jan 20 11:02:30 xxx sentry: args, kwargs) Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/sentry/web/api.py", line 225, in _dispatch Jan 20 11:02:30 xxx sentry: kwargs Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/sentry/../django/views/generic/base.py", line 87, in dispatch Jan 20 11:02:30 xxx sentry: return handler(request, args, kwargs) Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/sentry/web/api.py", line 286, in post Jan 20 11:02:30 xxx sentry: response_or_event_id = self.process(request, data=data, kwargs) Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/sentry/web/api.py", line 369, in process Jan 20 11:02:30 xxx sentry: (app.tsdb.models.organization_total_received, project.organization_id), Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/sentry/tsdb/redis.py", line 174, in incr_multi Jan 20 11:02:30 xxx sentry: self.calculate_expiry(rollup, max_values, timestamp), Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/rb/clients.py", line 623, in exit Jan 20 11:02:30 xxx sentry: self.mapping_client.join(timeout=timeout) Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/rb/clients.py", line 412, in join Jan 20 11:02:30 xxx sentry: command_buffer.wait_for_responses(self) Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/rb/clients.py", line 226, in wait_for_responses Jan 20 11:02:30 xxx sentry: self.connection, command_name, **options) Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/redis/client.py", line 585, in parse_response Jan 20 11:02:30 xxx sentry: response = connection.read_response() Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/redis/connection.py", line 577, in read_response Jan 20 11:02:30 xxx sentry: response = self._parser.read_response() Jan 20 11:02:30 xxx sentry: File "/var/lib/sentry/lib/python2.7/site-packages/redis/connection.py", line 357, in read_response Jan 20 11:02:30 xxx sentry: (e.args,)) Jan 20 11:02:30 xxx sentry: ConnectionError: Error while reading from socket: ('Connection closed by server.',) Jan 20 11:02:30 xxx sentry: 19:02:30 [ERROR] sentry: Error while reading from socket: ('Connection closed by server.',) Jan 20 11:02:30 xxx sentry: 127.0.0.1 - - [20/Jan/2017:19:02:30 +0000] "POST /api/22/store/ HTTP/1.1" 200 366 "-" "raven-python/5.4.0.dev0" Jan 20 11:02:30 xxx sentry: 127.0.0.1 - - [20/Jan/2017:19:02:30 +0000] "POST /api/2/store/ HTTP/1.1" 500 337 "-" "raven-python/5.4.0.dev0"

mattrobenolt commented 7 years ago

Many reasons. Network layer could be flaky, etc. In our case, it was very specifically while using Docker's networking stuff. Using --net=host on everything worked around the issue since we didn't need the extra networking abstraction and provided us no value.

ffledgling commented 6 years ago

Just hit this as well. Did anyone figure out a cause or a fix?

Antiarchitect commented 4 years ago

Having the same issue on K8s with redis-ha fresh chart with HAProxy enabled

linux0x5c commented 4 years ago

Hi, @Antiarchitect I have same issue,how do you solve it?

Antiarchitect commented 4 years ago

Hi @linux0x5c actually as I have isolated k8s environment I have in redis.conf

timeout 0

in haproxy.cfg

timeout server 0
timeout client 0

These settings drastically reduced the amount of failing connections but considered bad practice in an open environment. Actually I do not know why timeout 10m is not working as it is two times bigger than standard Redis keepalive 300s.

Hope this helps.

linux0x5c commented 4 years ago

@Antiarchitect ,OK,thanks for your help,I will try it.

kimxogus commented 4 years ago

Setting timeout 0 didn't fixed this issue for me