Grokzen / redis-py-cluster

Python cluster client for the official redis cluster. Redis 3.0+.
https://redis-py-cluster.readthedocs.io/
MIT License

rediscluster.RedisCluster hangs if in the list of startup_nodes first node is non-operational #387

Closed shivam-tripathi closed 4 years ago

shivam-tripathi commented 4 years ago

The list of startup_nodes is fetched from a configuration server and then fed to RedisCluster. At runtime, any of the Redis nodes can become unreachable for a number of reasons, and this is not reflected in the config.

If I try to connect to a cluster with startup_nodes such as [{'host': <a.host>, 'port': <a.port>}, ....] and a.host is down for some reason, the call conn = RedisCluster(startup_nodes=startup_nodes, decode_responses=True) hangs. It doesn't move on to the next active node in the list. I think this is not ideal.

Version: redis-py-cluster==2.0.0
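
One possible client-side workaround for the situation described above is to drop unreachable entries from the configured list before handing it to RedisCluster. The sketch below is only an illustration using the standard library: `is_reachable` and `reachable_nodes` are hypothetical helper names, not part of redis-py-cluster, and a successful TCP connect only shows that the port accepts connections, not that the node is a healthy cluster member.

```python
import socket


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False


def reachable_nodes(startup_nodes, timeout=1.0):
    """Keep only the startup nodes that currently accept TCP connections.

    `startup_nodes` is a list of {'host': ..., 'port': ...} dicts, the same
    shape that is passed to RedisCluster(startup_nodes=...).
    """
    return [
        node for node in startup_nodes
        if is_reachable(node["host"], node["port"], timeout)
    ]
```

The filtered list could then be passed on as RedisCluster(startup_nodes=reachable_nodes(startup_nodes), decode_responses=True). The check costs at most `timeout` seconds per dead node, so a short timeout keeps startup fast.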

Grokzen commented 4 years ago

@shivam-tripathi I have never seen a case where it would sit and hang inside the client code forever. Can you provide a more detailed example script and cluster setup that replicates the issue, so I can debug and trace the error further?

For example, if I run a client against a cluster that is completely shut down, I get the following error and stack trace on the client side.

DEBUG:rediscluster.nodemanager:[
  {
    "host": "127.0.0.1",
    "port": 7000
  },
  {
    "host": "127.0.0.1",
    "port": 7001
  },
  {
    "host": "127.0.0.1",
    "port": 7002
  },
  {
    "host": "127.0.0.1",
    "port": 7003
  },
  {
    "host": "127.0.0.1",
    "port": 7004
  },
  {
    "host": "127.0.0.1",
    "port": 7005
  }
]
Traceback (most recent call last):
  File "rediscluster_failover_test.py", line 25, in <module>
    rc = RedisCluster(startup_nodes=startup_nodes, decode_responses=True)
  File "/home/grok/github/redis-py-cluster/rediscluster/client.py", line 371, in __init__
    **kwargs
  File "/home/grok/github/redis-py-cluster/rediscluster/connection.py", line 160, in __init__
    self.nodes.initialize()
  File "/home/grok/github/redis-py-cluster/rediscluster/nodemanager.py", line 270, in initialize
    raise RedisClusterException("Redis Cluster cannot be connected. Please provide at least one reachable node.")
rediscluster.exceptions.RedisClusterException: Redis Cluster cannot be connected. Please provide at least one reachable node.
Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  File "/home/grok/.virtualenvs/redis/lib/python3.6/site-packages/redis/client.py", line 885, in __del__
    self.close()
  File "/home/grok/.virtualenvs/redis/lib/python3.6/site-packages/redis/client.py", line 888, in close
    conn = self.connection
AttributeError: 'RedisCluster' object has no attribute 'connection'

None of the nodes is reachable in any sense, and this error is what I expect in a situation where no node is reachable.

@shivam-tripathi Please post more detailed steps to reproduce the problem so that I can help further.

shivam-tripathi commented 4 years ago

@Grokzen Hi, the issue I found was when the first node in the list of startup_nodes is down. More details are as follows:

  1. The config.REDIS_CONF looks something like this:

    {
    "cluster": {
    "hosts":[
      {"host":"a.a.a.a","port":6379},
      {"host":"b.b.b.b","port":6379},
      {"host":"c.c.c.c","port":6379},
      {"host":"d.d.d.d","port":6379},
      {"host":"e.e.e.e","port":6379},
      {"host":"f.f.f.f","port":6379}
    ]
    }
    }

    Of these, node a.a.a.a:6379 is down. The nodes b.b.b.b:6379, c.c.c.c:6379 and d.d.d.d:6379 are available. After these, node e.e.e.e:6379 is down again.

  2. The wrapper code utilizing rediscluster to connect with the cluster is as follows:

    
    from utils.Config import Config
    from rediscluster import RedisCluster
    from redis import Redis

    class QRedis:
        __config = Config()
        __redis = None

        def __init__(self):
            self.__redis = RedisCluster(startup_nodes=self.__config.REDIS_CONF['cluster']['hosts'], decode_responses=True)

        def hscan_iter(self, hset_name, count=100):
            return self.__redis.hscan_iter(hset_name, count=count)

        def run(self, cmd, *kwargs):
            return self.__redis[cmd](*kwargs)

        def client(self):
            return self.__redis

    This for some reason hangs with no additional logs.

  3. If I do:

    ```python
    self.__redis = RedisCluster(startup_nodes=self.__config.REDIS_CONF['cluster']['hosts'][1:], decode_responses=True)
    ```

    it connects and works as expected. Note that the node b.b.b.b:6379 is available. I am slicing the array so that the first node is removed from startup_nodes. Similarly, self.__config.REDIS_CONF['cluster']['hosts'][2:] and self.__config.REDIS_CONF['cluster']['hosts'][3:] work. However, self.__config.REDIS_CONF['cluster']['hosts'][4:] hangs again, which corresponds to the next dead node, i.e. e.e.e.e:6379.

  4. Python version is Python 3.6.6. The command pip freeze gives redis-py-cluster==2.0.0.
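
The slicing experiment in step 3 can be automated by attempting a connection with each node on its own and giving up after a deadline. The following is a minimal sketch, not something from the library: `call_with_timeout` and `find_hanging_nodes` are hypothetical helpers, and `connect` is whatever callable the caller supplies to open a client for a list of nodes. A daemon thread is used so that an attempt that never returns cannot keep the interpreter alive.

```python
import threading


def call_with_timeout(fn, timeout):
    """Run fn() in a daemon thread.

    Returns (True, result_or_exception) if fn finished within `timeout`
    seconds, or (False, None) if it was still running at the deadline.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # a failure is still an answer, not a hang
            outcome["value"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        return False, None
    return True, outcome["value"]


def find_hanging_nodes(startup_nodes, connect, timeout=5.0):
    """Return the nodes for which connect([node]) did not return in time."""
    hanging = []
    for node in startup_nodes:
        finished, _ = call_with_timeout(lambda node=node: connect([node]), timeout)
        if not finished:
            hanging.append(node)
    return hanging
```

With the setup from this issue, `connect` could be something like lambda nodes: RedisCluster(startup_nodes=nodes, decode_responses=True), which would show whether the dead nodes hang on their own or only in combination with others.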

This is everything I could collect which I thought could be of help. Let me know if this is enough, and if anything additional is required.
Thank you so much for your amazing work. 😄

Grokzen commented 4 years ago

@shivam-tripathi It kinda makes no sense that it should only fail if the first one does not connect O.o I will have to dig deeper into this, but I would recommend that you try the RC2 release that is on PyPI now and see whether it behaves the same. I will also run a test against a local cluster, because I have not seen that problem myself so far.

Grokzen commented 4 years ago

@shivam-tripathi I tried this locally and I am unable to reproduce the error. To me the code seems to work as expected, and I don't know what your issue really is or why it happens.

I ran the following client code on my local Redis cluster with the port 7000 node shut down:

from rediscluster import RedisCluster

startup_nodes = [
    {"host": "127.0.0.1", "port": "7000"},
    {"host": "127.0.0.1", "port": "7001"},
    {"host": "127.0.0.1", "port": "7002"},
    {"host": "127.0.0.1", "port": "7003"},
    {"host": "127.0.0.1", "port": "7004"},
    {"host": "127.0.0.1", "port": "7005"},
]

rc = RedisCluster(
    startup_nodes=startup_nodes,
    decode_responses=True,
)

from pprint import pprint
pprint(rc.cluster_info())
print(rc.set('foobar', 'asd'))
print(rc.get('foobar'))

The code above works as expected without any issues, and it does not block.

The only thing I can see in your code example above is hscan_iter: could that be where it hangs, rather than during the cluster initialization steps?

Also, I would recommend that you pull down the master branch of this repo or install the 2.0.99RC2 release from PyPI, use that code in your example, and see whether that helps.

If that does not help and your problem persists, then you will have to provide much more detailed and deeper debugging of which part of the code fails, rather than of the top layer of the client and your code. Please open a new issue with that if the problem remains in the new client version.
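
One way to collect the deeper debugging information asked for here is to have Python print the stack of every thread if the process is still running after a deadline, which shows the exact line where a hang occurs. This is a sketch using only the standard library; `enable_hang_diagnostics` and `disable_hang_diagnostics` are hypothetical wrapper names. It also turns on DEBUG logging, which is what produces lines like the DEBUG:rediscluster.nodemanager output shown earlier in the thread.

```python
import faulthandler
import logging
import sys


def enable_hang_diagnostics(timeout=10.0, stream=None):
    """Enable DEBUG logging and schedule a traceback dump after `timeout` seconds.

    If the process is still running when the timer fires, faulthandler writes
    the stack of every thread to `stream` (default: sys.stderr). The stream
    must be a real file with a file descriptor.
    """
    logging.basicConfig(level=logging.DEBUG)
    target = stream if stream is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout, exit=False, file=target)


def disable_hang_diagnostics():
    """Cancel the pending traceback dump once the suspect call has returned."""
    faulthandler.cancel_dump_traceback_later()
```

Calling enable_hang_diagnostics() just before the RedisCluster(...) constructor and disable_hang_diagnostics() right after it would leave a traceback in the output only when the constructor is still blocked after the timeout.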