About Init RedisCluster hanging problem.

Grokzen / redis-py-cluster

Python cluster client for the official redis cluster. Redis 3.0+.

https://redis-py-cluster.readthedocs.io/

MIT License

1.1k stars 315 forks source link

About Init RedisCluster hanging problem. #440

Closed Milittle closed 3 years ago

Milittle commented 3 years ago

I found issue 387 meant it.

If your start_nodes has dead redis server node. I am afriad of the init RedisCluster is hanging.

I try to dive into it, then i found the follow code snippt:

https://github.com/Grokzen/redis-py-cluster/blob/570d7f23faa9bde8031f5e2622e4c01aef3c4f7a/rediscluster/nodemanager.py#L226

BTW, If you specify the init parameter socket_timeout, the hanging time is socket_timeout sec

I think this project should consider about this scene.

Grokzen commented 3 years ago

@Milittle I am not really sure this is a problem for this package as all options and parameters is sent into the Redis class instance and this package do want to handle these special cases as little or none at all. The Redis instance that is used in the command you are refering to is called here https://github.com/Grokzen/redis-py-cluster/blob/570d7f23faa9bde8031f5e2622e4c01aef3c4f7a/rediscluster/nodemanager.py#L195 and it that allows for socket_timeout to be set during the initial cluster discovery phase of things.It is not up to this lib to ensure that you configure your client correct for your case/redis setup, but if we are missing to pass in some parameter that you set when you crease your RedisCluster instance, then we can talk about it i mean.

Milittle commented 3 years ago

I mean, as a cluster client package, It should handle this siuation. If the redis is cluster deploy and have 3 master node, 3 slave node. When I config the 6 node to start_nodes, the above 3 nodes are dead. The RedisCluster have a automatic cycle these nodes to ensure the get connection?

If I understand wrong. Please let know, thank you work.

Grokzen commented 3 years ago

The try/except block already handles these two cases https://github.com/Grokzen/redis-py-cluster/blob/570d7f23faa9bde8031f5e2622e4c01aef3c4f7a/rediscluster/nodemanager.py#L228 where if you timeout or get connectino error it will fail out for that node and continue to the next node. If you get any other error then that error will be raised up to the calling user. This is intended and expected behaviour imo. If you think otherwise then you need to show me the exception you are getting that you think should probably be added to the list of cases that we should continue on with.

Milittle commented 3 years ago

OK, I understand you meaning. Thanks.