Closed datasatanic closed 3 years ago
@datasatanic I need a stack trace for this, as I need to see exactly where the connection error comes from: whether you get it during the initialization process or from your keys command. There will be no option to return partial data, because if you can't reach a node in your cluster there is typically some underlying issue. That said, the logic these multi-node commands use is a bit different from all the other regular commands, and it might be that when a connection failure occurs on one of the nodes the client should technically ask for a cluster update, in case the master node has moved to some other IP. To manually force the cluster to reinitialize after a master failover event that was not handled properly, you can wrap the keys command in try/except and then run the initialize() method again manually on the RedisCluster object; that should force it to reload in those cases. Look at how ConnectionErrors are handled in _execute_command().
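A minimal sketch of the try/except-and-reinitialize idea described above. The helper name `keys_with_reinit` and its parameters are hypothetical, not part of redis-py-cluster. The caller passes in whatever callable triggers the reload (for example something like `rc.connection_pool.nodes.initialize`, which is an assumption about the 2.x internals and should be checked against your installed version) and the exception classes to catch. In real use that would be `redis.exceptions.ConnectionError`, which is not a subclass of the built-in `ConnectionError` used as the default here.

```python
def keys_with_reinit(client, pattern, reinitialize, retry_errors=(ConnectionError,)):
    """Run client.keys(pattern); on a connection failure, reload the
    cluster layout once via reinitialize() and try one more time.

    client       -- any object with a keys(pattern) method
    reinitialize -- zero-argument callable that reloads the node table
    retry_errors -- exception classes treated as connection failures
    """
    try:
        return client.keys(pattern)
    except retry_errors:
        # The node table may be stale after a failover: reload it, then retry.
        reinitialize()
        # If this second attempt also fails, the exception reaches the caller.
        return client.keys(pattern)
```

If the failed node is still listed in the reloaded layout, the second attempt raises as well, so the caller should still be prepared to handle the error.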
  File "/home/ilya/Documents/Projects/SBP/referencedataloader/api/misc.py", line 72, in reload
    cluster_keys = redis_cluster.keys(f'{prefix}.*')
  File "/home/ilya/miniconda3/envs/referencedataloader/lib/python3.9/site-packages/redis/client.py", line 1661, in keys
    return self.execute_command('KEYS', pattern)
  File "/home/ilya/miniconda3/envs/referencedataloader/lib/python3.9/site-packages/rediscluster/client.py", line 555, in execute_command
    return self._execute_command(*args, **kwargs)
  File "/home/ilya/miniconda3/envs/referencedataloader/lib/python3.9/site-packages/rediscluster/client.py", line 581, in _execute_command
    return self._execute_command_on_nodes(node, *args, **kwargs)
  File "/home/ilya/miniconda3/envs/referencedataloader/lib/python3.9/site-packages/rediscluster/client.py", line 738, in _execute_command_on_nodes
    connection.send_command(*args)
  File "/home/ilya/miniconda3/envs/referencedataloader/lib/python3.9/site-packages/redis/connection.py", line 725, in send_command
    self.send_packed_command(self.pack_command(*args),
  File "/home/ilya/miniconda3/envs/referencedataloader/lib/python3.9/site-packages/redis/connection.py", line 698, in send_packed_command
    self.connect()
  File "/home/ilya/miniconda3/envs/referencedataloader/lib/python3.9/site-packages/redis/connection.py", line 563, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 111 connecting to 192.168.75.120:7000. Connection refused.
What version of this code are you using? And what version of redis-py?
redis-py-cluster 2.1.2, redis 3.5.3
Found the problem here, in rediscluster/client.py:
def _execute_command_on_nodes(self, nodes, *args, **kwargs):
    """
    """
    command = args[0]
    res = {}
    for node in nodes:
        connection = self.connection_pool.get_connection_by_node(node)
        # copy from redis-py
        try:
            connection.send_command(*args)
            res[node["name"]] = self.parse_response(connection, command, **kwargs)
        except (ConnectionError, TimeoutError) as e:  # ConnectionError: Error 111 connecting to 192.168.75.109:7000. Connection refused
            connection.disconnect()
            if not connection.retry_on_timeout and isinstance(e, TimeoutError):
                raise
            connection.send_command(*args)
            res[node["name"]] = self.parse_response(connection, command, **kwargs)  # FAIL There
        except ClusterDownError:
            self.connection_pool.disconnect()
            self.connection_pool.reset()
            self.refresh_table_asap = True
            raise
        finally:
            self.connection_pool.release(connection)
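To make the failure mode concrete, here is a toy reduction of the retry pattern in that method, with no Redis involved. `RefusedConnection` and `send_with_one_retry` are hypothetical stand-ins, and the built-in `ConnectionError` is used in place of `redis.exceptions.ConnectionError`. It only illustrates that a retry issued inside the `except` block is not covered by any handler, so a node that is still down makes the second attempt raise straight out of the function.

```python
class RefusedConnection:
    """Stand-in for a connection to a node that is down (not library code)."""

    def __init__(self):
        self.attempts = 0

    def send_command(self, *args):
        self.attempts += 1
        raise ConnectionError("Error 111 connecting. Connection refused.")

    def disconnect(self):
        pass


def send_with_one_retry(connection, *args):
    try:
        connection.send_command(*args)
    except ConnectionError:
        connection.disconnect()
        # Retry once. This call sits inside the handler, so if the node is
        # still down the new exception propagates to the caller.
        connection.send_command(*args)


conn = RefusedConnection()
try:
    send_with_one_retry(conn, "KEYS", "*")
    outcome = "ok"
except ConnectionError:
    outcome = "propagated"
```

With a node that stays unreachable, both attempts fail and the whole multi-node command fails with them, which matches the traceback in this thread.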
So that piece of code technically does what it is supposed to do. All of these multi-slot and multi-node commands have always been less well implemented than the single-slot commands, and because I never merged both code paths into a single unified solution, here we are with these kinds of problems. I would argue that a command that is supposed to go to all nodes failing when one of the expected cluster nodes is unreachable is more or less intended. There are no plans to rebuild this in the 2.x.x version track of this lib, so you will have to wait for 3.0, build some other solution on your own side, or submit a PR that fixes this issue.
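One possible shape for a solution on the user's side, as a sketch only: query each node through its own client and skip the ones that cannot be reached. The function `keys_from_reachable_nodes` and the `node_clients` mapping are hypothetical. The assumption is that the caller builds one plain client per master node (for example a `redis.Redis(host=..., port=...)` per master), since KEYS sent to a single cluster node returns only the keys stored on that node. In real use `skip_errors` would be `redis.exceptions.ConnectionError` rather than the built-in default shown here.

```python
def keys_from_reachable_nodes(node_clients, pattern, skip_errors=(ConnectionError,)):
    """Collect keys matching pattern from every node that answers.

    node_clients -- mapping of node name -> object with a keys(pattern) method
    Returns (keys, unreachable): the combined key list and the names of the
    nodes that raised one of skip_errors.
    """
    found = []
    unreachable = []
    for name, client in node_clients.items():
        try:
            found.extend(client.keys(pattern))
        except skip_errors:
            # Record the failure instead of aborting the whole scan.
            unreachable.append(name)
    return found, unreachable
```

Returning the list of unreachable nodes alongside the keys keeps the partial result honest: the caller can see that the data is incomplete and decide whether that is acceptable.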
Code to reproduce
I know that the command is sent to all nodes. But if one of the nodes is down, the whole command raises ConnectionError. Is it possible to return results from only the active nodes? Or to force a reload of the cluster config? And how do I reload the config if I pass host and port as arguments rather than startup-nodes (in case the node specified in the arguments is the broken one)?