Open psd314 opened 4 years ago
@psd314 I have been looking over this issue and dug into the code a fair bit to see what the problem is and what I can do to resolve it. For now I have decided to put this on the back burner and revisit it later. My current thinking about all of these cluster management and configuration commands is that I don't want them inside the main RedisCluster class. I would rather have them in a separate management class that you instantiate and call independently of the regular RedisCluster class. I have some ideas for making this work much better: a separate class could provide better helper methods and tools, with loops and iterators that make administering a cluster easier.
For anyone else finding this issue: right now my suggestion is that you use a plain Redis client instance, connect to your nodes, and run the administration of your cluster through that until I can rebuild this feature into something that works the way I want it to.
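A minimal sketch of that suggestion, assuming the redis-py package and its `execute_command` method. The helper names `failover_args` and `trigger_failover` are hypothetical and not part of either library; the client is passed in, so any object exposing `execute_command` will do.

```python
VALID_OPTIONS = {"FORCE", "TAKEOVER"}  # options accepted by CLUSTER FAILOVER


def failover_args(option=None):
    """Return the argument tuple for CLUSTER FAILOVER (hypothetical helper)."""
    if option is None:
        return ("CLUSTER", "FAILOVER")
    option = option.upper()
    if option not in VALID_OPTIONS:
        raise ValueError("option must be FORCE or TAKEOVER, got %r" % option)
    return ("CLUSTER", "FAILOVER", option)


def trigger_failover(client, option=None):
    """Send CLUSTER FAILOVER over a direct connection to the replica."""
    return client.execute_command(*failover_args(option))


# Usage (needs a reachable replica; host, port and password are placeholders):
#   import redis
#   replica = redis.Redis(host="10.0.0.2", port=7001, password="secret")
#   trigger_failover(replica, "FORCE")
```

The command must be sent to the replica that should be promoted, which is why the connection goes directly to that node rather than through the cluster client.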
Description
I have a 9 node cluster with 3 masters, each master on its own machine. The nodes are cross-replicated, so the 2 slaves on each machine are replicas of the other 2 non-local masters. I'm working on re-balancing the cluster by manually failing over a master after a machine has gone down and come back up, leaving two masters on one machine.
Expected
The targeted slave node should be promoted to master and the associated master demoted to a slave. I can do this successfully on this cluster via
redis-cli -h <host> -p <port> -a <password> CLUSTER FAILOVER FORCE
Actual
Bug
The `node_id` argument in the `cluster_failover` method does not get passed on to `execute_command` and is never used to identify the proper node to connect to. From rediscluster/client.py:
It looks like the `option` argument gets used as a key to identify the node via slot lookup, and the master node for that slot gets returned. Also from rediscluster/client.py:
`arg[1]`, the `option` arg in `cluster_failover`, gets used to determine the slot in `self._determine_slot(*args)`
`slot` gets used to look up the node to fail over, which returns the master node where the slot is located rather than the `node_id` passed to `cluster_failover`
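To illustrate the routing problem, here is a sketch of what happens when a string such as FORCE is treated as a key. It assumes `_determine_slot` hashes its argument the way Redis Cluster hashes keys (CRC16/XMODEM modulo 16384, with hash tags ignored here). The slot table and node names are made up for the example and do not come from the library.

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key):
    """CRC16/XMODEM of the key modulo 16384 (hash tags not handled)."""
    return binascii.crc_hqx(key.encode(), 0) % SLOT_COUNT


# Hypothetical three-master layout with evenly split slot ranges.
MASTERS = [
    (range(0, 5461), "master-a"),
    (range(5461, 10923), "master-b"),
    (range(10923, 16384), "master-c"),
]


def master_for_slot(slot):
    """Return the name of the master that owns the given slot."""
    for slots, name in MASTERS:
        if slot in slots:
            return name
    raise KeyError(slot)


def routed_node(option, node_id):
    """Mimic the reported behaviour: node_id plays no part in routing."""
    return master_for_slot(key_slot(option))


# Both calls print the same master, regardless of the replica requested.
print(routed_node("FORCE", "replica-1"))
print(routed_node("FORCE", "replica-2"))
```

Under these assumptions the command always lands on whichever master owns the slot that the option string happens to hash to, so the requested replica never receives it.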
I was able to work around this and get the slave to initiate the failover by finding the target node in `rc.connection_pool.nodes.all_nodes()` and using it to establish a connection to that node and send the `CLUSTER FAILOVER` command.
Tasks
Allow `cluster_failover` to target a slave node by id to start a manual failover.
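The workaround described in the Bug section can be sketched roughly as follows. It assumes each entry yielded by `rc.connection_pool.nodes.all_nodes()` is a dict with `host` and `port` keys, which should be checked against the installed redis-py-cluster version. `find_node` and `failover_on` are hypothetical helper names, and the `connect` callable is injected so that the sketch does not depend on a particular client class.

```python
def find_node(nodes, host, port):
    """Return the first node dict matching host and port, else raise LookupError."""
    for node in nodes:
        if (node.get("host"), str(node.get("port"))) == (host, str(port)):
            return node
    raise LookupError("no node at %s:%s" % (host, port))


def failover_on(node, connect, option="FORCE"):
    """Open a direct connection to `node` and send CLUSTER FAILOVER <option>."""
    client = connect(host=node["host"], port=node["port"])
    return client.execute_command("CLUSTER", "FAILOVER", option)


# Usage (rc is an existing RedisCluster instance; address and password are placeholders):
#   import functools
#   import redis
#   target = find_node(rc.connection_pool.nodes.all_nodes(), "10.0.0.2", 7001)
#   failover_on(target, functools.partial(redis.Redis, password="secret"))
```

Matching on host and port rather than node id is a deliberate simplification here, since the sketch does not assume the node dicts carry an id field.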