antirez / disque

Disque is a distributed message broker
BSD 3-Clause "New" or "Revised" License

Un-meet cluster #127

Closed: xeraa closed this issue 8 years ago

xeraa commented 8 years ago

Meeting cluster members works as expected, but how can I un-meet a node?

My cluster looks like this:

127.0.0.1:7711> hello
1) (integer) 1
2) "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
3) 1) "f63620b151328c7d30c99ce3fbeb7d562663da4f"
   2) "10.1.24.245"
   3) "7711"
   4) "1"
4) 1) "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
   2) "10.1.21.76"
   3) "7711"
   4) "1"
5) 1) "d5fcb2baba4357a22d5de699a0a55dd8737a8773"
   2) "10.1.92.246"
   3) "7711"
   4) "1"
127.0.0.1:7711> cluster info
cluster_state:ok
cluster_known_nodes:3
cluster_reachable_nodes:1
cluster_size:2
cluster_stats_messages_sent:5359
cluster_stats_messages_received:5395
127.0.0.1:7711> exit
ubuntu@mq:~$ cat /opt/disque/nodes.conf 
42bb9ed827379dc5c4acc733d8665f721aa2a8da 10.1.21.76:7711 myself 0 0 connected
d5fcb2baba4357a22d5de699a0a55dd8737a8773 10.1.92.246:7711 noflags 0 1443780313124 connected

The first node in the list (f63620b151328c7d30c99ce3fbeb7d562663da4f at 10.1.24.245) no longer exists. How can I remove it? :) A restart didn't help, and nodes.conf is also up to date.
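The mismatch described above can be checked mechanically: an ID that HELLO lists but nodes.conf does not is a candidate for CLUSTER FORGET. Below is a minimal sketch, assuming the HELLO reply arrives as nested lists (the shape redis-cli prints above) and that each nodes.conf line starts with `<id> <ip:port>`. `parse_hello`, `parse_nodes_conf` and `stale_hello_ids` are hypothetical helpers, not Disque APIs.

```python
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL helpers for illustration; not part of Disque or any client library.


def _s(value) -> str:
    """Decode bytes (as a RESP client may return them) to str."""
    return value.decode() if isinstance(value, bytes) else str(value)


def parse_hello(reply: Sequence) -> Tuple[str, List[Tuple[str, ...]]]:
    """Split a HELLO reply into (my_id, node entries).

    Assumes the layout seen in the transcript: [version, my_id, node, ...],
    where each node is [id, ip, port, fourth_field]. The fourth field is kept
    but not interpreted here.
    """
    my_id = _s(reply[1])
    entries = [tuple(_s(field) for field in node) for node in reply[2:]]
    return my_id, entries


def parse_nodes_conf(text: str) -> Dict[str, str]:
    """Map node ID -> "ip:port", assuming each non-empty line starts with
    '<id> <ip:port> ...' like the two nodes.conf lines in the report."""
    known: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            known[parts[0]] = parts[1]
    return known


def stale_hello_ids(reply: Sequence, nodes_conf: str) -> List[str]:
    """Node IDs that HELLO reports but nodes.conf does not list."""
    _, entries = parse_hello(reply)
    known = parse_nodes_conf(nodes_conf)
    return [entry[0] for entry in entries if entry[0] not in known]


if __name__ == "__main__":
    # Data copied from the HELLO output and nodes.conf in the report.
    reply = [
        1,
        "42bb9ed827379dc5c4acc733d8665f721aa2a8da",
        ["f63620b151328c7d30c99ce3fbeb7d562663da4f", "10.1.24.245", "7711", "1"],
        ["42bb9ed827379dc5c4acc733d8665f721aa2a8da", "10.1.21.76", "7711", "1"],
        ["d5fcb2baba4357a22d5de699a0a55dd8737a8773", "10.1.92.246", "7711", "1"],
    ]
    conf = (
        "42bb9ed827379dc5c4acc733d8665f721aa2a8da 10.1.21.76:7711 myself 0 0 connected\n"
        "d5fcb2baba4357a22d5de699a0a55dd8737a8773 10.1.92.246:7711 noflags 0 1443780313124 connected\n"
    )
    print(stale_hello_ids(reply, conf))
    # prints ['f63620b151328c7d30c99ce3fbeb7d562663da4f']
```

Keeping the parsing separate from any network client makes the comparison easy to run against a pasted transcript as well as a live reply.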

mp911de commented 8 years ago

Try CLUSTER FORGET <nodeid>

xeraa commented 8 years ago

Thanks @mp911de :)

However, my cluster seems to be broken or I don't get it:

127.0.0.1:7711> hello
1) (integer) 1
2) "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
3) 1) "b64defc53761cdfcbef5a1643ad288665052108f"
   2) "10.1.24.245"
   3) "7711"
   4) "1"
4) 1) "d5fcb2baba4357a22d5de699a0a55dd8737a8773"
   2) "10.1.92.246"
   3) "7711"
   4) "1"
5) 1) "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
   2) "10.1.21.76"
   3) "7711"
   4) "1"
127.0.0.1:7711> cluster forget b64defc53761cdfcbef5a1643ad288665052108f
(error) ERR Unknown node b64defc53761cdfcbef5a1643ad288665052108f
127.0.0.1:7711> hello
1) (integer) 1
2) "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
3) 1) "87d18049a2cff70e5321d20924b56a5aaf1042be"
   2) "10.1.24.245"
   3) "7711"
   4) "1"
4) 1) "d5fcb2baba4357a22d5de699a0a55dd8737a8773"
   2) "10.1.92.246"
   3) "7711"
   4) "1"
5) 1) "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
   2) "10.1.21.76"
   3) "7711"
   4) "1"
127.0.0.1:7711> cluster forget d5fcb2baba4357a22d5de699a0a55dd8737a8773
OK
127.0.0.1:7711> hello
1) (integer) 1
2) "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
3) 1) "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
   2) "10.1.21.76"
   3) "7711"
   4) "1"
127.0.0.1:7711> cluster meet 10.1.92.246 7711
OK
127.0.0.1:7711> hello
1) (integer) 1
2) "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
3) 1) "489b85cfe0d7b581f42272fe18a3aeaccaac7e7f"
   2) "10.1.24.245"
   3) "7711"
   4) "1"
4) 1) "d5fcb2baba4357a22d5de699a0a55dd8737a8773"
   2) "10.1.92.246"
   3) "7711"
   4) "1"
5) 1) "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
   2) "10.1.21.76"
   3) "7711"
   4) "1"
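The `Unknown node` error fits an ID that changed between the HELLO and the CLUSTER FORGET: the very next HELLO lists 10.1.24.245 under 87d18049... instead of b64defc5.... Below is a minimal sketch of resolving the ID from a fresh HELLO immediately before forgetting, keyed by address instead of by a previously copied ID. `forget_by_address` and `current_id_for` are hypothetical helpers. `execute(*args)` stands for any callable that sends one command and returns the parsed reply as nested lists (an assumption). This only shrinks the window: if the remote node keeps re-joining under new IDs, it will reappear.

```python
from typing import Any, Callable, Optional, Sequence

# HYPOTHETICAL helpers for illustration; not part of Disque or any client library.


def _s(value) -> str:
    """Decode bytes (as a RESP client may return them) to str."""
    return value.decode() if isinstance(value, bytes) else str(value)


def current_id_for(reply: Sequence, ip: str, port: str) -> Optional[str]:
    """Return the node ID a HELLO reply lists for ip:port, or None.

    Assumes the reply layout seen above: [version, my_id, [id, ip, port, x], ...].
    """
    for node in reply[2:]:
        node_id, node_ip, node_port = (_s(field) for field in node[:3])
        if node_ip == ip and node_port == str(port):
            return node_id
    return None


def forget_by_address(execute: Callable[..., Any], ip: str, port: str) -> Optional[str]:
    """Look up the *current* ID for ip:port via HELLO, then CLUSTER FORGET it.

    Returns the forgotten ID, or None if the address is not listed.
    If the remote node changes its ID again between the two commands,
    FORGET can still fail with "Unknown node".
    """
    node_id = current_id_for(execute("HELLO"), ip, port)
    if node_id is None:
        return None
    execute("CLUSTER", "FORGET", node_id)
    return node_id
```

With a client such as redis-py, `execute` could be `Redis(port=7711).execute_command`, but how a given client version parses Disque's HELLO reply is an assumption to verify first.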

antirez commented 8 years ago

@xeraa we now have a much better graceful node removal feature (described in the README); however, what you report is indeed strange. It looks as if, while you were running CLUSTER FORGET on this node, somebody was also manually modifying the configuration of 10.1.24.245 by restarting it without a nodes.conf. Is that possible? That would explain exactly the behavior above.
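This hypothesis can be checked against the HELLO outputs in this thread: 10.1.24.245:7711 shows up under four different IDs (f63620b1..., b64defc5..., 87d18049..., 489b85cf...), while the other two addresses keep theirs. Below is a minimal sketch of flagging addresses whose node ID changes across HELLO snapshots, assuming replies as nested lists in the layout shown above. `id_churn` is a hypothetical helper. A changing ID at a fixed address is consistent with a node restarted without its nodes.conf, but it can also mean a different machine took over that IP, so treat the output as a hint rather than a diagnosis.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# HYPOTHETICAL helper for illustration; not part of Disque or any client library.


def _s(value) -> str:
    """Decode bytes (as a RESP client may return them) to str."""
    return value.decode() if isinstance(value, bytes) else str(value)


def id_churn(snapshots: Sequence[Sequence]) -> Dict[str, List[str]]:
    """Map "ip:port" -> distinct node IDs seen, for addresses with more than one.

    Each snapshot is one HELLO reply: [version, my_id, [id, ip, port, x], ...].
    IDs are listed in the order they were first seen.
    """
    seen: Dict[str, List[str]] = defaultdict(list)
    for reply in snapshots:
        for node in reply[2:]:
            node_id, ip, port = (_s(field) for field in node[:3])
            ids = seen[f"{ip}:{port}"]
            if node_id not in ids:
                ids.append(node_id)
    return {addr: ids for addr, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    # IDs copied from the HELLO outputs in this thread.
    SELF = "42bb9ed827379dc5c4acc733d8665f721aa2a8da"
    self_node = [SELF, "10.1.21.76", "7711", "1"]
    peer = ["d5fcb2baba4357a22d5de699a0a55dd8737a8773", "10.1.92.246", "7711", "1"]
    churned = [
        "f63620b151328c7d30c99ce3fbeb7d562663da4f",
        "b64defc53761cdfcbef5a1643ad288665052108f",
        "87d18049a2cff70e5321d20924b56a5aaf1042be",
        "489b85cfe0d7b581f42272fe18a3aeaccaac7e7f",
    ]
    snapshots = [[1, SELF, [cid, "10.1.24.245", "7711", "1"], peer, self_node] for cid in churned]
    snapshots.insert(3, [1, SELF, self_node])  # the HELLO right after CLUSTER FORGET d5fcb2...
    print(id_churn(snapshots))
    # only 10.1.24.245:7711 shows more than one ID
```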

xeraa commented 8 years ago

One of the two nodes that currently exist had the IP 10.1.24.245 at some point in the past, but it changed its IP after a restart.

We'll upgrade Disque and see if we run into a similar issue. At the moment everything is working as expected.

Thanks for the update!