antirez / disque

Disque is a distributed message broker
BSD 3-Clause "New" or "Revised" License

make test fail to discover nodes #82

Closed renatoc closed 9 years ago

renatoc commented 9 years ago

root@server:~/disque# git log -n 1
commit c6346975595b9ab4e390e21379bad725379620f5
Merge: 83362b2 12770bd
Author: Salvatore Sanfilippo <antirez@gmail.com>
Date: Thu Jun 11 11:26:34 2015 +0200

Merge pull request #79 from sunheehnus/cluster

cluster: fix oversights

root@server:~/disque# make clean
cd src && make clean
make[1]: Entering directory `/root/disque/src'
rm -rf disque-server disque disque-check-aof *.o *.gcda *.gcno *.gcov disque.info lcov-html
make[1]: Leaving directory `/root/disque/src'
root@ip-10-230-207-79:~/disque# make
cd src && make all
make[1]: Entering directory `/root/disque/src'
CC adlist.o
CC ae.o
CC anet.o
CC dict.o
CC disque.o
CC sds.o
CC zmalloc.o
CC lzf_c.o
CC lzf_d.o
CC pqsort.o
CC sha1.o
CC release.o
CC networking.o
CC util.o
CC object.o
CC config.o
CC aof.o
CC debug.o
CC syncio.o
CC cluster.o
CC crc16.o
CC endianconv.o
CC slowlog.o
CC bio.o
CC memtest.o
CC crc64.o
CC setproctitle.o
CC blocked.o
CC latency.o
CC sparkline.o
CC rio.o
CC job.o
CC queue.o
CC skiplist.o
CC ack.o
LINK disque-server
CC disque-cli.o
LINK disque
CC disque-check-aof.o
LINK disque-check-aof

Hint: It's a good idea to run 'make test' ;)

make[1]: Leaving directory `/root/disque/src'
root@server:~/disque# make test
cd src && make test
make[1]: Entering directory `/root/disque/src'
Starting disque #0 at port 25000
Starting disque #1 at port 25001
Starting disque #2 at port 25002
Starting disque #3 at port 25003
Starting disque #4 at port 25004
Starting disque #5 at port 25005
Starting disque #6 at port 25006
Testing unit: 00-base.tcl
09:27:40> (init) Restart killed instances: OK
09:27:40> Cluster nodes are reachable: OK
09:27:40> Cluster nodes hard reset: OK
09:27:40> Cluster Join and auto-discovery test: Cluster failed to join into a full mesh.
(Jumping to next unit after error)
Testing unit: 01-faildet.tcl
09:28:30> (init) Restart killed instances: OK
09:28:30> Cluster nodes are reachable: OK
09:28:30> Cluster nodes hard reset: OK
09:28:30> Cluster Join and auto-discovery test:

sunheehnus commented 9 years ago

Hi @renatoc, thanks very much for finding this. It was caused by this change:

@@ -2422,8 +2423,8 @@ void clusterCommand(client *c) {
             "cluster_stats_messages_received:%lld\r\n"
             , statestr[server.cluster->state],
             dictSize(server.cluster->nodes),
-            server.cluster->size,
             server.cluster->reachable_nodes_count,
+            server.cluster->size,

My bad, I did not run make test after this change. >.< I'm not familiar with the test code, but I'll try to fix this later...
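To illustrate why the argument order in the diff above matters, here is a minimal sketch in C. It is not the Disque source: the struct, both function names, and the exact order of the labels in the format string are assumptions made for illustration. The point is only that printf-style arguments are matched to conversions by position, so swapping two arguments silently attaches each value to the wrong label.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for the cluster state.
 * Field names follow the diff; the struct itself is an assumption. */
struct cluster_state {
    int known_nodes;
    int reachable_nodes_count;
    int size;
};

/* Arguments listed in the same order as the labels: each value
 * ends up under its own label. */
int format_info_matching(char *buf, size_t len, const struct cluster_state *cs) {
    return snprintf(buf, len,
        "cluster_known_nodes:%d\r\n"
        "cluster_reachable_nodes:%d\r\n"
        "cluster_size:%d\r\n",
        cs->known_nodes,
        cs->reachable_nodes_count,
        cs->size);
}

/* Same format string, but size and reachable_nodes_count are swapped:
 * the cluster size is printed under the cluster_reachable_nodes label. */
int format_info_swapped(char *buf, size_t len, const struct cluster_state *cs) {
    return snprintf(buf, len,
        "cluster_known_nodes:%d\r\n"
        "cluster_reachable_nodes:%d\r\n"
        "cluster_size:%d\r\n",
        cs->known_nodes,
        cs->size,
        cs->reachable_nodes_count);
}
```

With a state such as known_nodes = 7, reachable_nodes_count = 2, size = 7, the swapped version reports 7 under cluster_reachable_nodes even though only 2 nodes are reachable. A test that compares that field against the number of instances would then pass too early.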

antirez commented 9 years ago

np @sunheehnus, I'm fixing it right now :-)

sunheehnus commented 9 years ago

Hi @antirez, appending -1 to line 53 of test/cluster/tests/includes/init-tests.tcl eliminates this error, but I'm not sure if this is right. :-)

antirez commented 9 years ago

Yep that's how I fixed it :-)

-            [CI $id cluster_reachable_nodes] == [llength $ids]
+            [CI $id cluster_reachable_nodes]+1 == [llength $ids]

The interesting thing is that your fix to cluster_reachable_nodes also fixes a false positive we had in the test: previously the field was just reporting the cluster size, even when the nodes were still not ready (no handshake performed). So now it's much better! Thanks ;-)
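The condition in the test fix above can be sketched in C as follows. This is an illustration only, not code from Disque or its test suite: info_field and is_full_mesh are hypothetical helpers, and the reading that a node does not count itself among its reachable nodes is an assumption inferred from the +1 in the fix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds a line of the form "name:value" in
 * CLUSTER INFO-style text and returns the value as a long.
 * Returns -1 if the field is not present. */
long info_field(const char *info, const char *name) {
    size_t n = strlen(name);
    const char *p = info;
    while (p && *p) {
        /* Only match at the start of a line, followed by ':'. */
        if (strncmp(p, name, n) == 0 && p[n] == ':') {
            return strtol(p + n + 1, NULL, 10);
        }
        p = strchr(p, '\n');   /* jump to the next line, if any */
        if (p) p++;
    }
    return -1;
}

/* Assumption: a node does not count itself as reachable, so in a full
 * mesh of total_nodes instances it reports total_nodes - 1. */
int is_full_mesh(const char *info, int total_nodes) {
    long reachable = info_field(info, "cluster_reachable_nodes");
    return reachable >= 0 && reachable + 1 == total_nodes;
}
```

With the 7 instances started by the test run above, a node reporting cluster_reachable_nodes:6 satisfies the check, while a node that has not finished its handshakes and reports a smaller number does not.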

sunheehnus commented 9 years ago

My pleasure and great thanks to @renatoc . :-)