Closed chbndrhnns closed 8 years ago
Is this random, or does it happen every time? We should be able to provide a unit test that demonstrates this.
It happens every time to me. I created a test case (inherited from the existing tests) that demonstrates the issue at http://pastebin.com/nKHDHj7L
Awesome! Nice use of tunit :) I will take a look at this. I feel like something similar has been reported before.
Sorry for the delay. I was looking at this and changed your test a bit. The issue is that messages carrying a higher heartbeat often win, so nodes ping-pong between states. The answer here is to implement some type of hold-down timer so that recently revived nodes are not so eagerly marked dead again. I will look at that later tonight.
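For readers following along, the hold-down idea could be sketched roughly as below. This is only an illustration under assumptions: `HoldDownTracker`, `markRevived`, and `mayMarkDead` are hypothetical names, not part of the gossip library, and the caller passes in the current time in milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the library's actual API.
class HoldDownTracker {
    // Length of the window during which a revived node is protected.
    private final long holdDownMillis;
    // Node id -> time (ms) at which the node was last seen coming back to life.
    private final Map<String, Long> revivedAt = new HashMap<>();

    HoldDownTracker(long holdDownMillis) {
        this.holdDownMillis = holdDownMillis;
    }

    // Record that a previously dead node has been seen alive again.
    void markRevived(String nodeId, long nowMillis) {
        revivedAt.put(nodeId, nowMillis);
    }

    // A node may be marked dead only if it was never revived,
    // or if its hold-down window has fully elapsed.
    boolean mayMarkDead(String nodeId, long nowMillis) {
        Long revived = revivedAt.get(nodeId);
        return revived == null || nowMillis - revived >= holdDownMillis;
    }
}
```

With a 5000 ms window, a node revived at t=1000 could not be marked dead at t=3000 but could be at t=6000, which should damp the ping-pong between states described above.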
I have tuned up the code a bit. I think I am going to turn the heartbeat into a timestamp so that it is easier to establish chronology. Right now, when a node starts up again its heartbeat is 0, and its gossip never "wins" over older records.
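To make the restart problem concrete, here is a minimal sketch of a "newest heartbeat wins" merge. It is not the library's member-list code: `HeartbeatTable` and `merge` are hypothetical names, and heartbeats are assumed to be epoch milliseconds stamped by the node that owns them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch for illustration; not the library's actual code.
class HeartbeatTable {
    // Node id -> latest heartbeat seen for that node (assumed epoch milliseconds).
    private final Map<String, Long> heartbeats = new HashMap<>();

    // Accept a gossiped heartbeat only if it is newer than the one we hold.
    // Returns true if the stored record was updated.
    boolean merge(String nodeId, long heartbeatMillis) {
        Long current = heartbeats.get(nodeId);
        if (current == null || heartbeatMillis > current) {
            heartbeats.put(nodeId, heartbeatMillis);
            return true;
        }
        return false;
    }

    // Latest heartbeat known for the node, or null if none has been seen.
    Long get(String nodeId) {
        return heartbeats.get(nodeId);
    }
}
```

With a counter, a restarted node would call `merge` with a value near 0 and lose to the stale record. With a timestamp, its first heartbeat after the restart is later than anything it sent before, so the same comparison accepts it, assuming that node's clock does not jump backwards across the restart.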
@chbndrhnns I noticed your test had one minor bug: you were adding a new gossiper to the list without removing the old one. Still, there were other issues that I addressed. Can you give https://github.com/edwardcapriolo/gossip/compare/ts_as_heartbeat?expand=1 a try and let me know your experience with it? If I don't hear from you in a few days, I will merge.
Hey, thanks very much for your work! While the new tests always pass, the timestamp heartbeat system does not work in my real-life application. Maybe I can provide you with a test set that fails.
Hey, when I ran tests with multiple clients on my local machine I discovered this behavior:
Has anyone observed similar behavior and found a fix for it?