edwardcapriolo / gossip

A Mavenized Apache V2 gossip implementation for Java
Apache License 2.0

Dead member detection #9

Closed hvandenb closed 8 years ago

hvandenb commented 8 years ago

I'm trying to track down an issue with this. When the services start, they are able to see the remote peers, but after a couple of seconds they stop seeing them. It seems that the main listener might be dead. Here is an example from my log. Any idea what might be causing this would help; I'll do some tracing as well. The pattern is consistent: it always happens after a couple of seconds.

NOTE: I'm running this within Spring Boot.

2015-12-06 17:58:29.258  INFO 21947 --- [        Cluster] o.m.cs565.dccs.cluster.ClusterManager    : We have 1 members that are live: [Member [address=02.edu:5555, id=-944842516, heartbeat=0]]
2015-12-06 17:58:29.294  INFO 21947 --- [        Timer-0] com.google.code.gossip.GossipService     : Dead member detected: Member [address=02.edu:5555, id=-944842516, heartbeat=0]
2015-12-06 17:58:29.295  INFO 21947 --- [        Timer-0] o.m.cs565.dccs.cluster.ClusterManager    : Gossip Event Member [address=02.edu:5555, id=-944842516, heartbeat=0], state [DOWN]
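In the log, the remote member is still at heartbeat=0 when it is declared dead. One plausible reading (an assumption, not confirmed in this thread) is that the failure detector marks a member DOWN once its heartbeat counter has not increased within some cleanup interval. Under that assumption, a peer that is discovered at startup but whose heartbeat never advances would be dropped a few seconds later. The class below is a hypothetical sketch of such a timeout-based detector, not the gossip library's implementation; the names `HeartbeatDetectorSketch`, `onGossip`, `isDead`, and the 1000 ms interval are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of heartbeat-timeout failure detection.
 * Not the gossip library's API; names and interval are illustrative.
 */
public class HeartbeatDetectorSketch {

    /** Last heartbeat value seen for a member and when it last increased. */
    private static final class Seen {
        long heartbeat;
        long lastIncreaseMillis;

        Seen(long heartbeat, long lastIncreaseMillis) {
            this.heartbeat = heartbeat;
            this.lastIncreaseMillis = lastIncreaseMillis;
        }
    }

    private final long cleanupIntervalMillis;
    private final Map<String, Seen> members = new HashMap<>();

    public HeartbeatDetectorSketch(long cleanupIntervalMillis) {
        this.cleanupIntervalMillis = cleanupIntervalMillis;
    }

    /** Record a heartbeat received via gossip; only a strictly higher value refreshes the timer. */
    public void onGossip(String address, long heartbeat, long nowMillis) {
        Seen seen = members.get(address);
        if (seen == null) {
            members.put(address, new Seen(heartbeat, nowMillis));
        } else if (heartbeat > seen.heartbeat) {
            seen.heartbeat = heartbeat;
            seen.lastIncreaseMillis = nowMillis;
        }
    }

    /** A member is treated as DOWN if its heartbeat has not increased within the interval. */
    public boolean isDead(String address, long nowMillis) {
        Seen seen = members.get(address);
        return seen == null || nowMillis - seen.lastIncreaseMillis > cleanupIntervalMillis;
    }

    public static void main(String[] args) {
        HeartbeatDetectorSketch detector = new HeartbeatDetectorSketch(1000);
        // Remote peer keeps gossiping heartbeat=0, as in the log above.
        detector.onGossip("02.edu:5555", 0, 0);
        detector.onGossip("02.edu:5555", 0, 500);
        detector.onGossip("02.edu:5555", 0, 1500);
        System.out.println(detector.isDead("02.edu:5555", 1500)); // true
        // A peer whose counter advances stays alive.
        detector.onGossip("03.edu:5555", 0, 0);
        detector.onGossip("03.edu:5555", 1, 900);
        System.out.println(detector.isDead("03.edu:5555", 1500)); // false
    }
}
```

If the real detector behaves this way, the symptom points at heartbeats not being incremented or not being received after startup, rather than at discovery itself.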
edwardcapriolo commented 8 years ago

See https://github.com/edwardcapriolo/gossip/issues/15 for a proposed fix!