edwardcapriolo / gossip

A Mavenized Apache V2 gossip implementation for Java
Apache License 2.0

Dead Node #23

Open challengeteamttdh opened 8 years ago

challengeteamttdh commented 8 years ago

Hi,

I am using this library to integrate into my application, and I found an issue. When I shut down a node and turn it on again, the other nodes don't know about this node. Example: I have 2 nodes, 1 and 2. First, I start all the nodes. Second, I turn off node 2, then turn it on again, but node 2 doesn't know node 1 is UP, and node 1 also doesn't know node 2 is UP. Please help me resolve this issue. Thanks for your support.

edwardcapriolo commented 8 years ago

Questions: 1) Are you sure your system clock is in sync? 2) How long was the node down for? 3) Can you set the logging on both servers to debug and record any relevant output? 4) What is your configuration? What are the initial contact points?

We have a unit test which does this with 5 nodes, so it would be interesting to understand whether the same logic works with two nodes.

challengeteamttdh commented 8 years ago

Currently, I am applying gossip to a Spring Boot application. Each node is an instance of Spring Boot. Please give me some advice on applying gossip to a Spring Boot application.

This is my configuration for Application 1 with port 8081:

    [{
      "cluster":"",
      "id":"",
      "port":8081,
      "gossip_interval":1000,
      "cleanup_interval":10000,
      "members":[
        {"cluster": "","id": "", "host":"192.168.1.90", "port":8084},
        {"cluster": "","id": "", "host":"192.168.1.90", "port":8083},
        {"cluster": "","id": "", "host":"192.168.1.90", "port":8082}
      ]
    }]

This is my configuration for Application 2 with port 8082:

    [{
      "cluster":"",
      "id":"",
      "port":8082,
      "gossip_interval":1000,
      "cleanup_interval":10000,
      "members":[
        {"cluster": "","id": "", "host":"192.168.1.90", "port":8084},
        {"cluster": "","id": "", "host":"192.168.1.90", "port":8083},
        {"cluster": "","id": "", "host":"192.168.1.90", "port":8081}
      ]
    }]

Let me know if I'm wrong.

edwardcapriolo commented 8 years ago

Each node needs an id. In your case you can generate a string or a UUID that will persist between restarts.
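A minimal sketch of one way to get an id that persists between restarts: generate a UUID on first start and keep it in a small file next to the application. The class name `NodeId` and the file name `gossip-node.id` are assumptions made for illustration; they are not part of the gossip library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class NodeId {

    /**
     * Returns the id stored in idFile. If the file is missing or empty,
     * a random UUID is generated, written to the file, and returned.
     */
    public static String loadOrCreate(Path idFile) throws IOException {
        if (Files.exists(idFile)) {
            String existing =
                new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
            if (!existing.isEmpty()) {
                return existing; // same id as before the restart
            }
        }
        String id = UUID.randomUUID().toString();
        Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
        return id;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; pass another path as the first argument.
        Path idFile = Paths.get(args.length > 0 ? args[0] : "gossip-node.id");
        System.out.println("node id: " + loadOrCreate(idFile));
    }
}
```

The returned string could then be used as the "id" value for the local node, so that a restarted node comes back under the same identity.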

challengeteamttdh commented 8 years ago

My application is using the Maven dependency io.teknek:gossip:0.0.3.

Maybe it doesn't generate an id when this constructor is used:

    public GossipService(StartupSettings startupSettings) throws InterruptedException, UnknownHostException {
      this(InetAddress.getLocalHost().getHostAddress(), startupSettings.getPort(), "",
          startupSettings.getGossipMembers(), startupSettings.getGossipSettings(), null);
    }

Is that right? Version 0.0.3 is different from the latest code on GitHub. Do you have an updated version on Maven?

edwardcapriolo commented 8 years ago

Yes. This looks like a bug in that version. The id was not required in the original versions, but now it is. Can you please try the trunk version? I will release the current trunk later today.

challengeteamttdh commented 8 years ago

I reviewed the code. I think that in the StartupSettings class, this line:

    RemoteGossipMember member = new RemoteGossipMember(memberJSON.getString("cluster"), memberJSON.getString("host"), memberJSON.getInt("port"), "");

also needs to generate an ID for the RemoteGossipMember. I have switched to the latest code, but the error is still there. Please help me resolve this issue. I hope that you can make a release today. Thanks for your support on this issue.

edwardcapriolo commented 8 years ago

    RemoteGossipMember member = new RemoteGossipMember(memberJSON.getString("cluster"), memberJSON.getString("host"), memberJSON.getInt("port"), "");

This code is OK. We would not know the remote id until we connect to that host.

Can you give a stripped-down example of your Spring Boot application?

challengeteamttdh commented 8 years ago

When I use the latest code, an exception occurs. I don't know why.

    Exception in thread "pool-6-thread-1" java.lang.NullPointerException
        at com.google.code.gossip.manager.PassiveGossipThread.run(PassiveGossipThread.java:102)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)

This is my configuration for gossip:

    [{
      "cluster":"1",
      "id":"1",
      "port":8081,
      "gossip_interval":1000,
      "cleanup_interval":10000,
      "members":[
        {"cluster": "1","id": "4", "host":"192.168.1.90", "port":8084, "heartbeat":0},
        {"cluster": "1","id": "3", "host":"192.168.1.90", "port":8083, "heartbeat":0},
        {"cluster": "1","id": "2", "host":"192.168.1.90", "port":8082, "heartbeat":0}
      ]
    }]

challengeteamttdh commented 8 years ago

This is my sample code. Please help me review it: https://github.com/challengeteamttdh/springbootgossip Thanks. My application has a scheduled task that runs every 20s. It prints a number based on the number of alive nodes and the position of this node among them. When a node goes DOWN or comes UP, the other nodes need to know and update the number-printing rule for the system. Thank you very much for your support.

edwardcapriolo commented 8 years ago

    if (memberJSONObject.length() == 5
        && cluster.equals(memberJSONObject.get(GossipMember.JSON_CLUSTER))) {

This is a new piece of code. I will look at this.

edwardcapriolo commented 8 years ago

I found the bug you mentioned. The startup settings code was not setting the cluster name. I am looking at the unit test there because it is suspect. Sorry for the problems. Really cool app; I want to take a deeper look at it. Please try the latest trunk again. Sorry for the issues; the cluster name is a new bit and I do not use the StartupSettings code path!

challengeteamttdh commented 8 years ago

I updated the code and the exception no longer occurs. But when I start 2 instances of Spring Boot with ports 8081 and 8082 and the corresponding gossip.conf files, the nodes still don't know that the other member is UP.

First, I set port 8081 in application.properties, use the gossip.conf for 8081, and start Spring Boot. Second, I set port 8082 in application.properties, use the gossip.conf for 8081, and start Spring Boot. However, the nodes do not know whether each other is UP or DOWN. Please take a look at this. I really love your gossip code and want to integrate it into my application. Please spend some time helping me resolve this issue.

edwardcapriolo commented 8 years ago

Great. Keep in mind that getMemberList does not include yourself, so in a two-node cluster each node sees itself plus a getMemberList() of size 1.

challengeteamttdh commented 8 years ago

What do you mean? Am I implementing it incorrectly? What do I need to do to fix this?

edwardcapriolo commented 8 years ago

The only thing I am saying is that the member list does not include the local member. The local member is assumed.
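A small sketch of what that means for an application that counts live nodes and works out its own position among them. The list of remote ids below stands in for whatever getMemberList() returns; the class `ClusterView` and its methods are hypothetical helpers, not library API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ClusterView {

    /** Total live nodes: the remote members reported by gossip plus the local node. */
    public static int clusterSize(List<String> remoteMemberIds) {
        return remoteMemberIds.size() + 1;
    }

    /** Zero-based position of the local node among all live node ids, sorted as strings. */
    public static int localPosition(String localId, List<String> remoteMemberIds) {
        List<String> all = new ArrayList<String>(remoteMemberIds);
        all.add(localId); // the member list never contains the local node
        Collections.sort(all);
        return all.indexOf(localId);
    }

    public static void main(String[] args) {
        // Two-node cluster seen from node "2": the member list holds only node "1".
        List<String> remote = Arrays.asList("1");
        System.out.println(clusterSize(remote) + " nodes alive, local position "
            + localPosition("2", remote));
    }
}
```

With one remote member the sketch reports 2 live nodes, and the local node "2" sorts after "1", so its position is 1.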

challengeteamttdh commented 8 years ago

Do you have any ideas for my application? I don't know how to apply gossip to it. How does one instance of Spring Boot learn about another instance of Spring Boot?

edwardcapriolo commented 8 years ago

How do you start two copies of the application?

    mvn spring-boot:run -Drun.jvmArguments='-Dserver.port=8081'
    mvn spring-boot:run -Drun.jvmArguments='-Dserver.port=8082'

When I do this they take the same config.

challengeteamttdh commented 8 years ago

You need to change the port in gossip.conf to match the port of the Spring Boot instance. This is the gossip.conf for port 8081:

    [{
      "cluster":"1",
      "id":"1",
      "port":8081,
      "gossip_interval":1000,
      "cleanup_interval":10000,
      "members":[
        {"cluster": "1","id": "4", "host":"192.168.1.90", "port":8084, "heartbeat":0},
        {"cluster": "1","id": "3", "host":"192.168.1.90", "port":8083, "heartbeat":0},
        {"cluster": "1","id": "2", "host":"192.168.1.90", "port":8082, "heartbeat":0}
      ]
    }]

Then run:

    mvn spring-boot:run -Drun.jvmArguments='-Dserver.port=8081'

This is the gossip.conf for port 8082:

    [{
      "cluster":"2",
      "id":"2",
      "port":8082,
      "gossip_interval":1000,
      "cleanup_interval":10000,
      "members":[
        {"cluster": "2","id": "4", "host":"192.168.1.90", "port":8084, "heartbeat":0},
        {"cluster": "2","id": "3", "host":"192.168.1.90", "port":8083, "heartbeat":0},
        {"cluster": "2","id": "1", "host":"192.168.1.90", "port":8081, "heartbeat":0}
      ]
    }]

Then run:

    mvn spring-boot:run -Drun.jvmArguments='-Dserver.port=8082'

Is it necessary to run with the same gossip.conf?
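One thing that may be worth checking, going by the cluster comparison quoted earlier in this thread (`cluster.equals(...)`): the two files above use different "cluster" values ("1" and "2"). Whether that is the cause here is an assumption, not something confirmed in this thread. The sketch below only compares cluster names across configs; `NodeConfig` is a hypothetical, simplified stand-in for a parsed gossip.conf, not a library class.

```java
import java.util.Arrays;
import java.util.List;

public class ClusterNameCheck {

    /** Hypothetical, simplified view of one gossip.conf file: just id and cluster. */
    public static final class NodeConfig {
        final String id;
        final String cluster;

        public NodeConfig(String id, String cluster) {
            this.id = id;
            this.cluster = cluster;
        }
    }

    /** True when every config names the same cluster (an empty list counts as consistent). */
    public static boolean sameCluster(List<NodeConfig> configs) {
        for (NodeConfig c : configs) {
            if (!c.cluster.equals(configs.get(0).cluster)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<NodeConfig> posted = Arrays.asList(
            new NodeConfig("1", "1"),   // gossip.conf for port 8081
            new NodeConfig("2", "2"));  // gossip.conf for port 8082
        System.out.println("same cluster: " + sameCluster(posted));
    }
}
```

For the two files as posted, the check reports that the cluster names differ, so trying the same "cluster" value in both files, while keeping distinct "id" and "port" values, would be a reasonable experiment.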