Netflix / dynomite

A generic dynamo implementation for different k-v storage engines
Apache License 2.0

Dynomite cluster does not respond when peer node is stopped #631

Closed ghost closed 5 years ago

ghost commented 5 years ago

Hi, I built a Dynomite cluster in datacenter1, rack1, with 3 nodes. When one of the nodes is stopped, SET/GET operations fail with a "peers not connected" error. How can I make the cluster keep working even though one of the nodes is stopped?

ghost commented 5 years ago

Please give the solution as early as possible

smukil commented 5 years ago

@chinmayvenkat Sorry for the slow response. Can you paste the actual error message? Also, can you paste your conf files?

leonardodavinte commented 5 years ago

Hello brother,

I'm having the same problem. If my node01 is out of service, I get this error:

NODE 01 Down
127.0.0.1:8102> get KEYTEST
(error) ERR Peer: Peer Node is not connected

But if my node02 and node03 are out of service, this does not impact the service.

NODE 02 or NODE 03 Down - OK
127.0.0.1:8102> get KEYTEST
"TESTE01"

I'm still learning about Dynomite, but apparently Node01 is a kind of coordinator of the Rack cluster nodes.

It looks like a single point of failure to me: any node can be out of service except the coordinator node, in this case node01.

smukil commented 5 years ago

@leonardodavinte This is because you are specifically talking to Node 1 using the redis-cli, i.e. 127.0.0.1:8102 is your Node 1's address:port. If you connect to Node 2 or Node 3's address:port, you will still be able to use the cluster.

If you want a client that can automatically detect node failures and fall back to a different node, you will need to use our Dyno Java client: https://github.com/Netflix/dyno
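To illustrate the idea of falling back to another entry point, here is a minimal Python sketch. It is not the Dyno client and does not use any Dynomite API; it only probes a list of addresses over TCP and returns the first one that accepts a connection. The function names `tcp_probe` and `first_reachable` are made up for this example, and the two 192.168.0.x addresses are hypothetical. Note that picking another entry point only helps if that node can still reach a replica of the requested key.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def tcp_probe(addr: Address, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to addr can be opened."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def first_reachable(
    nodes: Iterable[Address],
    probe: Callable[[Address], bool] = tcp_probe,
) -> Optional[Address]:
    """Return the first node that answers the probe, or None if none do."""
    for addr in nodes:
        if probe(addr):
            return addr
    return None


# 127.0.0.1:8102 is node 1 from this thread; the other two are hypothetical.
NODES = [("127.0.0.1", 8102), ("192.168.0.2", 8102), ("192.168.0.3", 8102)]
```

Calling `first_reachable(NODES)` would return the address to hand to redis-cli or another Redis client. The probe is injectable so the selection logic can be exercised without a running cluster.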

ghost commented 5 years ago

Hi, which property do we need to set for that? Does the Dyno client support a singleton mapping? I am using a pool. Please give some examples. Thanks in advance.

smukil commented 5 years ago

@chinmayvenkat A property for what specifically? There is a demo implementation for Dyno here. Hopefully that helps: https://github.com/Netflix/dyno/blob/master/dyno-demo/src/main/java/com/netflix/dyno/demo/redis/DynoJedisDemo.java

ghost commented 5 years ago

I implemented the Dyno client as a singleton. If one of the nodes in the cluster goes down, it gives errors. If we re-establish the connection after the node goes down, it works fine. How can we handle this?

leonardodavinte commented 5 years ago

Hello @smukil

Not exactly. I turned off node 01 and tried to connect to node 02 or node 03.

But with node 01 turned off, both node02 and node03 return errors:

127.0.0.1:8102> get KEYTEST
(error) ERR Peer: Peer Node is not connected

In this case, I'm connected via localhost on node02 and getting the error "Peer Node is not connected".

Do you have any examples of how to configure 1 datacenter, 1 rack, 3 nodes?

Example:
DC1 RACK1
node01 - 192.168.0.1
node02 - 192.168.0.2
node03 - 192.168.0.3

How would you do it?

Thank you so much.

smukil commented 5 years ago

@chinmayvenkat How many replicas do you have in your configuration?

smukil commented 5 years ago

@leonardodavinte Ah my bad, I misunderstood your previous post. So based on your configuration, you have only 1 replica for each key. And you have 3 shards.

So, this means that the key KEYTEST probably lives in the Node01 shard, and since that node is down, the cluster cannot access that key. If you add 2 more racks, then you should be able to fall back to another replica even if Node01 is down in one rack.
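The shard and replica reasoning above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Dynomite's implementation: the token values are hypothetical evenly spaced points in a 32-bit range, `zlib.crc32` stands in for whatever hash the cluster is configured with, each node is assumed to own the hashes from its token up to the next node's token, the rack2 and rack3 node names are invented, and a key counts as available if at least one replica is up (a quorum consistency level would need more).

```python
import bisect
import zlib
from typing import Dict, List, Set

# Hypothetical evenly spaced tokens, one per node position in a rack.
TOKENS = [0, 1431655765, 2863311530]


def owner_index(key: str, tokens: List[int] = TOKENS) -> int:
    """Return the position of the shard that owns key (assumed ownership rule)."""
    h = zlib.crc32(key.encode("utf-8"))  # stand-in hash, 0 .. 2**32 - 1
    # Largest token <= h; the modulo wraps around if h is below the first token.
    return (bisect.bisect_right(tokens, h) - 1) % len(tokens)


def replicas(key: str, racks: Dict[str, List[str]]) -> List[str]:
    """One replica per rack: the node at the owning position in each rack."""
    i = owner_index(key)
    return [nodes[i] for nodes in racks.values()]


def is_available(key: str, racks: Dict[str, List[str]], down: Set[str]) -> bool:
    """True if at least one replica of key is on a node that is not down."""
    return any(node not in down for node in replicas(key, racks))


ONE_RACK = {"rack1": ["node01", "node02", "node03"]}
THREE_RACKS = {
    "rack1": ["node01", "node02", "node03"],
    "rack2": ["node04", "node05", "node06"],  # hypothetical extra racks
    "rack3": ["node07", "node08", "node09"],
}
```

With `ONE_RACK`, stopping the single node returned by `replicas("KEYTEST", ONE_RACK)` makes the key unavailable, while stopping either of the other two nodes does not. With `THREE_RACKS`, the same node can go down and two replicas remain.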

leonardodavinte commented 5 years ago

@smukil Thanks for the explanation.

ghost commented 5 years ago

@smukil I have 3 replicas: 3 racks with two servers in each rack. The read and write consistency levels are both dc_quorum.

ghost commented 5 years ago

@smukil Can you please tell me how to see how many read requests are coming to Dynomite per second? I used curl http://localhost:22222/info, but I cannot tell where to look in the output.

smukil commented 5 years ago

@chinmayvenkat If you're using the Dyno client, then it should automatically fall back to a different replica. Have you made sure that your configuration is correct? You can check your topology with: http://localhost:22222/cluster_describe

Also make sure that your HostSupplier has the right hosts: https://github.com/Netflix/dyno/wiki/Configuration#hostsupplier

http://localhost:22222/info does have the right information:

...
"client_connections":1,
"client_read_requests":0,
"client_write_requests":4,
...
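One way to turn those counters into a per-second figure is to sample the endpoint twice and divide the difference by the elapsed time. The Python sketch below assumes the counters are cumulative (if the server resets them on each stats interval, the raw value over that interval is the number to use instead). Because the full JSON layout is not shown in this thread, the helper searches for the counter name at any nesting depth. The function names are made up for this example.

```python
import json
import urllib.request
from typing import Any, Optional


def fetch_info(url: str = "http://localhost:22222/info") -> Any:
    """Download and decode the stats document from the stats port."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp)


def find_counter(doc: Any, name: str) -> Optional[float]:
    """Return the first numeric value stored under name, at any depth."""
    if isinstance(doc, dict):
        value = doc.get(name)
        if isinstance(value, (int, float)):
            return value
        children = list(doc.values())
    elif isinstance(doc, list):
        children = doc
    else:
        return None
    for child in children:
        found = find_counter(child, name)
        if found is not None:
            return found
    return None


def per_second(before: Any, after: Any, name: str, seconds: float) -> float:
    """Rate of change of a cumulative counter between two samples."""
    start, end = find_counter(before, name), find_counter(after, name)
    if start is None or end is None:
        raise KeyError(name)
    return (end - start) / seconds
```

For example, take `before = fetch_info()`, wait ten seconds, take `after = fetch_info()`, and call `per_second(before, after, "client_read_requests", 10.0)`.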

ghost commented 5 years ago

@smukil Yes, the Dyno client handles this in general, but it does not work when I use the Dyno client as a singleton. Please take a look at https://github.com/Netflix/dyno/issues/258. Please also respond to https://github.com/Netflix/dynomite/issues/644 as early as possible; it is a pain point for me.

smukil commented 5 years ago

Closing this, addressed the comment in: Netflix/dyno#258 and #644