Netflix / dynomite

A generic dynamo implementation for different k-v storage engines
Apache License 2.0
4.19k stars 531 forks

Replication issue if a rack fails #320

Closed dzhou121 closed 8 years ago

dzhou121 commented 8 years ago

Hi,

I have 2 DCs and 2 Racks in each DC, and one node per rack. i.e.

cluster: datacenter: dc1 rack: rack1

cluster: datacenter: dc1 rack: rack2

cluster: datacenter: dc2 rack: rack1

cluster: datacenter: dc2 rack: rack2

If one node fails (which is equivalent to a rack failing), the replication ring seems to be broken, and sometimes data is not replicated to the remote datacenter.

How can I remove the failed rack from the replication ring without changing the config file and restarting the whole cluster?

Thanks

ipapapa commented 8 years ago

Since you have a single node per rack, each node handles the same token range. If a node in one rack fails, then that node will not be functional. It will still remain in the .yml file, though. The YAML is only read during Dynomite startup. Dynomite makes a REST call, cluster_describe, to an external service (see Florida, contributed by OSS contributors, or, for a complete package, Dynomite-manager), which can dynamically change the topology of Dynomite.

Nonetheless, the fact that one node is down in one rack does not mean that cross-datacenter replication will fail. The other node in the other rack/dc, which contains the same token range and therefore the same data, will properly replicate data across datacenters.

dzhou121 commented 8 years ago

@ipapapa

Thanks for the detailed explanation.

I have just tested it again and I had the following issue:

- rack2 in dc1 fails
- changes made on rack1 in dc1 are fine
- changes made on rack1 in dc2 are fine
- changes made on rack2 in dc2 are only replicated to rack1 in dc2; they are not replicated to rack1 in dc1

ipapapa commented 8 years ago

How do you dynamically change the topology? Are you using a cluster_describe REST call? As I mentioned above, the YAML is only read during Dynomite's startup; after that, the topology is maintained by Florida. Can you send your YAML file?

dzhou121 commented 8 years ago

So do you mean that I need to manually remove the failed node from the cluster? I thought it could replicate properly for the remaining racks even if one rack is down.

I don't change the topology at the moment. The YAML is:

dyn_o_mite:
  datacenter: dc1
  rack: rack1
  dyn_listen: 0.0.0.0:8101
  data_store: 0
  listen: 0.0.0.0:8102
  dyn_seed_provider: simple_provider
  auto_eject_hosts: true
  server_retry_timeout: 30000
  server_failure_limit: 3
  dyn_seeds:
  - 185.40.140.166:8101:rack2:dc1:0
  - 172.31.11.240:8101:rack1:dc2:0
  - 172.31.24.52:8101:rack2:dc2:0
  servers:
  - 127.0.0.1:22122:1
  tokens: 0
  pem_key_file: /etc/dynomitedb/dynomite.pem
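Each dyn_seeds entry in the config above follows the pattern host:port:rack:dc:token. As a minimal illustration of that format (a hypothetical helper, not part of Dynomite itself), the entries can be parsed like so:

```python
# Hypothetical helper: parse a dyn_seeds entry of the form
# host:port:rack:dc:token, as used in the dyn_o_mite config above.
def parse_seed(entry):
    host, port, rack, dc, token = entry.split(":")
    return {
        "host": host,
        "port": int(port),
        "rack": rack,
        "dc": dc,
        "token": int(token),
    }

# The seed list from the first node's config.
seeds = [
    "185.40.140.166:8101:rack2:dc1:0",
    "172.31.11.240:8101:rack1:dc2:0",
    "172.31.24.52:8101:rack2:dc2:0",
]
parsed = [parse_seed(s) for s in seeds]
```

Note that every peer in this topology advertises token 0.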
ipapapa commented 8 years ago

Correct. If one or more nodes are down and the rack has more nodes, Dynomite will operate as normal. Each node is totally independent; that is the high availability. However, if a node fails and you operate in a cloud deployment, you can use auto-scaling: the new node will be added to the ring, and the rest of the nodes will have to learn the new node's hostname and token. These are obtained by Dynomite through a REST call to a sidecar for get_seeds().

ipapapa commented 8 years ago

Actually, to debug your issue, I need to see the YAML from all four nodes. Would you mind sending them to us?

dzhou121 commented 8 years ago
dyn_o_mite:
  datacenter: dc1
  rack: rack1
  dyn_listen: 0.0.0.0:8101
  data_store: 0
  listen: 0.0.0.0:8102
  dyn_seed_provider: simple_provider
  auto_eject_hosts: true
  server_retry_timeout: 30000
  server_failure_limit: 3
  dyn_seeds:
  - 185.40.140.166:8101:rack2:dc1:0
  - 172.31.11.240:8101:rack1:dc2:0
  - 172.31.24.52:8101:rack2:dc2:0
  servers:
  - 127.0.0.1:22122:1
  tokens: 0
  pem_key_file: /etc/dynomitedb/dynomite.pem

dyn_o_mite:
  datacenter: dc1
  rack: rack2
  dyn_listen: 0.0.0.0:8101
  data_store: 0
  listen: 0.0.0.0:8102
  dyn_seed_provider: simple_provider
  auto_eject_hosts: true
  server_retry_timeout: 30000
  server_failure_limit: 3
  dyn_seeds:
  - 185.40.140.165:8101:rack1:dc1:0
  - 172.31.11.240:8101:rack1:dc2:0
  - 172.31.24.52:8101:rack2:dc2:0
  servers:
  - 127.0.0.1:22122:1
  tokens: 0
  pem_key_file: /etc/dynomitedb/dynomite.pem

dyn_o_mite:
  datacenter: dc2
  rack: rack1
  dyn_listen: 0.0.0.0:8101
  data_store: 0
  listen: 0.0.0.0:8102
  dyn_seed_provider: simple_provider
  auto_eject_hosts: true
  server_retry_timeout: 30000
  server_failure_limit: 3
  dyn_seeds:
  - 185.40.140.165:8101:rack1:dc1:0
  - 185.40.140.166:8101:rack2:dc1:0
  - 172.31.24.52:8101:rack2:dc2:0
  servers:
  - 127.0.0.1:22122:1
  tokens: 0
  pem_key_file: /etc/dynomitedb/dynomite.pem

dyn_o_mite:
  datacenter: dc2
  rack: rack2
  dyn_listen: 0.0.0.0:8101
  data_store: 0
  listen: 0.0.0.0:8102
  dyn_seed_provider: simple_provider
  auto_eject_hosts: true
  server_retry_timeout: 30000
  server_failure_limit: 3
  dyn_seeds:
  - 185.40.140.165:8101:rack1:dc1:0
  - 185.40.140.166:8101:rack2:dc1:0
  - 172.31.11.240:8101:rack1:dc2:0
  servers:
  - 127.0.0.1:22122:1
  tokens: 0
  pem_key_file: /etc/dynomitedb/dynomite.pem
ipapapa commented 8 years ago

I need to check in more detail, but is there a reason why you set tokens = 0 and not an actual number? Even in our basic provided dynomite.yml we use a value != 0.

dzhou121 commented 8 years ago

It was a configuration error and I've changed all tokens to 101134286 now, but this issue still exists.

richieyan commented 8 years ago

I don't think a token of zero is a problem. We can calculate the node tokens for 3 nodes within a rack as follows:

firstNodeToken = (4294967295 / 3) * 0 = 0
secondNodeToken = (4294967295 / 3) * 1 = 1431655765
thirdNodeToken = (4294967295 / 3) * 2 = 2863311530

This is from http://www.dynomitedb.com/docs/dynomite/v0.5.8/topology/.
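The spacing formula above can be checked with a short Python sketch (evenly dividing the 32-bit token space, per the topology doc linked above):

```python
# Token assignment for n_nodes in a rack: evenly spaced over the
# 32-bit token space [0, 2**32 - 1], as in the Dynomite topology docs.
MAX_TOKEN = 4294967295  # 2**32 - 1

def node_tokens(n_nodes):
    step = MAX_TOKEN // n_nodes
    return [step * i for i in range(n_nodes)]

print(node_tokens(3))
# [0, 1431655765, 2863311530]
```

With a single node per rack, as in this cluster, the only token is 0, which is why tokens: 0 is not itself the bug.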

The problem is that using simple_provider won't update your Dynomite topology when a node fails. You need to change to florida_provider and deploy dynomite-manager, or your own cluster-manager service, to serve the get_seeds() REST API that florida_provider consumes. Please note that you may need to add a compile option to include florida_provider in Dynomite.
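A minimal sketch of such a sidecar, assuming the endpoint path /REST/v1/admin/get_seeds and a pipe-delimited list of host:port:rack:dc:token entries; the exact response contract should be checked against dynomite-manager, and the seed list here is hypothetical:

```python
# Sketch of a Florida-style sidecar. Assumptions (verify against
# dynomite-manager): the /REST/v1/admin/get_seeds path and the
# '|'-delimited seed response format.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical live-peer list; a real sidecar would discover peers
# dynamically and drop failed nodes from this list.
LIVE_SEEDS = [
    "185.40.140.165:8101:rack1:dc1:101134286",
    "172.31.11.240:8101:rack1:dc2:101134286",
    "172.31.24.52:8101:rack2:dc2:101134286",
]

def render_seeds(seeds):
    # Join seed entries with '|' (assumed delimiter).
    return "|".join(seeds)

class SeedsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/REST/v1/admin/get_seeds":
            body = render_seeds(LIVE_SEEDS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), SeedsHandler).serve_forever()
```

Each Dynomite node would then be configured with florida_provider so it polls this endpoint instead of relying on the static dyn_seeds list.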

ipapapa commented 8 years ago

@dzhou121 has your question been answered?

dzhou121 commented 8 years ago

I'll try @richieyan 's suggestion out. Thank you both.