Netflix / dynomite

A generic dynamo implementation for different k-v storage engines
Apache License 2.0

Need help -- sharding not working #630

Closed 52tt closed 5 years ago

52tt commented 5 years ago

[ The title should really be "what is the hash algorithm for the key", but I didn't find a way to update it. Please go directly to the "Update" section below. ]

Hi,

I have a cluster with 1 dc, 2 racks, 4 nodes, running on 2 physical boxes:

box0 IP address: 172.16.105.213
box1 IP address: 172.16.105.212

rack-2000: one node on box0 (listen 172.16.105.213:2000) and one on box1 (listen 172.16.105.212:2000)
rack-2001: one node on box0 (listen 172.16.105.213:2001) and one on box1 (listen 172.16.105.212:2001)

Here are the yml configuration files (each dyn_seeds entry has the form host:dyn_port:rack:dc:token):

$ cat 172.16.105.213-2000.yml
dyn_o_mite:
    datacenter: localdc
    rack: rack-2000
    listen: 172.16.105.213:2000
    dyn_listen: 172.16.105.213:2100
    stats_listen: 172.16.105.213:2200
    tokens: 1
    timeout: 30000
    data_store: 0
    dyn_seed_provider: simple_provider
    dyn_seeds:
    - 172.16.105.212:2100:rack-2000:localdc:2
    - 172.16.105.212:2101:rack-2001:localdc:1
    - 172.16.105.213:2101:rack-2001:localdc:2
    servers:
    - 172.16.105.213:2300:1
$ 
$ cat 172.16.105.213-2001.yml
dyn_o_mite:
    datacenter: localdc
    rack: rack-2001
    listen: 172.16.105.213:2001
    dyn_listen: 172.16.105.213:2101
    stats_listen: 172.16.105.213:2201
    tokens: 2
    timeout: 30000
    data_store: 0
    dyn_seed_provider: simple_provider
    dyn_seeds:
    - 172.16.105.213:2100:rack-2000:localdc:1
    - 172.16.105.212:2100:rack-2000:localdc:2
    - 172.16.105.212:2101:rack-2001:localdc:1
    servers:
    - 172.16.105.213:2301:1
$
$ cat 172.16.105.212-2000.yml
dyn_o_mite:
    datacenter: localdc
    rack: rack-2000
    listen: 172.16.105.212:2000
    dyn_listen: 172.16.105.212:2100
    stats_listen: 172.16.105.212:2200
    tokens: 2
    timeout: 30000
    data_store: 0
    dyn_seed_provider: simple_provider
    dyn_seeds:
    - 172.16.105.213:2100:rack-2000:localdc:1
    - 172.16.105.212:2101:rack-2001:localdc:1
    - 172.16.105.213:2101:rack-2001:localdc:2
    servers:
    - 172.16.105.212:2300:1
$ 
$ cat 172.16.105.212-2001.yml
dyn_o_mite:
    datacenter: localdc
    rack: rack-2001
    listen: 172.16.105.212:2001
    dyn_listen: 172.16.105.212:2101
    stats_listen: 172.16.105.212:2201
    tokens: 1
    timeout: 30000
    data_store: 0
    dyn_seed_provider: simple_provider
    dyn_seeds:
    - 172.16.105.213:2100:rack-2000:localdc:1
    - 172.16.105.212:2100:rack-2000:localdc:2
    - 172.16.105.213:2101:rack-2001:localdc:2
    servers:
    - 172.16.105.212:2301:1

The problem with my setup is that sharding does not appear to be working. Here is what I did:

Step 1) Through 172.16.105.213:2000, set key-0001 .. key-0019, a-0001 .. a-0019, b-0001 .. b-0019, c-0001 .. c-0019, and d-0001 .. d-0019 (95 keys in total), e.g.:

for i in $(seq 1 19); do
    redis-cli -h 172.16.105.213 -p 2000 set "$(printf 'key-%04d' "$i")" hello
done
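
For completeness, here is an equivalent sketch that loads all five prefixes through the Dynomite listener. It uses the redis-py client, which is only an assumption on my side; any Redis client pointed at the listen port does the same thing:

import redis

# Write 19 keys for each of the five prefixes (95 keys in total) through
# the Dynomite listen port on box0, exactly as in the shell loop above.
client = redis.Redis(host="172.16.105.213", port=2000)
for prefix in ("key", "a", "b", "c", "d"):
    for i in range(1, 20):
        client.set(f"{prefix}-{i:04d}", "hello")   # e.g. key-0001 ... d-0019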

Step 2) Getting the keys back from each of the four Dynomite listeners (172.16.105.213:2000, 172.16.105.213:2001, 172.16.105.212:2000, 172.16.105.212:2001) shows that all keys were set correctly.

Step 3) However, when I scan the keys on each Redis instance directly, I find that in rack-2000 all 95 keys are stored in 172.16.105.213:2300 (not sharded across the two Redis instances), and in rack-2001 all 95 keys are stored in 172.16.105.212:2301:

$ redis-cli -h 172.16.105.213 -p 2300 --scan | wc -l
95
$ redis-cli -h 172.16.105.213 -p 2301 --scan | wc -l
0
$ redis-cli -h 172.16.105.212 -p 2300 --scan | wc -l
0
$ redis-cli -h 172.16.105.212 -p 2301 --scan | wc -l
95

So it looks like sharding is not working. Or maybe all the keys I used happen to fall into the same slot? If so, what keys should I use to verify? I also tried other token values for rack-2000, but saw the same behavior.

BTW, please help me confirm whether my understanding of tokens is correct. I read generate_yamls.py to try to understand the token assignment. My understanding is that a token is used to divide the hash index range, but does the actual value matter? For example, in a rack with 2 nodes: case a) assign token 10 to node0 and 20 to node1; case b) assign 100 to node0 and 200 to node1; case c) assign 1000 to node0 and 2000 to node1. My guess is that there is no difference among a), b) and c), and that in each case every node should hold half of the entire hash index space. Is that correct? If not, what is the maximal value for a token?

Thanks, Yun


Update on 1/25/2019:

1) After more research, I decided to try the numbers generated by the generate_yamls.py, i.e.:
rack-2000:
* node-2000@box0 assigned token 2147483647;
* node-2000@box1 assigned token 4294967295;
rack-2001:
* node-2001@box1 assigned token 4294967295;
* node-2001@box0 assigned token 2147483647;

Then I found that sharding works, so I realized my understanding was incorrect.
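
For reference, the even spacing that generate_yamls.py appears to use can be reproduced with a few lines (my own sketch, not the script itself):

# Evenly spaced tokens on the uint32 ring for n nodes in a rack.
# For n = 2 this gives 2147483647 and 4294967295, matching the values above.
RING = 2**32

def evenly_spaced_tokens(n):
    step = RING // n
    return [(i + 1) * step - 1 for i in range(n)]

print(evenly_spaced_tokens(2))  # [2147483647, 4294967295]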

2) I also tried the following combinations:
- in the same rack: 1st node assigned 0x10000000, 2nd node assigned 0x90000000;
- in the same rack: 1st node assigned 0x20000000, 2nd node assigned 0xA0000000;
- in the same rack: 1st node assigned 0x70000000, 2nd node assigned 0xF0000000;
- in the same rack: 1st node assigned 0x80000000, 2nd node assigned 0;
and in all of the cases above, sharding works well.

----

So, my understanding is:
a) A token is a position on the ring formed by the entire uint32 value space;
b) In a two-node system, if node-a is assigned token x and node-b is assigned token y,
    then node-a holds slots [x, x+1, x+2, ..., y], and
         node-b holds slots [y, y+1, ..., 0xFFFFFFFF, 0, ..., x].
Could you please confirm that?
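
To make the question concrete, here is a small sketch of the ownership rule that would explain what I observed (this is only my reading, not the Dynomite source): a key goes to the node with the smallest token >= hash(key), wrapping around to the lowest token when the hash is larger than every token.

# Sketch of my reading of the ring rule -- please correct me if the code differs.
def owner(tokens, key_hash):
    candidates = [t for t in tokens if t >= key_hash]
    return min(candidates) if candidates else min(tokens)

# With tokens 1 and 2, almost every 32-bit hash exceeds both tokens and
# wraps to the token-1 node -- which matches the 95/0 split I saw above.
print(owner([1, 2], 0x7A3B11C9))                    # -> 1
# With the generate_yamls.py tokens the ring splits roughly in half.
print(owner([2147483647, 4294967295], 0x7A3B11C9))  # -> 2147483647
print(owner([2147483647, 4294967295], 0xC0DE1234))  # -> 4294967295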

----

Then, I have another question: What is the hash algorithm for the key?
Could you please let me know?
Thanks!
52tt commented 5 years ago

Already found answers to my questions. This ticket can be closed. Thanks.

Hints:

req_forward_local_dc  |
                      |--> dnode_peer_pool_server --> dnode_peer_for_key_on_rack --> dnode_peer_idx_for_key_on_rack
req_forward_remote_dc |
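
In rough pseudocode, my reading of that path is: hash the key to a 32-bit value, then pick the peer on the target rack whose token range covers it (the same ring rule sketched earlier). The hash function itself seems to be selectable via the hash option in the yml (murmur by default, if I read the config code correctly). Treat the sketch below as an illustration, not the actual C code:

from dataclasses import dataclass

@dataclass
class Peer:
    name: str
    rack: str
    token: int

# req_forward_local_dc / req_forward_remote_dc -> dnode_peer_pool_server
#   -> dnode_peer_for_key_on_rack -> dnode_peer_idx_for_key_on_rack
def peer_for_key_on_rack(peers, rack, key_hash):
    rack_peers = sorted((p for p in peers if p.rack == rack), key=lambda p: p.token)
    for p in rack_peers:            # smallest token >= hash ...
        if p.token >= key_hash:
            return p
    return rack_peers[0]            # ... else wrap around to the lowest token

peers = [Peer("box0:2300", "rack-2000", 2147483647),
         Peer("box1:2300", "rack-2000", 4294967295)]
print(peer_for_key_on_rack(peers, "rack-2000", 0xC0DE1234).name)  # box1:2300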
dicebattle commented 5 years ago

I think Dynomite needs more documentation about its hash algorithm.

Some Dynomite peer finders, such as dagota (https://github.com/Smile-SA/dagota), implement the wrong method for creating Dynomite tokens.

Anyway, thanks for your research. I was confused by the same thing, and I found the answer in this issue thread :)

ipapapa commented 5 years ago

Good point above. I have added more information to https://github.com/Netflix/dynomite/wiki/Replication.