Netflix / dynomite

A generic dynamo implementation for different k-v storage engines
Apache License 2.0
4.2k stars 534 forks source link

Dynomite Assert while trying redis-benchmark with default settings #636

Closed ping2balaji closed 5 years ago

ping2balaji commented 5 years ago

Hi,

I bought Dynomite cluster with 2 racks each in 2 datacenters.

configuration file: dyn_o_mite: datacenter: dc-rdb1 rack: rack1 dyn_listen: 172.16.0.15:8101 listen: 172.16.0.15:8102 servers: 172.16.0.15:6379:1 dyn_seeds: 172.16.0.15:8111:rack2:dc-rdb1:0 172.16.0.8:8101:rack1:dc-rdb2:0 172.16.0.8:8111:rack2:dc-rdb2:0 tokens: '0' secure_server_option: datacenter pem_key_file: conf/dynomite.pem data_store: 0 stats_listen: 127.0.0.1:22222 read_consistency : DC_ONE # DC_QUORUM for majority replica node to agree write_consistency : DC_ONE # DC_QUORUM for majority replica node to agree

Then tried to benchmark using "./redis-benchmark -h 172.16.0.15 -p 8102 -d 100" cli and got following crash:

[2019-02-18 07:22:53.827] msg_send:1191 <CONN_REMOTE_PEER_SERVER 0x1bb4e80 19 to '172.16.0.8:8101:rack1:dc-rdb2:0'> Setting ENOTRECOVERABLE happens here![49/947] [2019-02-18 07:22:53.827] core_core:540 conn err on dnode EVENT_WRITE: 131 [2019-02-18 07:22:53.827] core_close:414 close <CONN_REMOTE_PEER_SERVER 0x1bb4e80 19 to '172.16.0.8:8101:rack1:dc-rdb2:0'> on event FF00 eof 0 done 0 rb 2744514 sb 14475753: State not recoverable [2019-02-18 07:22:53.839] dnode_peer_close:420 <CONN_REMOTE_PEER_SERVER 0x1bb4e80 19 to '172.16.0.8:8101:rack1:dc-rdb2:0'> Closing, Dropped 20044 outqueue & 1021 9 inqueue requests [2019-02-18 07:22:53.839] event_del_conn:211 epoll ctl on e 10 sd 19 failed: No such file or directory [2019-02-18 07:22:53.839] dnode_peer_unref:55 Marking <NODE 0x1b993d0 172.16.0.8 dc-rdb2 rack1 secured:1> as down [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874296 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874301 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874306 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874311 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874316 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874321 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874326 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874331 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874336 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874341 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874346 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874351 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874356 [2019-02-18 07:22:53.839] client_handle_response:240 looks like we already cleanedup the request for 874361 [2019-02-18 07:22:53.839] dn_assert:301 assert 'req->is_request' failed @ (dyn_server.c, 727) [2019-02-18 07:22:53.839] dn_assert:301 assert 'req->is_request' failed @ (dyn_server.c, 771) [2019-02-18 07:22:53.839] dn_assert:301 assert 'req->is_request' failed @ (dyn_server.c, 948) [2019-02-18 07:22:53.839] dn_assert:301 assert '(c_conn->type == CONN_CLIENT) || (c_conn->type == CONN_DNODE_PEER_CLIENT)' failed @ (dyn_server.c, 784) [2019-02-18 07:22:53.840] msg_local_one_rsp_handler:1052 ASSERTION FAILED: Received more than one response for dc_one. <REQ 0x42cdd900 874531:0::0 REQ_REDIS_SET, len:144> prev <RSP 0x42c5bf00 874534:0 RSP_REDIS_ERROR len:39> new rsp <RSP 0x42441760 874566:0 RSP_REDIS_STATUS len:5> [2019-02-18 07:22:53.840] dn_assert:301 assert '!req->selected_rsp' failed @ (dyn_client.c, 1052) [2019-02-18 07:22:53.847] dn_stacktrace:286 [0] /lib64/libpthread.so.0(+0xf6d0) [0x7f343d67b6d0] ??:0 [2019-02-18 07:22:53.865] dn_stacktrace:286 [1] ./src/dynomite() [0x42639a] /root/dynomite/src/dyn_server.c:953 (discriminator 1) [2019-02-18 07:22:53.874] dn_stacktrace:286 [2] ./src/dynomite() [0x426c59] /root/dynomite/src/dyn_server.c:780 [2019-02-18 07:22:53.884] dn_stacktrace:286 [3] ./src/dynomite(msg_recv+0x456) [0x422d36] /root/dynomite/src/dyn_message.c:813 [2019-02-18 07:22:53.894] dn_stacktrace:286 [4] ./src/dynomite(core_core+0xcf) [0x41648f] /root/dynomite/src/dyn_core.c:385 [2019-02-18 07:22:53.903] dn_stacktrace:286 [5] ./src/dynomite(event_wait+0x164) [0x43fe94] /root/dynomite/src/event/dyn_epoll.c:254 [2019-02-18 07:22:53.912] dn_stacktrace:286 [6] ./src/dynomite(core_loop+0x16a) [0x41740a] /root/dynomite/src/dyn_core.c:647 (discriminator 3) [2019-02-18 07:22:53.923] dn_stacktrace:286 [7] ./src/dynomite(main+0x6a0) [0x40cea0]

Can someone pls check?

Note: i have disabled the secure_server_option later but still got assert as below:

[2019-02-18

07:37:29.689] msg_local_one_rsp_handler:1052 ASSERTION FAILED: Received more than one response for dc_one. <REQ 0x33777450 1177795:0::$ REQ_REDIS_INCR, len:41> prev <RSP 0x33c7d6e0 1177798:0 RSP_REDIS_ERROR len:39> new rsp <RSP 0x33c647e0 1177895:0 RSP_REDIS_INTEGER len:4> [2019-02-18 07:37:29.689] dn_assert:301 assert '!req->selected_rsp' failed @ (dyn_client.c, 1052) [2019-02-18 07:37:29.689] dn_stacktrace:286 [0] /lib64/libpthread.so.0(+0xf6d0) [0x7f230aea36d0] ??:0 [2019-02-18 07:37:29.697] dn_stacktrace:286 [1] ./src/dynomite() [0x42639a] /root/dynomite/src/dyn_server.c:953 (discriminator 1) [2019-02-18 07:37:29.704] dn_stacktrace:286 [2] ./src/dynomite() [0x426c59] /root/dynomite/src/dyn_server.c:780 [2019-02-18 07:37:29.712] dn_stacktrace:286 [3] ./src/dynomite(msg_recv+0x456) [0x422d36] /root/dynomite/src/dyn_message.c:813 [2019-02-18 07:37:29.720] dn_stacktrace:286 [4] ./src/dynomite(core_core+0xcf) [0x41648f] /root/dynomite/src/dyn_core.c:385 [2019-02-18 07:37:29.727] dn_stacktrace:286 [5] ./src/dynomite(event_wait+0x164) [0x43fe94] /root/dynomite/src/event/dyn_epoll.c:254 [2019-02-18 07:37:29.735] dn_stacktrace:286 [6] ./src/dynomite(core_loop+0x16a) [0x41740a] /root/dynomite/src/dyn_core.c:647 (discriminator 3) [2019-02-18 07:37:29.742] dn_stacktrace:286 [7] ./src/dynomite(main+0x6a0) [0x40cea0] /root/dynomite/src/dynomite.c:565 [2019-02-18 07:37:29.751] dn_stacktrace:286 [8] /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f230a7e7445] ??:0 [2019-02-18 07:37:29.758] dn_stacktrace:286 [9] ./src/dynomite() [0x40d2c8] ??:? [2019-02-18 07:37:29.766] signal_handler:123 signal 11 (SIGSEGV) received, core dumping Segmentation fault (core dumped) .

ping2balaji commented 5 years ago

Can anyone pls help me with some pointers about this issue?

smukil commented 5 years ago

@ping2balaji Sorry for the slow reply. I hadn't noticed this issue. What version of Dynomite are you running? This seems like an old bug that has already been fixed.

smukil commented 5 years ago

Closing since it points to an already fixed issue. Feel free to reopen if you have further queries.