Netflix / dynomite

A generic dynamo implementation for different k-v storage engines
Apache License 2.0
4.19k stars 530 forks source link

redis lua script hangs infinitely #689

Open iTosun opened 5 years ago

iTosun commented 5 years ago

Hi,

First of all want to thank you all for this amazing project, really like dynomite! Right now i'm trying to do more advanced stuff with dynomite & redis and lua scripting. But i'm getting some weird results, a particular script is hanging infinitely till i exit it with ctrl+c. The config lua-time-limit is still default on 5000, so it should time-out after 5 seconds but it does not.

My cluster consists of 1 DC and 2 racks with each 1 node (simple dev setup). The config of a node looks like this:

dyn_o_mite:
  env: network
  listen: session:8102
  datacenter: dev
  rack: r1
  dyn_listen: session:8101
  dyn_port: 8101
  dyn_seed_provider: simple_provider
  dyn_seeds:
  - config_store:8101:r2:dev:2863311530
  servers:
  - 127.0.0.1:6379:1
  timeout: 150000
  tokens: 1431655765
  #secure_server_option: all
  stats_listen: 0.0.0.0:22222

The script that i try to run:

local ha = ARGV[2] --[[Some unused var to satisfy dynomite requirements--]]

local results = {{'Base:Config', redis.call("HGETALL", "Config:"..ARGV[1])}}

if redis.call("HEXISTS", "Mods:"..ARGV[1], 'req:headers') == 1 then
    table.insert(results, {'Request:Headers', redis.call("HGETALL", "Request:Headers:"..ARGV[1]) })
end

if redis.call("HEXISTS", "Mods:"..ARGV[1], 'req:body') == 1 then
    table.insert(results, {'Request:Body', redis.call("HGETALL", "Request:Body:"..ARGV[1]) })
end

if redis.call("HEXISTS", "Mods:"..ARGV[1], 'resp:headers') == 1 then
    table.insert(results, {'Response:Headers', redis.call("HGETALL", "Response:Headers:"..ARGV[1]) })
end

if redis.call("HEXISTS", "Mods:"..ARGV[1], 'resp:headers') == 1 then
    table.insert(results, {'Response:Headers', redis.call("HGETALL", "Response:Headers:"..ARGV[1]) })
end

if redis.call("HEXISTS", "Mods:"..ARGV[1], 'resp:body') == 1 then
    table.insert(results, {'Response:Body', redis.call("HGETALL", "Response:Body:"..ARGV[1]) })
end

return results

redis version: 5.0.4 dynomite version: 0.5.9

The weird thing is this script runs fine in redis it self.. Am i doing something wrong or is there something else going on?

iTosun commented 5 years ago

i've narrowed the problem down to the return section of the script. Looking at the logs the scripts seems to run fine until it answers back to dynomite.

session_store_1  | [2019-07-25 14:18:39.428] conn_recv_data:361 <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39318'> recv eof rb 126657 sb 4087
session_store_1  | [2019-07-25 14:18:39.428] req_recv_next:318 <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39318'> DONE
session_store_1  | [2019-07-25 14:18:39.428] core_close:414 close <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39318'> on event 00FF eof 1 done 1 rb 126657 sb 4087  
session_store_1  | [2019-07-25 14:18:39.428] event_del_conn:208 removing conn <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39318'> from active
session_store_1  | [2019-07-25 14:18:39.428] client_unref_internal_try_put:94 <CONN_CLIENT 0x560f36bffa80 -1 from '172.15.0.1:39318'> unref owner <POOL 0x560f35a8a700 'dyn_o_mite'>
config_store_1   | [2019-07-25 14:18:53.299] dnode_req_forward:274 <CONN_LOCAL_PEER_CLIENT 0x5575239da5a0 8 from '172.15.0.3:38586'> DNODE REQ RECEIVED dmsg->id 116646
config_store_1   | [2019-07-25 14:18:53.299] dnode_req_forward:281 <CONN_LOCAL_PEER_CLIENT 0x5575239da5a0 8 from '172.15.0.3:38586'> adding message 116000:0
config_store_1   | [2019-07-25 14:18:53.299] req_forward_local_datastore:533 <CONN_LOCAL_PEER_CLIENT 0x5575239da5a0 8 from '172.15.0.3:38586'> FORWARD <REQ 0x557524865cc0 116000:0::0 REQ_REDIS_EVALSHA, len:107> to storage conn <CONN_SERVER 0x5575239cdfc0 4 to '127.0.0.1:6379:1'>
config_store_1   | [2019-07-25 14:18:53.299] event_add_out:137 adding conn <CONN_SERVER 0x5575239cdfc0 4 to '127.0.0.1:6379:1'> to active
config_store_1   | [2019-07-25 14:18:53.299] event_del_out:165 removing conn <CONN_SERVER 0x5575239cdfc0 4 to '127.0.0.1:6379:1'> from active
config_store_1   | [2019-07-25 14:18:53.299] server_rsp_forward:781 <CONN_LOCAL_PEER_CLIENT 0x5575239da5a0 8 from '172.15.0.3:38586'> <REQ 0x557524865cc0 116000:0::0 REQ_REDIS_EVALSHA, len:107> RECEIVED <RSP 0x557524055180 116001:0 RSP_REDIS_MULTIBULK len:223>
config_store_1   | [2019-07-25 14:18:53.299] msg_local_one_rsp_handler:1059 612785360 SELECTED 604328336
config_store_1   | [2019-07-25 14:18:53.299] event_add_out:137 adding conn <CONN_LOCAL_PEER_CLIENT 0x5575239da5a0 8 from '172.15.0.3:38586'> to active
config_store_1   | [2019-07-25 14:18:53.299] dnode_rsp_send_done:510 <CONN_LOCAL_PEER_CLIENT 0x5575239da5a0 8 from '172.15.0.3:38586'> DNODE RSP SENT dmsg->id 116646
config_store_1   | [2019-07-25 14:18:53.299] event_del_out:165 removing conn <CONN_LOCAL_PEER_CLIENT 0x5575239da5a0 8 from '172.15.0.3:38586'> from active
session_store_1  | [2019-07-25 14:18:53.298] event_add_conn:188 adding conn <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39322'> to active
session_store_1  | [2019-07-25 14:18:53.298] proxy_accept:203 <CONN_PROXY 0x560f36c04be0 6 listening on 'session:8102'> accepted <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39322'>
session_store_1  | [2019-07-25 14:18:53.299] server_get_dc:568 server_get_dc dc 'dev'
session_store_1  | [2019-07-25 14:18:53.299] server_get_rack:600 server_get_rack   'r1'
session_store_1  | [2019-07-25 14:18:53.299] req_forward:869 >>>>>>>>>>>>>>>>>>>>>>> <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39322'> RECEIVED <REQ 0x560f37a256a0 116645:0::0 REQ_REDIS_EVALSHA, len:107> key '1' tagged key '1'
session_store_1  | [2019-07-25 14:18:53.299] server_get_dc:568 server_get_dc dc 'dev'
session_store_1  | [2019-07-25 14:18:53.299] server_get_rack:600 server_get_rack   'r1'
session_store_1  | [2019-07-25 14:18:53.299] req_forward_all_local_racks:702 <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39322'> <REQ 0x560f37a256a0 116645:0::0 REQ_REDIS_EVALSHA, len:107> same DC racks:2 expect replies 2
session_store_1  | [2019-07-25 14:18:53.299] req_forward_local_datastore:533 <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39322'> FORWARD <REQ 0x560f37a256a0 116645:0::0 REQ_REDIS_EVALSHA, len:107> to storage conn <CONN_SERVER 0x560f36c02fc0 5 to '127.0.0.1:6379:1'>
session_store_1  | [2019-07-25 14:18:53.299] event_add_out:137 adding conn <CONN_SERVER 0x560f36c02fc0 5 to '127.0.0.1:6379:1'> to active
session_store_1  | [2019-07-25 14:18:53.299] dnode_peer_req_forward:58 <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39322'> FORWARD <REQ 0x560f36c14de0 116646:116645::0 REQ_REDIS_EVALSHA, len:107> to peer <CONN_LOCAL_PEER_SERVER 0x560f36c106c0 9 to 'config_store:8101:r2:dev:2863311530'> on rack 'r2' dc 'dev' 
session_store_1  | [2019-07-25 14:18:53.299] event_add_out:137 adding conn <CONN_LOCAL_PEER_SERVER 0x560f36c106c0 9 to 'config_store:8101:r2:dev:2863311530'> to active
session_store_1  | [2019-07-25 14:18:53.299] event_del_out:165 removing conn <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39322'> from active
session_store_1  | [2019-07-25 14:18:53.299] event_del_out:165 removing conn <CONN_SERVER 0x560f36c02fc0 5 to '127.0.0.1:6379:1'> from active
session_store_1  | [2019-07-25 14:18:53.299] event_del_out:165 removing conn <CONN_LOCAL_PEER_SERVER 0x560f36c106c0 9 to 'config_store:8101:r2:dev:2863311530'> from active
session_store_1  | [2019-07-25 14:18:53.300] server_rsp_forward:781 <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39322'> <REQ 0x560f37a256a0 116645:0::0 REQ_REDIS_EVALSHA, len:107> RECEIVED <RSP 0x560f37a24bc0 116647:0 RSP_REDIS_MULTIBULK len:223>
session_store_1  | [2019-07-25 14:18:53.300] msg_local_one_rsp_handler:1059 933385904 SELECTED 933383120
session_store_1  | [2019-07-25 14:18:53.300] event_add_out:137 adding conn <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39322'> to active
session_store_1  | [2019-07-25 14:18:53.300] dnode_rsp_swallow:941 <CONN_LOCAL_PEER_SERVER 0x560f36c106c0 9 to 'config_store:8101:r2:dev:2863311530'> <REQ 0x560f36c14de0 116646:116645::0 REQ_REDIS_EVALSHA, len:107> SWALLOW <RSP 0x560f36cc2a20 116649:0 RSP_REDIS_MULTIBULK len:223> len 223
session_store_1  | [2019-07-25 14:18:53.300] event_del_out:165 removing conn <CONN_CLIENT 0x560f36bffa80 10 from '172.15.0.1:39322'> from active

watch for the RSP_REDIS_MULTIBULK and look at the next line, the client connections seems to be removed before the data is returned to the client

smukil commented 5 years ago

@iTosun I will debug with your script and get back with what I find.

iTosun commented 4 years ago

Any progress with debugging? I've also did some debugging, it seems that running the LUA script is no problem. The queries within the script are getting executed with no problem. Except the return trough the dynomite layer seems to be a problem if the response is too big.