basho / riak_kv

Riak Key/Value Store
Apache License 2.0
650 stars 233 forks source link

Binary keys and values cause issues in riak_kv_index_hashtree:maybe_throttle_build [JIRA: RIAK-2324] #854

Open wbrown opened 10 years ago

wbrown commented 10 years ago

Storing unencoded binary keys and values in Riak prerelease causes issues under load with riak_kv_index_hashtree:maybe_throttle_build:

2014-02-21 17:19:12.289 [error] <0.781.0> gen_server <0.781.0> terminated with reason: bad argument in call to erlang:byte_size({r_object,<<"IPDB">>,<<65,1,1,0>>,[{r_content,{dict,5,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],...},...},...}],...}) in riak_kv_index_hashtree:maybe_throttle_build/4 line 460
2014-02-21 17:19:12.289 [error] <0.781.0> CRASH REPORT Process <0.781.0> with 0 neighbours exited with reason: bad argument in call to erlang:byte_size({r_object,<<"IPDB">>,<<65,1,1,0>>,[{r_content,{dict,5,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],...},...},...}],...}) in riak_kv_index_hashtree:maybe_throttle_build/4 line 460 in gen_server:terminate/6 line 744
2014-02-21 17:19:12.289 [error] <0.773.0> Supervisor {<0.773.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.781.0> exit with reason bad argument in call to erlang:byte_size({r_object,<<"IPDB">>,<<65,1,1,0>>,[{r_content,{dict,5,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],...},...},...}],...}) in riak_kv_index_hashtree:maybe_throttle_build/4 line 460 in context child_terminated
2014-02-21 17:20:02.239 [error] emulator Error in process <0.21517.375> on node 'riak@altostratus.fabric' with exit value: {function_clause,[{yz_entropy,iterate_entropy_data,[<<3 bytes>>,[{continuation,none},{limit,100},{partition,6}],#Fun<yz_index_hashtree.5.14605823>,{error,{error,req_timedout}}],[{file,"src/yz_entropy.erl"},{line,44}]},{yz_index_hashtree,'-fold_keys/2-lc$^0/1-0-',3,[{...
vinoski commented 10 years ago

According to @wbrown this occurred with 2.0pre11.

rzezeski commented 10 years ago

I can't say if it's related (though my gut says no) but there is also a Yokozuna related error at the end. Something to investigate on my end.

wbrown commented 10 years ago

Not sure if it's Yokozuna related -- I disabled Yokozuna on the bucket in question, and I'm getting a similar error:

2014-03-03 18:52:35.405 [error] <0.1439.0> gen_server <0.1439.0> terminated with reason: bad argument in call to erlang:byte_size({r_object,<<"obs">>,<<"++7rH3IfSu+XKBTEb5F/co8CJDW304bTzP/NSbUCJl0=:0">>,[{r_content,{dict,5,16,...},...}],...}) in riak_kv_index_hashtree:maybe_throttle_build/4 line 460
2014-03-03 18:52:35.405 [error] <0.1439.0> CRASH REPORT Process <0.1439.0> with 0 neighbours exited with reason: bad argument in call to erlang:byte_size({r_object,<<"obs">>,<<"++7rH3IfSu+XKBTEb5F/co8CJDW304bTzP/NSbUCJl0=:0">>,[{r_content,{dict,5,16,...},...}],...}) in riak_kv_index_hashtree:maybe_throttle_build/4 line 460 in gen_server:terminate/6 line 744
2014-03-03 18:52:35.406 [error] <0.1434.0> Supervisor {<0.1434.0>,poolboy_sup} had child riak_core_vnode_worker started with {riak_core_vnode_worker,start_link,undefined} at <0.1439.0> exit with reason bad argument in call to erlang:byte_size({r_object,<<"obs">>,<<"++7rH3IfSu+XKBTEb5F/co8CJDW304bTzP/NSbUCJl0=:0">>,[{r_content,{dict,5,16,...},...}],...}) in riak_kv_index_hashtree:maybe_throttle_build/4 line 460 in context child_terminated
slfritchie commented 10 years ago

@wbrown Do you have entries in the crash.log that correspond to those 'bad argument'/'badarg' errors? That file should have a full stack trace (spread across multiple lines), which would make pinpointing the bad code path a lot easier.

wbrown commented 10 years ago

Is this what you need?

2014-03-03 18:52:35 =CRASH REPORT====
  crasher:
    initial call: riak_core_vnode_worker:init/1
    pid: <0.1439.0>
    registered_name: []
    exception exit: {{badarg,[{erlang,byte_size,[{r_object,<<"obs">>,<<"++7rH3IfSu+XKBTEb5F/co8CJDW304bTzP/NSbUCJl0=:0">>,[{r_content,{dict,5,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[[<<"dot">>|{<<110,34,76,102,83,20,248,80>>,{1,63561108748}}]],[],[],[],[],[],[[<<"content-type">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<"X-Riak-VTag">>,54,101,119,76,113,98,107,122,114,57,79,84,120,77,113,84,112,121,102,116,65,78]],[[<<"index">>,{<<"disc_i_int">>,0},{<<"e_md5_s_bin">>,<<"4d8f0d4ea04b84488883d338ea13fef3">>},{<<"e_name_s_bin">>,<<"997D3CF373AFA82224049C4C356851253B9A70B6">>},{<<"e_sha256_s_bin">>,<<"7065f1f53144cdde8caaf578aafd7110ef8c9142fa1dc0e6869269bdbd03ce31">>},{<<"etp_sid_i_bin">>,<<"2017911">>},{<<"etp_sid_i_bin">>,<<"2017912">>},{<<"etp_sid_i_bin">>,<<"2805709">>},{<<"n_dom_s_bin">>,<<"dc-b2b0e285.getapplicationmy.info">>},{<<"n_dom_type_s_bin">>,<<"A">>},{<<"n_ipv4_l_int">>,919197470},{<<"owner_s_bin">>,<<"A7Ifb3azxNWfjVRtShg99kWFPLxXFQKvHFgLIkgJRnA=">>},{<<"r_magic_s_bin">>,<<"PE32 executable for MS Windows (GUI) Intel 80386 32-bit">>},{<<"r_ts_i_int">>,1393881722},{<<"r_ver_i_int">>,0},{<<"src_s_bin">>,<<"Bvn4ba+aeAW7V1wukHqx7cMah9sFa+rqjah/BrrhQtM=">>},{<<"ts_i_int">>,1393889687},{<<"type_s_bin">>,<<"n_dom_lookup">>}]],[],[[<<"X-Riak-Last-Modified">>|{1393,889548,782994}]],[],[]}}},<<"{\"ts_i\": 1393889687, \"type_s\": \"n_dom_lookup\", \"src_s\": \"Bvn4ba+aeAW7V1wukHqx7cMah9sFa+rqjah/BrrhQtM=\", \"owner_s\": \"A7Ifb3azxNWfjVRtShg99kWFPLxXFQKvHFgLIkgJRnA=\", \"disc_i\": 0, \"r_ts_i\": 1393881722, \"e_md5_s\": \"4d8f0d4ea04b84488883d338ea13fef3\", \"etp_sid_I\": [\"2805709\", \"2017912\", \"2017911\"], \"e_sha256_s\": \"7065f1f53144cdde8caaf578aafd7110ef8c9142fa1dc0e6869269bdbd03ce31\", \"r_magic_s\": [\"PE32 executable for MS Windows (GUI) Intel 80386 32-bit\"], \"e_name_s\": [\"997D3CF373AFA82224049C4C356851253B9A70B6\"], \"n_dom_s\": \"dc-b2b0e285.getapplicationmy.info\", \"n_dom_type_s\": \"A\", \"n_IPv4_L\": 919197470, \"r_ver_i\": 0}">>}],[{<<110,34,76,102,83,20,248,80>>,{1,63561108748}}],{dict,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[[clean|true]],[]}}},undefined}],[]},{riak_kv_index_hashtree,maybe_throttle_build,4,[{file,"src/riak_kv_index_hashtree.erl"},{line,460}]},{eleveldb,fold_loop,4,[{file,"src/eleveldb.erl"},{line,347}]},{eleveldb,do_fold,4,[{file,"src/eleveldb.erl"},{line,334}]},{riak_kv_eleveldb_backend,'-fold_objects/4-fun-0-',4,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,477}]},{riak_kv_worker,handle_work,3,[{file,"src/riak_kv_worker.erl"},{line,46}]},{riak_core_vnode_worker,handle_cast,2,[{file,"src/riak_core_vnode_worker.erl"},{line,70}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]}]},[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,744}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
    ancestors: [<0.1434.0>,<0.1433.0>,<0.1432.0>,<0.1237.0>,riak_core_vnode_sup,riak_core_sup,<0.153.0>]
    messages: []
    links: [<0.1434.0>,<0.1433.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 2586
    stack_size: 27
    reductions: 773
  neighbours:
2014-03-03 18:52:35 =SUPERVISOR REPORT====
     Supervisor: {<0.1434.0>,poolboy_sup}
     Context:    child_terminated
     Reason:     {badarg,[{erlang,byte_size,[{r_object,<<"obs">>,<<"++7rH3IfSu+XKBTEb5F/co8CJDW304bTzP/NS
bUCJl0=:0">>,[{r_content,{dict,5,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[[<<"dot">>|{<<110,34,76,102,83,20,248,80>>,{1,63561108748}}]],[],[],[],[],[],[[<<"content-type">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<"X-Riak-VTag">>,54,101,119,76,113,98,107,122,114,57,79,84,120,77,113,84,112,121,102,116,65,78]],[[<<"index">>,{<<"disc_i_int">>,0},{<<"e_md5_s_bin">>,<<"4d8f0d4ea04b84488883d338ea13fef3">>},{<<"e_name_s_bin">>,<<"997D3CF373AFA82224049C4C356851253B9A70B6">>},{<<"e_sha256_s_bin">>,<<"7065f1f53144cdde8caaf578aafd7110ef8c9142fa1dc0e6869269bdbd03ce31">>},{<<"etp_sid_i_bin">>,<<"2017911">>},{<<"etp_sid_i_bin">>,<<"2017912">>},{<<"etp_sid_i_bin">>,<<"2805709">>},{<<"n_dom_s_bin">>,<<"dc-b2b0e285.getapplicationmy.info">>},{<<"n_dom_type_s_bin">>,<<"A">>},{<<"n_ipv4_l_int">>,919197470},{<<"owner_s_bin">>,<<"A7Ifb3azxNWfjVRtShg99kWFPLxXFQKvHFgLIkgJRnA=">>},{<<"r_magic_s_bin">>,<<"PE32 executable for MS Windows (GUI) Intel 80386 32-bit">>},{<<"r_ts_i_int">>,1393881722},{<<"r_ver_i_int">>,0},{<<"src_s_bin">>,<<"Bvn4ba+aeAW7V1wukHqx7cMah9sFa+rqjah/BrrhQtM=">>},{<<"ts_i_int">>,1393889687},{<<"type_s_bin">>,<<"n_dom_lookup">>}]],[],[[<<"X-Riak-Last-Modified">>|{1393,889548,782994}]],[],[]}}},<<"{\"ts_i\": 1393889687, \"type_s\": \"n_dom_lookup\", \"src_s\": \"Bvn4ba+aeAW7V1wukHqx7cMah9sFa+rqjah/BrrhQtM=\", \"owner_s\": \"A7Ifb3azxNWfjVRtShg99kWFPLxXFQKvHFgLIkgJRnA=\", \"disc_i\": 0, \"r_ts_i\": 1393881722, \"e_md5_s\": \"4d8f0d4ea04b84488883d338ea13fef3\", \"etp_sid_I\": [\"2805709\", \"2017912\", \"2017911\"], \"e_sha256_s\": \"7065f1f53144cdde8caaf578aafd7110ef8c9142fa1dc0e6869269bdbd03ce31\", \"r_magic_s\": [\"PE32 executable for MS Windows (GUI) Intel 80386 32-bit\"], \"e_name_s\": [\"997D3CF373AFA82224049C4C356851253B9A70B6\"], \"n_dom_s\": \"dc-b2b0e285.getapplicationmy.info\", \"n_dom_type_s\": \"A\", \"n_IPv4_L\": 919197470, \"r_ver_i\": 0}">>}],[{<<110,34,76,102,83,20,248,80>>,{1,63561108748}}],{dict,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[[clean|true]],[]}}},undefined}],[]},{riak_kv_index_hashtree,maybe_throttle_build,4,[{file,"src/riak_kv_index_hashtree.erl"},{line,460}]},{eleveldb,fold_loop,4,[{file,"src/eleveldb.erl"},{line,347}]},{eleveldb,do_fold,4,[{file,"src/eleveldb.erl"},{line,334}]},{riak_kv_eleveldb_backend,'-fold_objects/4-fun-0-',4,[{file,"src/riak_kv_eleveldb_backend.erl"},{line,477}]},{riak_kv_worker,handle_work,3,[{file,"src/riak_kv_worker.erl"},{line,46}]},{riak_core_vnode_worker,handle_cast,2,[{file,"src/riak_core_vnode_worker.erl"},{line,70}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]}]}
     Offender:   [{pid,<0.1439.0>},{name,riak_core_vnode_worker},{mfargs,{riak_core_vnode_worker,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]
jonmeredith commented 10 years ago

@slfritchie looks like maybe_throttle_build is being passed a non-binary object to me? Perhaps here https://github.com/basho/riak_kv/blob/develop/src/riak_kv_index_hashtree.erl#L482

Maybe use erlang:external_size in `maybe_throttle_build``` if you can tolerate minor extra overhead for real binaries, or be more picky like https://github.com/basho/riak_kv/blob/develop/src/riak_kv_index_hashtree.erl#L422

slfritchie commented 10 years ago

@jonmeredith Yes, but then there's the question of why is a non-binary object occasionally given to a function that is supposed(*) to be handling binaries only?

(*) Or so says my bad memory. My head is still in Bitcask Land(™), so I haven't had time to review this bug further.

engelsanchez commented 10 years ago

Well I'll be... I've been reviewing the code and I can not see how a non-binary could be passed to this function. @jonmeredith hash_object is that way because different paths in riak_kv_vnode that call update_hashtree may have the binary or the object available (handoff vs. regular puts, etc). But this function should be used for a backend fold.

engelsanchez commented 10 years ago

@russelldb I noticed you took this item. During the triage with Kelly and Michelle, I believe we agreed to move this out of the milestone as not reproducible. If it pops up again in current code, we can open a new issue.

russelldb commented 10 years ago

So can we change the milestone, please?

engelsanchez commented 10 years ago

Moved from RC -> 2.0.1. @wbrown we are definitely interested in a reproducible case. Ping if you think you have one.

slfritchie commented 9 years ago

Is there anything new to report for this bug report?

xinuc commented 8 years ago

still experiencing this issue.

is there any quick fix?