basho / riak

Riak is a decentralized datastore from Basho Technologies.
http://docs.basho.com
Apache License 2.0

riak-admin test fails with: Failed to write test value: {error,timeout} [JIRA: RIAK-2645] #848

Open ubeatha opened 8 years ago

ubeatha commented 8 years ago

riak-admin test fails while riak ping succeeds. Apart from riak-admin test, the cluster appears to be operational. While the failing riak-admin test is running, the OS and other network applications on the host slow down.

Versions: riak-2.1.1-1.el6.x86_64 and riak-2.1.4-1.el6.x86_64
OS: CentOS release 6.6 (Final)

To reproduce, join two nodes to form a cluster:

riak-admin cluster join riak@192.168.0.1
riak-admin cluster plan
riak-admin cluster commit

Test all nodes with riak-admin test:

riak-admin test
Failed to write test value: {error,timeout}

The node still responds to ping:

riak ping
pong

As suggested on several mailing list postings, protobuf.backlog was set to 256. Limits were checked.
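For anyone repeating the limits check, a minimal Python sketch follows. It is illustrative only: the 65536 threshold is an assumption (a value commonly recommended for Riak's open-files limit, not something stated in this report), and the script reports the limits of the shell it is run from, not those of an already running Riak process.

```python
import resource

# Assumption: 65536 open files is the commonly recommended value for Riak nodes.
# Adjust it to whatever your own deployment guidance specifies.
RECOMMENDED_NOFILE = 65536


def nofile_limit_ok(soft, recommended=RECOMMENDED_NOFILE):
    """Return True when the soft open-files limit meets the recommendation."""
    if soft == resource.RLIM_INFINITY:
        return True
    return soft >= recommended


if __name__ == "__main__":
    # Limits of the current process; run this as the user that starts Riak.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files: soft={soft} hard={hard} ok={nofile_limit_ok(soft)}")
```

Running it as the same user that starts Riak gives the closest approximation of what the node inherits at startup.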

Both servers respond to riak-admin test correctly before the cluster is created. All other checks look fine.

Output of riak-admin ringready:

riak-admin ringready
TRUE All nodes agree on the ring ['riak@192.168.0.1','riak@192.168.0.2']

Output of riak-admin member-status:

riak-admin member-status
================================= Membership ==================================
Status     Ring    Pending    Node
valid     50.0%      --      'riak@192.168.0.1'
valid     50.0%      --      'riak@192.168.0.2'
Valid:2 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Output of riak-admin ring-status:

riak-admin ring-status
================================== Claimant ===================================
Claimant: 'riak@192.168.0.1'
Status: up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable

Output of riak-admin transfers:

riak-admin transfers
No transfers active

Active Transfers:

Output of riak-admin diag: riak-admin diag [warning] The following preflists do not satisfy the n_val: [[{22835963083295358096932575511191922182123945984, 'riak@192.168.0.2'}, {45671926166590716193865151022383844364247891968, 'riak@192.168.0.2'}], [{68507889249886074290797726533575766546371837952, 'riak@192.168.0.1'}, {91343852333181432387730302044767688728495783936, 'riak@192.168.0.1'}], [{114179815416476790484662877555959610910619729920, 'riak@192.168.0.2'}, {137015778499772148581595453067151533092743675904, 'riak@192.168.0.2'}], [{159851741583067506678528028578343455274867621888, 'riak@192.168.0.1'}, {182687704666362864775460604089535377456991567872, 'riak@192.168.0.1'}], [{205523667749658222872393179600727299639115513856, 'riak@192.168.0.2'}, {228359630832953580969325755111919221821239459840, 'riak@192.168.0.2'}], [{251195593916248939066258330623111144003363405824, 'riak@192.168.0.1'}, {274031556999544297163190906134303066185487351808, 'riak@192.168.0.1'}], [{296867520082839655260123481645494988367611297792, 'riak@192.168.0.2'}, {319703483166135013357056057156686910549735243776, 'riak@192.168.0.2'}], [{342539446249430371453988632667878832731859189760, 'riak@192.168.0.1'}, {365375409332725729550921208179070754913983135744, 'riak@192.168.0.1'}], [{388211372416021087647853783690262677096107081728, 'riak@192.168.0.2'}, {411047335499316445744786359201454599278231027712, 'riak@192.168.0.2'}], [{433883298582611803841718934712646521460354973696, 'riak@192.168.0.1'}, {456719261665907161938651510223838443642478919680, 'riak@192.168.0.1'}], [{479555224749202520035584085735030365824602865664, 'riak@192.168.0.2'}, {502391187832497878132516661246222288006726811648, 'riak@192.168.0.2'}], [{525227150915793236229449236757414210188850757632, 'riak@192.168.0.1'}, {548063113999088594326381812268606132370974703616, 'riak@192.168.0.1'}], [{570899077082383952423314387779798054553098649600, 'riak@192.168.0.2'}, {593735040165679310520246963290989976735222595584, 'riak@192.168.0.2'}], 
[{616571003248974668617179538802181898917346541568, 'riak@192.168.0.1'}, {639406966332270026714112114313373821099470487552, 'riak@192.168.0.1'}], [{662242929415565384811044689824565743281594433536, 'riak@192.168.0.2'}, {685078892498860742907977265335757665463718379520, 'riak@192.168.0.2'}], [{707914855582156101004909840846949587645842325504, 'riak@192.168.0.1'}, {730750818665451459101842416358141509827966271488, 'riak@192.168.0.1'}], [{753586781748746817198774991869333432010090217472, 'riak@192.168.0.2'}, {776422744832042175295707567380525354192214163456, 'riak@192.168.0.2'}], [{799258707915337533392640142891717276374338109440, 'riak@192.168.0.1'}, {822094670998632891489572718402909198556462055424, 'riak@192.168.0.1'}], [{844930634081928249586505293914101120738586001408, 'riak@192.168.0.2'}, {867766597165223607683437869425293042920709947392, 'riak@192.168.0.2'}], [{890602560248518965780370444936484965102833893376, 'riak@192.168.0.1'}, {913438523331814323877303020447676887284957839360, 'riak@192.168.0.1'}], [{936274486415109681974235595958868809467081785344, 'riak@192.168.0.2'}, {959110449498405040071168171470060731649205731328, 'riak@192.168.0.2'}], [{981946412581700398168100746981252653831329677312, 'riak@192.168.0.1'}, {1004782375664995756265033322492444576013453623296, 'riak@192.168.0.1'}], [{1027618338748291114361965898003636498195577569280, 'riak@192.168.0.2'}, {1050454301831586472458898473514828420377701515264, 'riak@192.168.0.2'}], [{1073290264914881830555831049026020342559825461248, 'riak@192.168.0.1'}, {1096126227998177188652763624537212264741949407232, 'riak@192.168.0.1'}], [{1118962191081472546749696200048404186924073353216, 'riak@192.168.0.2'}, {1141798154164767904846628775559596109106197299200, 'riak@192.168.0.2'}], [{1164634117248063262943561351070788031288321245184, 'riak@192.168.0.1'}, {1187470080331358621040493926581979953470445191168, 'riak@192.168.0.1'}], [{1210306043414653979137426502093171875652569137152, 'riak@192.168.0.2'}, 
{1233142006497949337234359077604363797834693083136, 'riak@192.168.0.2'}], [{1255977969581244695331291653115555720016817029120, 'riak@192.168.0.1'}, {1278813932664540053428224228626747642198940975104, 'riak@192.168.0.1'}], [{1301649895747835411525156804137939564381064921088, 'riak@192.168.0.2'}, {1324485858831130769622089379649131486563188867072, 'riak@192.168.0.2'}], [{1347321821914426127719021955160323408745312813056, 'riak@192.168.0.1'}, {1370157784997721485815954530671515330927436759040, 'riak@192.168.0.1'}], [{1392993748081016843912887106182707253109560705024, 'riak@192.168.0.2'}, {1415829711164312202009819681693899175291684651008, 'riak@192.168.0.2'}], [{1438665674247607560106752257205091097473808596992, 'riak@192.168.0.1'}, {0, 'riak@192.168.0.1'}]] [notice] Data directory /var/lib/riak/leveldb is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.
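The n_val warning above is expected on a two-node cluster, and the following Python sketch illustrates why. It is a simplified model, not Riak's actual claim or preflist code: it assumes the default n_val of 3 (the report does not state the bucket settings) and a hypothetical alternating ownership layout. The ring size of 64 comes from ring_num_partitions in the status output, and the partition indices are multiples of 2^160 / 64, which matches the numbers in the warning.

```python
RING_SIZE = 64        # ring_num_partitions reported by riak-admin status
KEYSPACE = 2 ** 160   # partition indices in the warning are multiples of 2**160 / 64
N_VAL = 3             # assumption: Riak's default n_val, not stated in the report


def partition_index(i, ring_size=RING_SIZE):
    """Index of the i-th partition on the ring."""
    return i * (KEYSPACE // ring_size)


def preflist(start, owners, n_val=N_VAL):
    """The n_val consecutive (index, node) pairs starting at `start`, wrapping around."""
    size = len(owners)
    return [
        (partition_index((start + k) % size, size), owners[(start + k) % size])
        for k in range(n_val)
    ]


def unsatisfied(owners, n_val=N_VAL):
    """Preflists whose replicas do not land on n_val distinct nodes."""
    result = []
    for start in range(len(owners)):
        plist = preflist(start, owners, n_val)
        if len({node for _, node in plist}) < n_val:
            result.append(plist)
    return result


if __name__ == "__main__":
    nodes = ["riak@192.168.0.1", "riak@192.168.0.2"]
    # Hypothetical layout: ownership alternates between the two nodes.
    owners = [nodes[i % 2] for i in range(RING_SIZE)]
    print(len(unsatisfied(owners)), "of", RING_SIZE, "preflists lack 3 distinct nodes")
```

With only two nodes, no assignment of partitions can place three replicas on three distinct nodes, so under this assumption the warning by itself does not explain the write timeout.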

Output of riak-admin status:

riak-admin status

1-minute stats for 'riak@192.168.0.1'

connected_nodes : ['riak@192.168.0.2'] consistent_get_objsize_100 : 0 consistent_get_objsize_95 : 0 consistent_get_objsize_99 : 0 consistent_get_objsize_mean : 0 consistent_get_objsize_median : 0 consistent_get_time_100 : 0 consistent_get_time_95 : 0 consistent_get_time_99 : 0 consistent_get_time_mean : 0 consistent_get_time_median : 0 consistent_gets : 0 consistent_gets_total : 0 consistent_put_objsize_100 : 0 consistent_put_objsize_95 : 0 consistent_put_objsize_99 : 0 consistent_put_objsize_mean : 0 consistent_put_objsize_median : 0 consistent_put_time_100 : 0 consistent_put_time_95 : 0 consistent_put_time_99 : 0 consistent_put_time_mean : 0 consistent_put_time_median : 0 consistent_puts : 0 consistent_puts_total : 0 converge_delay_last : 14899 converge_delay_max : 0 converge_delay_mean : 0 converge_delay_min : 0 coord_redirs_total : 2 counter_actor_counts_100 : 0 counter_actor_counts_95 : 0 counter_actor_counts_99 : 0 counter_actor_counts_mean : 0 counter_actor_counts_median : 0 cpu_avg1 : 0 cpu_avg15 : 0 cpu_avg5 : 0 cpu_nprocs : 261 dropped_vnode_requests_total : 0 executing_mappers : 0 gossip_received : 7 handoff_timeouts : 0 ignored_gossip_total : 0 index_fsm_active : 0 index_fsm_create : 0 index_fsm_create_error : 0 late_put_fsm_coordinator_ack : 0 leveldb_read_block_error : 0 list_fsm_active : 0 list_fsm_create : 0 list_fsm_create_error : 0 list_fsm_create_error_total : 0 list_fsm_create_total : 0 map_actor_counts_100 : 0 map_actor_counts_95 : 0 map_actor_counts_99 : 0 map_actor_counts_mean : 0 map_actor_counts_median : 0 mem_allocated : 405155840 mem_total : 6137098240 memory_atom : 594537 memory_atom_used : 563928 memory_binary : 1500704 memory_code : 14313177 memory_ets : 6581160 memory_processes : 46084360 memory_processes_used : 46084248 memory_system : 27707576 memory_total : 73791936 node_get_fsm_active : 0 node_get_fsm_active_60s : 0 node_get_fsm_counter_objsize_100 : 0 node_get_fsm_counter_objsize_95 : 0 node_get_fsm_counter_objsize_99 : 0 
node_get_fsm_counter_objsize_mean : 0 node_get_fsm_counter_objsize_median : 0 node_get_fsm_counter_siblings_100 : 0 node_get_fsm_counter_siblings_95 : 0 node_get_fsm_counter_siblings_99 : 0 node_get_fsm_counter_siblings_mean : 0 node_get_fsm_counter_siblings_median : 0 node_get_fsm_counter_time_100 : 0 node_get_fsm_counter_time_95 : 0 node_get_fsm_counter_time_99 : 0 node_get_fsm_counter_time_mean : 0 node_get_fsm_counter_time_median : 0 node_get_fsm_errors : 0 node_get_fsm_errors_total : 0 node_get_fsm_in_rate : 0 node_get_fsm_map_objsize_100 : 0 node_get_fsm_map_objsize_95 : 0 node_get_fsm_map_objsize_99 : 0 node_get_fsm_map_objsize_mean : 0 node_get_fsm_map_objsize_median : 0 node_get_fsm_map_siblings_100 : 0 node_get_fsm_map_siblings_95 : 0 node_get_fsm_map_siblings_99 : 0 node_get_fsm_map_siblings_mean : 0 node_get_fsm_map_siblings_median : 0 node_get_fsm_map_time_100 : 0 node_get_fsm_map_time_95 : 0 node_get_fsm_map_time_99 : 0 node_get_fsm_map_time_mean : 0 node_get_fsm_map_time_median : 0 node_get_fsm_objsize_100 : 0 node_get_fsm_objsize_95 : 0 node_get_fsm_objsize_99 : 0 node_get_fsm_objsize_mean : 0 node_get_fsm_objsize_median : 0 node_get_fsm_out_rate : 0 node_get_fsm_rejected : 0 node_get_fsm_rejected_60s : 0 node_get_fsm_rejected_total : 0 node_get_fsm_set_objsize_100 : 0 node_get_fsm_set_objsize_95 : 0 node_get_fsm_set_objsize_99 : 0 node_get_fsm_set_objsize_mean : 0 node_get_fsm_set_objsize_median : 0 node_get_fsm_set_siblings_100 : 0 node_get_fsm_set_siblings_95 : 0 node_get_fsm_set_siblings_99 : 0 node_get_fsm_set_siblings_mean : 0 node_get_fsm_set_siblings_median : 0 node_get_fsm_set_time_100 : 0 node_get_fsm_set_time_95 : 0 node_get_fsm_set_time_99 : 0 node_get_fsm_set_time_mean : 0 node_get_fsm_set_time_median : 0 node_get_fsm_siblings_100 : 0 node_get_fsm_siblings_95 : 0 node_get_fsm_siblings_99 : 0 node_get_fsm_siblings_mean : 0 node_get_fsm_siblings_median : 0 node_get_fsm_time_100 : 0 node_get_fsm_time_95 : 0 node_get_fsm_time_99 : 0 
node_get_fsm_time_mean : 0 node_get_fsm_time_median : 0 node_gets : 0 node_gets_counter : 0 node_gets_counter_total : 0 node_gets_map : 0 node_gets_map_total : 0 node_gets_set : 0 node_gets_set_total : 0 node_gets_total : 4 node_put_fsm_active : 0 node_put_fsm_active_60s : 0 node_put_fsm_counter_time_100 : 0 node_put_fsm_counter_time_95 : 0 node_put_fsm_counter_time_99 : 0 node_put_fsm_counter_time_mean : 0 node_put_fsm_counter_time_median : 0 node_put_fsm_in_rate : 0 node_put_fsm_map_time_100 : 0 node_put_fsm_map_time_95 : 0 node_put_fsm_map_time_99 : 0 node_put_fsm_map_time_mean : 0 node_put_fsm_map_time_median : 0 node_put_fsm_out_rate : 0 node_put_fsm_rejected : 0 node_put_fsm_rejected_60s : 0 node_put_fsm_rejected_total : 0 node_put_fsm_set_time_100 : 0 node_put_fsm_set_time_95 : 0 node_put_fsm_set_time_99 : 0 node_put_fsm_set_time_mean : 0 node_put_fsm_set_time_median : 0 node_put_fsm_time_100 : 0 node_put_fsm_time_95 : 0 node_put_fsm_time_99 : 0 node_put_fsm_time_mean : 0 node_put_fsm_time_median : 0 node_puts : 0 node_puts_counter : 0 node_puts_counter_total : 0 node_puts_map : 0 node_puts_map_total : 0 node_puts_set : 0 node_puts_set_total : 0 node_puts_total : 1 nodename : 'riak@192.168.0.1' object_counter_merge : 0 object_counter_merge_time_100 : 0 object_counter_merge_time_95 : 0 object_counter_merge_time_99 : 0 object_counter_merge_time_mean : 0 object_counter_merge_time_median : 0 object_counter_merge_total : 0 object_map_merge : 0 object_map_merge_time_100 : 0 object_map_merge_time_95 : 0 object_map_merge_time_99 : 0 object_map_merge_time_mean : 0 object_map_merge_time_median : 0 object_map_merge_total : 0 object_merge : 0 object_merge_time_100 : 0 object_merge_time_95 : 0 object_merge_time_99 : 0 object_merge_time_mean : 0 object_merge_time_median : 0 object_merge_total : 1 object_set_merge : 0 object_set_merge_time_100 : 0 object_set_merge_time_95 : 0 object_set_merge_time_99 : 0 object_set_merge_time_mean : 0 object_set_merge_time_median : 0 
object_set_merge_total : 0 pbc_active : 0 pbc_connects : 0 pbc_connects_total : 0 pipeline_active : 0 pipeline_create_count : 0 pipeline_create_error_count : 0 pipeline_create_error_one : 0 pipeline_create_one : 0 postcommit_fail : 0 precommit_fail : 0 read_repairs : 0 read_repairs_counter : 0 read_repairs_counter_total : 0 read_repairs_fallback_notfound_count : undefined read_repairs_fallback_notfound_one : undefined read_repairs_fallback_outofdate_count : undefined read_repairs_fallback_outofdate_one : undefined read_repairs_map : 0 read_repairs_map_total : 0 read_repairs_primary_notfound_count : undefined read_repairs_primary_notfound_one : undefined read_repairs_primary_outofdate_count : undefined read_repairs_primary_outofdate_one : undefined read_repairs_set : 0 read_repairs_set_total : 0 read_repairs_total : 0 rebalance_delay_last : 0 rebalance_delay_max : 0 rebalance_delay_mean : 0 rebalance_delay_min : 0 rejected_handoffs : 0 riak_kv_vnodeq_max : 0 riak_kv_vnodeq_mean : 0.0 riak_kv_vnodeq_median : 0 riak_kv_vnodeq_min : 0 riak_kv_vnodeq_total : 0 riak_kv_vnodes_running : 32 riak_pipe_vnodeq_max : 0 riak_pipe_vnodeq_mean : 0.0 riak_pipe_vnodeq_median : 0 riak_pipe_vnodeq_min : 0 riak_pipe_vnodeq_total : 0 riak_pipe_vnodes_running : 32 ring_creation_size : 64 ring_members : ['riak@192.168.0.1','riak@192.168.0.2'] ring_num_partitions : 64 ring_ownership : <"[{'riak@192.168.0.1',32},{'riak@192.168.0.2',32}]"> rings_reconciled : 0 rings_reconciled_total : 20 set_actor_counts_100 : 0 set_actor_counts_95 : 0 set_actor_counts_99 : 0 set_actor_counts_mean : 0 set_actor_counts_median : 0 skipped_read_repairs : 0 skipped_read_repairs_total : 0 storage_backend : riak_kv_eleveldb_backend sys_driver_version : <<"2.2">> sys_global_heaps_size : deprecated sys_heap_type : private sys_logical_processors : 1 sys_monitor_count : 319 sys_otp_release : <<"R16B02_basho8">> sys_port_count : 35 sys_process_count : 1119 sys_smp_support : true sys_system_architecture : 
<<"x86_64-unknown-linux-gnu">> sys_system_version : <<"Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:1:1] [async-threads:64] [kernel-poll:true] [frame-pointer]">> sys_thread_pool_size : 64 sys_threads_enabled : true sys_wordsize : 8 vnode_counter_update : 0 vnode_counter_update_time_100 : 0 vnode_counter_update_time_95 : 0 vnode_counter_update_time_99 : 0 vnode_counter_update_time_mean : 0 vnode_counter_update_time_median : 0 vnode_counter_update_total : 0 vnode_get_fsm_time_100 : 0 vnode_get_fsm_time_95 : 0 vnode_get_fsm_time_99 : 0 vnode_get_fsm_time_mean : 0 vnode_get_fsm_time_median : 0 vnode_gets : 0 vnode_gets_total : 4 vnode_index_deletes : 0 vnode_index_deletes_postings : 0 vnode_index_deletes_postings_total : 0 vnode_index_deletes_total : 0 vnode_index_reads : 0 vnode_index_reads_total : 0 vnode_index_refreshes : 0 vnode_index_refreshes_total : 0 vnode_index_writes : 0 vnode_index_writes_postings : 0 vnode_index_writes_postings_total : 0 vnode_index_writes_total : 3 vnode_map_update : 0 vnode_map_update_time_100 : 0 vnode_map_update_time_95 : 0 vnode_map_update_time_99 : 0 vnode_map_update_time_mean : 0 vnode_map_update_time_median : 0 vnode_map_update_total : 0 vnode_put_fsm_time_100 : 0 vnode_put_fsm_time_95 : 0 vnode_put_fsm_time_99 : 0 vnode_put_fsm_time_mean : 0 vnode_put_fsm_time_median : 0 vnode_puts : 0 vnode_puts_total : 3 vnode_set_update : 0 vnode_set_update_time_100 : 0 vnode_set_update_time_95 : 0 vnode_set_update_time_99 : 0 vnode_set_update_time_mean : 0 vnode_set_update_time_median : 0 vnode_set_update_total : 0 write_once_merge : 0 write_once_put_objsize_100 : 0 write_once_put_objsize_95 : 0 write_once_put_objsize_99 : 0 write_once_put_objsize_mean : 0 write_once_put_objsize_median : 0 write_once_put_time_100 : 0 write_once_put_time_95 : 0 write_once_put_time_99 : 0 write_once_put_time_mean : 0 write_once_put_time_median : 0 write_once_puts : 0 write_once_puts_total : 0 disk : [{"/",14318640,12}, {"/dev/shm",2996628,0}, 
{"/boot",487652,8}, {"/var/log",12254384,1}] riak_auth_mods_version : <<"2.0.1-0-g31b8b30">> erlydtl_version : <<"0.7.0">> riak_control_version : <<"2.1.1-0-g5898c40">> cluster_info_version : <<"2.0.2-0-ge231144">> yokozuna_version : <<"2.1.0-0-gcb41c27">> ibrowse_version : <<"4.0.2">> riak_search_version : <<"2.0.2-0-g8fe4a8c">> merge_index_version : <<"2.0.0-0-gb701dde">> riak_kv_version : <<"2.1.0-0-g6e88b24">> riak_api_version : <<"2.1.1-2-g94a9485">> riak_pb_version : <<"2.1.0.2-0-g620bc70">> protobuffs_version : <<"0.8.1p5-0-gf88fc3c">> riak_dt_version : <<"2.1.0-2-ga2986bc">> sidejob_version : <<"2.0.0-0-gc5aabba">> riak_pipe_version : <<"2.1.0-2-gc2d7d28">> riak_core_version : <<"2.1.1-0-g429c22d">> exometer_core_version : <<"1.0.0-basho2-0-gb47a5d6">> poolboy_version : <<"0.8.1p3-0-g8bb45fb">> pbkdf2_version : <<"2.0.0-0-g7076584">> eleveldb_version : <<"2.1.0-0-ga36dbd6">> clique_version : <<"0.2.6-0-g40072d2">> bitcask_version : <<"1.7.0">> basho_stats_version : <<"1.0.3">> webmachine_version : <<"1.10.8-0-g7677c24">> mochiweb_version : <<"2.9.0">> inets_version : <<"5.9.6">> xmerl_version : <<"1.3.4">> erlang_js_version : <<"1.3.0-0-g07467d8">> runtime_tools_version : <<"1.8.12">> os_mon_version : <<"2.2.13">> riak_sysmon_version : <<"2.0.0">> ssl_version : <<"5.3.1">> public_key_version : <<"0.20">> crypto_version : <<"3.1">> asn1_version : <<"2.0.3">> sasl_version : <<"2.3.3">> lager_version : <<"2.0.3">> goldrush_version : <<"0.1.6">> compiler_version : <<"4.9.3">> syntax_tools_version : <<"1.6.11">> stdlib_version : <<"1.19.3">> kernel_version : <<"2.16.3">>
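A status dump of this size is easier to compare between nodes once it is parsed. The sketch below is one hedged way to do that in Python; it assumes the raw riak-admin status output has one "name : value" pair per line (the line breaks were lost in the text above), and the helper names are hypothetical.

```python
def parse_status(text):
    """Parse 'name : value' lines into a dict; lines without ' : ' are skipped."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep and key.strip():
            stats[key.strip()] = value.strip()
    return stats


def differing_keys(a, b, ignore=()):
    """Sorted names whose values differ between two parsed status dicts."""
    return sorted(
        k for k in set(a) | set(b)
        if k not in ignore and a.get(k) != b.get(k)
    )


if __name__ == "__main__":
    node1 = parse_status(
        "1-minute stats for 'riak@192.168.0.1'\n"
        "nodename : 'riak@192.168.0.1'\n"
        "ring_num_partitions : 64\n"
        "node_puts_total : 1\n"
    )
    print(node1["ring_num_partitions"], node1["node_puts_total"])
```

Comparing the parsed output from both nodes while ignoring nodename and connected_nodes narrows attention to counters that actually differ, such as node_puts_total or vnode_puts_total.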

lukebakken commented 8 years ago

What is the output of the following commands when run on both nodes in this cluster?

riak-admin ringready
riak-admin member-status
riak-admin ring-status

ubeatha commented 8 years ago

Updated the report to include the ringready output. The output of ringready, ring-status, and member-status is identical on both nodes.
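The same cross-node check can be scripted. This is a rough Python sketch under stated assumptions: passwordless SSH to each node and permission to run riak-admin there are assumed, and run_over_ssh and compare_nodes are hypothetical helper names, not part of any Riak tooling.

```python
import subprocess

COMMANDS = [
    "riak-admin ringready",
    "riak-admin member-status",
    "riak-admin ring-status",
]


def run_over_ssh(host, command):
    """Run a command on a node over SSH and return its stdout.

    Assumption: passwordless SSH works and the remote user may run riak-admin.
    """
    result = subprocess.run(
        ["ssh", host, command],
        capture_output=True, text=True, timeout=60,
    )
    return result.stdout


def compare_nodes(hosts, commands=COMMANDS, run=run_over_ssh):
    """Map each command to True when every host returned identical output."""
    report = {}
    for command in commands:
        outputs = {host: run(host, command) for host in hosts}
        report[command] = len(set(outputs.values())) == 1
    return report


if __name__ == "__main__":
    for command, same in compare_nodes(["192.168.0.1", "192.168.0.2"]).items():
        print(f"{command}: {'identical' if same else 'DIFFERS'}")
```

The run parameter is injectable so the comparison logic can be exercised without a live cluster.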