cread opened this issue 9 years ago
Thanks for this report, and the detail. I know we've been a bit wary about HIPE; when I saw it mentioned I fully expected the failing node to be using it.
https://github.com/basho/otp/pull/3 is the only change between basho5 and basho6.
Empirically, HiPE _seems_ to be ok on 64-bit Linux, but I'm not going to go out on a limb and recommend it - the problem is that when it breaks, it generally doesn't do so until the system is pretty well loaded, so the results can be catastrophic.
On careful review of the above there are version differences in several key components, so they can't both be from the 2.0.4 tag. Are these Basho release packages?
The packages that are giving me problems are the Basho-released .deb packages available from http://docs.basho.com/riak/latest/downloads/
We've been running the HiPE-enabled build in production since late January this year. I built it from the 2.0.4 tarball downloaded from the above URL, following the Installing from Source instructions for 2.0.0.
I've just done a test with my own build of 2.1.1 from source. The only difference between this build and the .deb packages is that my build has HiPE enabled. We see the same behavior.
Just in case it matters, here are the properties for the bucket we're writing to:
{
  "props": {
    "young_vclock": 20,
    "write_once": false,
    "w": "quorum",
    "small_vclock": 50,
    "rw": "quorum",
    "r": 1,
    "last_write_wins": false,
    "dw": "quorum",
    "dvv_enabled": false,
    "chash_keyfun": {
      "fun": "chash_std_keyfun",
      "mod": "riak_core_util"
    },
    "big_vclock": 50,
    "basic_quorum": false,
    "allow_mult": false,
    "name": "r",
    "linkfun": {
      "fun": "mapreduce_linkfun",
      "mod": "riak_kv_wm_link_walker"
    },
    "n_val": 3,
    "notfound_ok": false,
    "old_vclock": 86400,
    "postcommit": [],
    "pr": 0,
    "precommit": [],
    "pw": 0
  }
}
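In case anyone wants to sanity-check these against a live node, here is roughly how I pull the props back over the HTTP interface (a quick sketch, not part of the original report; the node address is one of ours and 8098 is the default HTTP port):

# Sketch: fetch the live props for the "r" bucket from Riak's HTTP interface
# and print the ones that matter for writes. Host/port are assumptions.
import json
import urllib.request

PROPS_URL = "http://10.1.1.1:8098/buckets/r/props"

with urllib.request.urlopen(PROPS_URL) as resp:
    props = json.load(resp)["props"]

for key in ("n_val", "w", "dw", "pw", "allow_mult", "last_write_wins"):
    print(key, "=", props.get(key))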
I'm going to see if I can reproduce this on a single node now.
I've been able to reproduce the problem using basho_bench on our test cluster. Here's the basho_bench configuration:
{mode, max}.
{duration, 1440}.
{report_interval, 1}.
{concurrent, 192}.
{driver, basho_bench_driver_riakc_pb}.
{key_generator, {int_to_bin_bigendian, {partitioned_sequential_int, 4294967296}}}.
{value_generator, {uniform_bin, 2048, 1048576}}.
{riakc_pb_ips, [
{10,1,1,1},
{10,1,1,2},
{10,1,1,3},
{10,1,1,4},
{10,1,1,5}
]}.
{riakc_pb_replies, 1}.
{operations, [{put, 1}]}.
{pb_connect_options, [{auto_reconnect, true}]}.
{pb_timeout_general, 30000}.
{pb_timeout_read, 5000}.
{pb_timeout_write, 5000}.
{pb_timeout_listkeys, 50000}.
Within 2 hours or so you'll see node_put_fsm_active start creeping up. After it passes 1000 or so you can stop the load and that value does not drop.
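For anyone reproducing this, here is roughly the sort of thing I use to watch that counter while the bench runs (a sketch only; the node address and the one-minute interval are just what I happen to use):

# Sketch: poll Riak's HTTP /stats endpoint and log node_put_fsm_active over time.
import json
import time
import urllib.request

STATS_URL = "http://10.1.1.1:8098/stats"

while True:
    with urllib.request.urlopen(STATS_URL) as resp:
        stats = json.load(resp)
    print(time.strftime("%H:%M:%S"),
          "node_put_fsm_active =", stats.get("node_put_fsm_active"))
    time.sleep(60)  # one sample per minute is plenty for a multi-hour run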
Here are the put_fsm stats from one of the nodes once the load has been removed:
late_put_fsm_coordinator_ack : 941
node_put_fsm_active : 626
node_put_fsm_active_60s : 0
node_put_fsm_counter_time_100 : 0
node_put_fsm_counter_time_95 : 0
node_put_fsm_counter_time_99 : 0
node_put_fsm_counter_time_mean : 0
node_put_fsm_counter_time_median : 0
node_put_fsm_in_rate : 0
node_put_fsm_map_time_100 : 0
node_put_fsm_map_time_95 : 0
node_put_fsm_map_time_99 : 0
node_put_fsm_map_time_mean : 0
node_put_fsm_map_time_median : 0
node_put_fsm_out_rate : 0
node_put_fsm_rejected : 0
node_put_fsm_rejected_60s : 0
node_put_fsm_rejected_total : 0
node_put_fsm_set_time_100 : 0
node_put_fsm_set_time_95 : 0
node_put_fsm_set_time_99 : 0
node_put_fsm_set_time_mean : 0
node_put_fsm_set_time_median : 0
node_put_fsm_time_100 : 0
node_put_fsm_time_95 : 0
node_put_fsm_time_99 : 0
node_put_fsm_time_mean : 0
node_put_fsm_time_median : 0
vnode_put_fsm_time_100 : 0
vnode_put_fsm_time_95 : 0
vnode_put_fsm_time_99 : 0
vnode_put_fsm_time_mean : 0
vnode_put_fsm_time_median : 0
I'm now running this configuration against a single machine to see if I can reproduce it there.
Ran the same script on a single node for 8 hours without being able to reproduce the lockup, which I guess makes this a cluster problem.
Bump.
Hi @cread
I continue to believe what you're seeing is not directly related to HiPE, but I'm wondering if you're seeing the same behavior across versions with and without a HiPE-enabled VM? In particular, HiPE (when it works) does optimize some operations, so could this be related to being CPU-bound?
Are you seeing your CPUs maxed out (or nearly so) without HiPE, but with lower loads when HiPE's enabled?
Correct, this is not HiPE-related, as I pointed out in my first comment on 11/11. I get the same bad behavior with Riak 2.1.1 using the .deb package for Ubuntu 14.04 as built by Basho (no HiPE) and with a build I did from the source tarball with HiPE enabled. The CPUs do not max out with either build.
Have you been able to reproduce the lockup using the basho_bench config I provided earlier?
@cread, can you confirm whether your kernel is tuned in accordance with the "Virtual Memory" and "Schedulers" sections of our System Performance Tuning docs?
If it's not, can you try those settings? Given the symptoms you describe (which we're working to reproduce), it's clearly not the direct problem, but I'd like to eliminate (or identify) it as a contributor that might be driving the node into the state you're seeing.
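If it helps, something along these lines dumps the current values of a few of the tunables that doc covers (a sketch; the list of paths here is illustrative, and the recommended values are in the doc itself, not below):

# Sketch: print the current values of a few kernel tunables mentioned in the
# tuning docs. The set of paths is illustrative, not exhaustive.
SYSCTLS = [
    "/proc/sys/vm/swappiness",
    "/proc/sys/net/core/somaxconn",
    "/proc/sys/net/ipv4/tcp_max_syn_backlog",
]

for path in SYSCTLS:
    try:
        with open(path) as f:
            print(path, "=", f.read().strip())
    except OSError as err:
        print(path, "->", err)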
@tburghart I can confirm we're tuned as per the doc.
Thanks, just trying to cover all the bases
@tburghart have you guys been able to reproduce this yet?
Ping! Anyone there? Anyone looking at this?
One of our engineers attempted to reproduce and failed, but he wanted to try again more rigorously.
Unfortunately he’s been extremely busy. I hoped he’d have time this week, but he’s been out sick.
-John
Thanks John! Let me know if you need any more help or info from me getting to the bottom of this...
We've been building Riak from source for a while, but I've had trouble getting the 2.1.x releases built reliably, so I'd like to go back to using the .deb package. The problem I have is that in our test environment we always manage to max out node_put_fsm_active under sustained write loads, and the counts never drop, even when disk activity drops to zero.
When running Riak 2.0.4 on R16B02-basho5 (our current prod version), everything works as expected.
Using the .deb package of 2.0.4 moves us to R16B02_basho6, which is where we first see the problem of node_put_fsm_active going up and never dropping back down again, even after the write load stops.
Further testing with the Riak 2.0.6 and 2.1.1 .deb packages (both contain R16B02_basho8) shows the same problem.
As the one that works is using a HiPE-enabled VM and the one that locks up is not, could it be a VM difference?
Some technical details of the hardware:
riak-admin status output:
System that works:
System that locks up: