Closed: slfritchie closed this issue 10 years ago.
For example, using Erlang tracing to watch entry & return of bitcask:needs_merge():
{2013,3,19} {3,13,26}: bitcask needs_merge: 113 msec
{2013,3,19} {3,14,18}: bitcask needs_merge: 116 msec
{2013,3,19} {3,14,18}: bitcask needs_merge: 114 msec
{2013,3,19} {3,14,23}: bitcask needs_merge: 101 msec
{2013,3,19} {3,15,40}: bitcask needs_merge: 104 msec
{2013,3,19} {3,15,40}: bitcask needs_merge: 106 msec
{2013,3,19} {3,16,4}: bitcask needs_merge: 121 msec
{2013,3,19} {3,16,40}: bitcask needs_merge: 101 msec
{2013,3,19} {3,17,18}: bitcask needs_merge: 108 msec
{2013,3,19} {3,17,18}: bitcask needs_merge: 108 msec
{2013,3,19} {3,18,40}: bitcask needs_merge: 115 msec
{2013,3,19} {3,18,40}: bitcask needs_merge: 118 msec
{2013,3,19} {3,19,26}: bitcask needs_merge: 145 msec
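A trace like the one above can be produced with something along these lines (a sketch using the standard `dbg` module; the exact commands used for this report are not shown in the issue, and the trace flags here are an assumption):

```erlang
%% Sketch: trace calls to, and returns from, bitcask:needs_merge/1,
%% with timestamps attached to each trace message so the call duration
%% can be computed from the call/return_from timestamp pair.
dbg:tracer(),
dbg:p(all, [call, timestamp]),
dbg:tpl(bitcask, needs_merge, [{'_', [], [{return_trace}]}]).
```

The `{return_trace}` action in the match spec is what makes `dbg` emit a `return_from` message in addition to the `call` message, so entry and exit can be paired up.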
We have the same behavior/issue in our cluster; it's also our explanation for why response times increase over time (the number of bitcask files per vnode grows over time).
The problem gets worse because bitcask:needs_merge() is scheduled for all vnodes at roughly the same time (a possible solution would be to schedule the first execution of needs_merge with a random offset per vnode). You can see that the peaks in the response time graph follow exactly the 3-minute interval of bitcask:needs_merge (http://puu.sh/35ezR.png).
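The random-offset idea could look roughly like this inside the vnode (a hypothetical sketch; `MergeIntervalMs` and the `merge_check` message are illustrative names, not actual Bitcask/Riak internals):

```erlang
%% Hypothetical: stagger the *first* needs_merge check per vnode so
%% the periodic checks don't all fire cluster-wide in the same tick.
MergeIntervalMs = timer:minutes(3),
Jitter = rand:uniform(MergeIntervalMs),  %% random per-vnode offset
erlang:send_after(MergeIntervalMs + Jitter, self(), merge_check).
%% Subsequent checks would then be rescheduled at the plain interval,
%% keeping the vnodes permanently out of phase with each other.
```

(`rand` is the modern module; older OTP releases of that era used `random` instead.)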
We see the same behavior (blocked vnode processes, a huge increase in response times) when calling riak-admin vnode-status on a node that has a lot of bitcask files per vnode (which ultimately leads to the same issue):
https://github.com/basho/bitcask/blob/07b24e80f6e7f1b3407eea198c4ea5260d6b2d99/src/bitcask.erl#L641 https://github.com/basho/bitcask/blob/07b24e80f6e7f1b3407eea198c4ea5260d6b2d99/src/bitcask.erl#L809
We faced the same issue after upgrading to 1.3.1. I did a rolling restart today - and ... it's magic - PUT time dropped from 150 ms to 2 ms! - http://s15.postimg.org/3lrl036yz/put.png
Thanks for pointing in the right direction, Stefan. I hope this will be fixed soon - it's not very comfortable to do rolling restarts...
Could someone from @basho give this issue some love? We have to restart our production cluster every week... We are not 100% sure that this issue is the root cause of our response times increasing over time - but it looks that way to us.
Please see http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-May/012183.html and http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-May/012184.html
It seems that our issues are (almost) gone after the first cluster restart; the current PUT 95th percentile grew to 2400 µs from 2000, but that looks OK. Although we still see big PUT values during peak load... It would be good to know that the issue is fixed :)
@stefan-mees I don't think that this is your issue, to be honest. I think what you're seeing there is the file monitor increasing in total reductions over time (reductions are an Erlang work measurement) because it's a long-running process. 24M reductions isn't a huge number.
I'll contact you via email so hopefully we can find out what's going on with your cluster.
For maximum parallelism, Bitcask should avoid file I/O-related functions that end up calling the local `file_server` process. For example, `bitcask:needs_merge()` calls `filelib:is_file()`, which is handled by, and serialized through, a single Erlang process: registered name `file_server_2`, code module `file_server`. IIRC, the Hibari code uses some `prim_file` hackery to avoid sending file I/O to the local `file_server_2` proc.
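A minimal sketch of that kind of workaround, assuming `prim_file:read_file_info/1` (the primitive backing `file:read_file_info/1`) can be called directly to get the same answer without the `file_server_2` round trip:

```erlang
%% Check file existence without going through the serialized
%% file_server_2 process, by asking prim_file for the file's
%% metadata directly. (Illustrative helper, not Bitcask code.)
is_file_raw(Path) ->
    case prim_file:read_file_info(Path) of
        {ok, _FileInfo}  -> true;
        {error, _Reason} -> false
    end.
```

The same effect is available for ordinary reads and writes by opening files with the `raw` option (e.g. `file:open(Path, [raw, binary, read])`), which likewise bypasses the file server process.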