basho / bitcask

because you need another a key/value storage engine
1.29k stars 173 forks source link

Fix epoch comparison by find_keydir_entry() when keydir->pending != NULL #177

Closed slfritchie closed 10 years ago

slfritchie commented 10 years ago

cc: @engelsanchez

When the keydir is truly frozen, i.e., when keydir->pending != NULL, then the comparison of the desired epoch must be greater-than-or-equal. This fix matches the epoch comparison that is performed by the while (s != NULL) loop in proxy_kd_entry_at_epoch().

I originally found this bug using a merge of the bug/deferred-delete-Cd8origin and bugfix/fold-open-delete-race branches at commits ac71627fb8 and c8c0accc9e5, respectively.

Cn4 = [[{set,{var,1},{call,bitcask_pulse,fork,[[{init,{state,undefined,false,false,[]}},{set,{not_var,9},{not_call,bitcask_pulse,bc_open,[false,{true,{102,47,431},15}]}},{set,{not_var,15},{not_call,bitcask_pulse,fold,[{not_var,9}]}},{set,{not_var,16},{not_call,bitcask_pulse,fold,[{not_var,9}]}}]]}},{set,{var,2},{call,bitcask_pulse,bc_open,[true,{true,{351,137,29},75}]}},{set,{var,4},{call,bitcask_pulse,put,[{var,2},1,<<51,170,22,66,176,75,118>>]}},{set,{var,5},{call,bitcask_pulse,bc_close,[{var,2}]}},{set,{var,6},{call,bitcask_pulse,fork,[[{init,{state,undefined,false,false,[]}},{set,{not_var,14},{not_call,bitcask_pulse,bc_open,[false,{true,{133,280,435},32}]}},{set,{not_var,26},{not_call,bitcask_pulse,fold,[{not_var,14}]}},{set,{not_var,27},{not_call,bitcask_pulse,fold,[{not_var,14}]}}]]}},{set,{var,7},{call,bitcask_pulse,bc_open,[true,{true,{430,377,266},86}]}},{set,{var,8},{call,bitcask_pulse,merge,[{var,7}]}},{set,{var,18},{call,bitcask_pulse,put,[{var,7},2,<<0,0,0,0,0,0,0,0,0>>]}},{set,{var,20},{call,bitcask_pulse,merge,[{var,7}]}},{set,{var,23},{call,bitcask_pulse,fold_keys,[{var,7}]}},{set,{var,30},{call,bitcask_pulse,merge,[{var,7}]}}],{61218,16404,17227},[{events,[]}]].
[begin Now = {1404,524363,267720}, io:format("Now ~p ", [Now]), true = eqc:check(eqc:testing_time(24*3600, bitcask_pulse:prop_pulse()), [lists:nth(1, Cn4), Now, lists:nth(3, Cn4)]) end || _ <- lists:seq(1,5)].
[begin Now = now(), io:format("Now ~p ", [Now]), true = eqc:check(eqc:testing_time(24*3600, bitcask_pulse:prop_pulse()), [lists:nth(1, Cn4), Now, lists:nth(3, Cn4)]) end || _ <- lists:seq(1,550)].

... which results in a fold by the 3rd pid (i.e. the 2nd forked pid) to find only 1 of the two keys: <<"kk01">> is wrongly skipped by the fold.

A merge prior to the start of the bad fold takes place during another fold. That merge copies the only key, <<"kk01">>, forward during epoch 9 ... but that mutation causes the keydir to freeze, so now keydir->pending != NULL. When the 3rd pid performs its fold, it is also in epoch 9. When folding over the new file, the bug uses a strictly greater than comparison, so the correct entry in the pending part of the keydir is skipped, and so fold's get() finds the old/prior-to-merge entry in the keydir ... and that old entry does not correspond to the the current fold file and so is skipped. The fold (correctly) never opens the old/prior-to-merge file, so the <<"kk01">> never appears in the fold's results.

engelsanchez commented 10 years ago

Well well. It's déjà vu all over again. I made a comment about this in the original PR. The argument to keep it was that iteration always creates a epoch that is the last op + 1 and epochs are only used for iterations. The way the epoch for the frozen keydir iteration is chosen threw a wrench on the initial assumptions. I found the epoch creation and checks a bit hard to follow on the original PR and left a comment to that effect. I'm going to review all epoch usage :fish: again to see if we have it all consistent now.

engelsanchez commented 10 years ago

:+1: 091fea6 I've been here before so I'm pretty convinced it works. Since creating an iterator increments the epoch like puts and deletes (see here), and the epoch is checked currently only while iterating, Evan was convinced it was not necessary to allow the case where the epoch was equal to the input one. But because folding when the pending hash exists caps the epoch used to that of the last operation on the normal hash to avoid iterating over changing values (see here), not using equal makes the fold not see the last operation before the keydir was frozen.

slfritchie commented 10 years ago

@borshop merge