basho / riak_dt

Convergent replicated datatypes in Erlang
Apache License 2.0
353 stars 70 forks source link

orswot doesn't remove deleted entries on merge #78

Closed asonge closed 10 years ago

asonge commented 10 years ago

The riak_dt_orswot CRDT keeps deleted entries after they've been removed if the clocks haven't been incremented by an add operation. Example test case:

    {ok, A1} = riak_dt_orswot:update({add, <<"foo">>}, 1, riak_dt_orswot:new()),
    {ok, A2} = riak_dt_orswot:update({add, <<"bar">>}, 1, A1),
    {ok, B1} = riak_dt_orswot:update({add, <<"baz">>}, 2, riak_dt_orswot:new()),
    C = riak_dt_orswot:merge(A2, B1),
    {ok, A3} = riak_dt_orswot:update({remove, <<"bar">>}, 1, A2),
    D = riak_dt_orswot:merge(A3, C),
    % Actual return value is [<<"bar">>,<<"baz">>,<<"quux">>]
    [<<"baz">>,<<"foo">>] == riak_dt_orswot:value(D).
seankerr commented 10 years ago

Good job!

russelldb commented 10 years ago

This is an incorrect fix, though the bug was introduced by me here https://github.com/asonge/riak_dt/commit/02150e48099c7d7c8e3024c156cfa7fad17326d9. The problem is that the either_dominates function (this was all an optimisation) uses descends not strict_descends. A patch is on the way, and I'll make sure it includes your test.

russelldb commented 10 years ago

Ok, I appreciate the test, many thanks. And good find on the bug. It snuck through when I added the optimisation stuff. I've removed that bit of code, and your test passes, without your vclock increment on remove. Incrementing the vclock in remove is incorrect (as removes will then win over concurrent adds) and that is not the semantic we want.

Again, many thanks. Looking at this, our EQC tests are not good enough for this error to have crept in, so we'll be working on those too.

russelldb commented 10 years ago

Please see https://github.com/basho/riak_dt/pull/79 for the fix.

asonge commented 10 years ago

Thanks much, though I'm wondering how incrementing the set clock for just the actor would privileges removes over adds, as long as simultaneous actor-adds haven't been made. It's similar to adding an existing element to the set with the same actor immediately after the remove. I do recognize that this "fix" was less-than-ideal, at the very least. I guess I'll have to think about this some more.

I'll look out for a strict_descends implementation for the riak_dt_vclock to bring back the optimization. And again, thanks for the quick response.

russelldb commented 10 years ago

Strict descends isn't correct either. I just removed that bodged attempt at optimising merge a little. Have a look #79.

I'm a little paged out on the ORSWOT at the moment. Maybe reading the paper will help answer your question: http://arxiv.org/pdf/1210.3368.pdf

If incrementing the clock on remove is not incorrect, it is not needed for correctness.

russelldb commented 10 years ago

Fixed by #79. Closing, with thanks!