basho / yokozuna

Riak + Solr

Yokozuna loses entries when YZ AAE trees expire, dealing w/ Default Bucket Types [JIRA: RIAK-1674] #481

Closed kesslerm closed 9 years ago

kesslerm commented 9 years ago

When AAE trees expire, Yokozuna starts to lose entries. The counts are identical at first, but once the YZ AAE trees have been expired, the total number of entries reported by "http://$RIAK_HOST/solr/$INDEX_NAME/select?q=*:*" is lower than the total number of keys reported by key listing.

Steps to reproduce:

#!/bin/bash
RIAK_HOST="http://127.0.0.1:10018"

test_results_bucket_props=`curl -s "$RIAK_HOST/buckets/Test_Results/props"`
if [[ $test_results_bucket_props  =~ "index" ]]
then
  echo "Index already exists";
  exit
fi

echo "Creating index"
curl -XPUT "$RIAK_HOST/search/index/Test_Results"
sleep 10

echo "Adding index to bucket"
curl -XPUT -H "Content-Type: application/json" "$RIAK_HOST/buckets/Test_Results/props" -d '{"props":{"search_index":"Test_Results"}}'

test_results_bucket_props=`curl -s "$RIAK_HOST/buckets/Test_Results/props"`
if [[ $test_results_bucket_props  =~ "index" ]]
then
  echo "Index added";
fi

#!/bin/bash

for i in {0..4999}
do
        uuid=$(uuidgen)
        echo ${uuid}

        curl -XPUT http://127.0.0.1:10018/buckets/Test_Results/keys/${uuid} \
                -H "Content-Type: application/json" \
                -d "{\"uuid\": \"${uuid}\", \"date\": \"$(date "+%FT%T.00000:Z")\"}"
done

#!/bin/bash
echo "solr:"
curl -XGET "http://127.0.0.1:10018/solr/Test_Results/select?wt=json&q=*:*" 2>/dev/null | json_pp | grep numFound
echo "keys in bucket:"
echo $((`curl -XGET "http://127.0.0.1:10018/buckets/Test_Results/keys?keys=true" 2>/dev/null | json_pp | wc -l` - 4))

Then, from an Erlang shell attached to a Riak node, expire the trees:

rpc:multicall([node() | nodes()], yz_entropy_mgr, expire_trees, []).

zeeshanlakhani commented 9 years ago

Thanks for this, @kesslerm. Will start looking into it. As per our chat, please also post your notes from attempting this w/ clear_trees (i.e. https://github.com/basho/yokozuna/blob/develop/src/yz_entropy_mgr.erl#L125, instead of expire). Thanks.

kesslerm commented 9 years ago

@zeeshanlakhani, the behaviour with clear_trees instead of expire is exactly the same, both with the standard AAE settings and with the accelerated anti_entropy.tree.build_limit.per_timespan = 5m.

The number of missing YZ entries rises steadily over the repair period: after the first repair operation, at least one node reports a lower number while at least one other node still reports the original number of entries. Later, all nodes report lower numbers in YZ.
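One way to watch this per-node divergence is to poll each node's HTTP endpoint and print the numFound it reports. The following is only a minimal sketch, not the script used for this report: the node list assumes a local devrel cluster listening on ports 10018, 10028 and 10038, and the function names are made up for illustration.

```shell
#!/bin/bash
# Assumption: HTTP endpoints of a local devrel cluster; adjust to your nodes.
NODES="${NODES:-http://127.0.0.1:10018 http://127.0.0.1:10028 http://127.0.0.1:10038}"
INDEX="${INDEX:-Test_Results}"

# Read a Solr JSON response on stdin and print the value of numFound.
extract_num_found() {
  grep -o '"numFound" *: *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Print the numFound that each node reports for a match-all query.
report_counts() {
  local node count
  for node in $NODES; do
    count=$(curl -s "$node/solr/$INDEX/select?wt=json&q=*:*&rows=0" | extract_num_found)
    echo "$node ${count:-unavailable}"
  done
}
```

Calling `report_counts` repeatedly during the repair period (for example once a minute) should show the counts on individual nodes dropping at different times, if the behaviour matches what is described here.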

zeeshanlakhani commented 9 years ago

ok, thanks @kesslerm. And, you've had the same issues w/ clear/expire w/o anti_entropy.tree.build_limit.per_timespan = 5m right? Just want to be sure on that, thanks.

kesslerm commented 9 years ago

@zeeshanlakhani, yes, absolutely the same behaviour with default settings and with anti_entropy.tree.build_limit.per_timespan = 5m. Both clear and expire show the issue.

kesslerm commented 9 years ago

We tracked this down to an incompatibility between the default bucket type and Yokozuna's AAE feature. So far, the issue has not been seen with non-default bucket types. The default bucket type has allow_mult=false and dvv_enabled=false when Riak is started with a default riak.conf file (as the legacy settings for these values are enforced via cuttlefish). Manually setting those values only on a given bucket under the default bucket type (not on the entire bucket type) does not rectify the problem.

At this point it's safest to suggest that Yokozuna with AAE enabled should not be used on the default bucket type unless the properties mentioned are set to the values all non-legacy bucket types would have. We still need to investigate whether the problem occurs on non-default bucket types if one or both of the properties are changed from their default values.
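To check whether a bucket is running with the legacy values mentioned above, its properties can be fetched over HTTP and inspected. This is a minimal sketch under a few assumptions: it targets the same local node as the reproduction scripts, the helper names are made up for illustration, and it relies on the `/types/<type>/buckets/<bucket>/props` route, with `default` as the type name for untyped buckets.

```shell
#!/bin/bash
# Assumption: same local node as in the reproduction scripts.
RIAK_HOST="${RIAK_HOST:-http://127.0.0.1:10018}"

# Print the value of a boolean property ($1) from bucket-props JSON on stdin.
bool_prop() {
  grep -o "\"$1\" *: *[a-z]*" | head -n 1 | grep -o '[a-z][a-z]*$'
}

# Succeed when the props JSON on stdin has allow_mult=false and dvv_enabled=false.
has_legacy_props() {
  local props
  props=$(cat)
  [ "$(printf '%s\n' "$props" | bool_prop allow_mult)" = "false" ] &&
    [ "$(printf '%s\n' "$props" | bool_prop dvv_enabled)" = "false" ]
}

# Fetch the props of bucket $1; the bucket type $2 defaults to "default".
fetch_props() {
  local bucket="$1" type="${2:-default}"
  curl -s "$RIAK_HOST/types/$type/buckets/$bucket/props"
}
```

For example, `fetch_props Test_Results | has_legacy_props && echo "legacy defaults in effect"` reports whether the bucket used in the reproduction still has both legacy values. A non-default type can be created with `riak-admin bucket-type create` and enabled with `riak-admin bucket-type activate`; whether changing these properties on such a type also triggers the problem is the open question noted above.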

shino commented 9 years ago

For cross reference: the fix was https://github.com/basho/yokozuna/pull/486 (if wrong, please correct me)