Closed kesslerm closed 9 years ago
Thanks for this @kesslerm. Will start looking into this. Also post your notes from attempting this w/ clear_trees as well (i.e. https://github.com/basho/yokozuna/blob/develop/src/yz_entropy_mgr.erl#L125, instead of expire) as per our chat. Thanks.
@zeeshanlakhani, the behaviour with clear_trees instead of expire is exactly the same, both with the standard AAE settings and with the accelerated anti_entropy.tree.build_limit.per_timespan = 5m.
The number of missing YZ entries rises steadily over the repair period: after the first repair operation at least one node reports a lower number while at least one other node still reports the original number of entries. Later, all nodes report lower numbers in YZ.
ok, thanks @kesslerm. And you've had the same issues w/ clear/expire w/o anti_entropy.tree.build_limit.per_timespan = 5m, right? Just want to be sure on that, thanks.
@zeeshanlakhani, yes, absolutely the same behaviour with default settings and with anti_entropy.tree.build_limit.per_timespan = 5m. Both clear and expire show the issue.
We tracked this down to an incompatibility between the default bucket type and yokozuna's AAE feature. So far, the issue has not been seen with non-default bucket types. The default bucket type has allow_mult=false and dvv_enabled=false when riak is started with a default riak.conf file (as the legacy settings for these values are enforced via cuttlefish). Manually setting those values on just a given bucket under the default bucket type (not the entire bucket type) does not rectify this problem.
At this point it's safest to suggest that yokozuna with AAE enabled should not be used on the default bucket type unless the properties mentioned are set to the values all non-legacy bucket types would have. We still need to investigate whether the problem occurs on non-default bucket types when one or both of the properties are changed from their default values.
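To spot buckets that are in the configuration described above, something along these lines could help. It is only a sketch: it assumes the input is the JSON body returned by Riak's HTTP bucket properties endpoint (GET /buckets/<bucket>/props) and that both properties appear in it; the helper name `legacy_props_in_effect` is made up for illustration.

```python
import json

# Values the default bucket type is reported to have when riak is started
# with a default riak.conf (see the comment above).
LEGACY_PROPS = {"allow_mult": False, "dvv_enabled": False}


def legacy_props_in_effect(props_json: str) -> dict:
    """Return the subset of properties that still carry the legacy values.

    Assumption: props_json has the shape {"props": {...}}, as returned by
    the bucket properties endpoint. Hypothetical helper, not part of Riak.
    """
    props = json.loads(props_json)["props"]
    return {k: v for k, v in LEGACY_PROPS.items() if props.get(k) == v}


# Hand-written sample response, not captured from a real cluster.
sample = '{"props": {"allow_mult": false, "dvv_enabled": false, "n_val": 3}}'
print(legacy_props_in_effect(sample))
```

A non-empty result only says the bucket currently shows the legacy values. As noted above, changing them on a single bucket under the default bucket type did not make the problem go away.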
For cross reference: the fix was https://github.com/basho/yokozuna/pull/486 (if wrong, please correct me)
When AAE trees expire, Yokozuna starts to lose entries. The numbers are identical at first, but once the YZ AAE trees have been expired, the total number of entries reported by "http://$RIAK_HOST/solr/$INDEX_NAME/select?q=*:*" is lower than the total number of keys reported by key listing.
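The comparison described here can be sketched as follows. This is not the script used for the report; it assumes Solr is asked for a JSON response (`wt=json`, which puts the total under `response.numFound`) and that the key listing comes from Riak's HTTP API as `{"keys": [...]}`. All function names and the sample bodies are illustrative.

```python
import json
from urllib.parse import urlencode


def solr_count_url(riak_host: str, index_name: str) -> str:
    """Build a match-all Solr query URL that returns only the total count."""
    query = urlencode({"q": "*:*", "wt": "json", "rows": 0})
    return f"http://{riak_host}/solr/{index_name}/select?{query}"


def solr_num_found(body: str) -> int:
    """Total number of entries reported in a Solr JSON response."""
    return json.loads(body)["response"]["numFound"]


def listed_key_count(body: str) -> int:
    """Number of keys in a key-listing response of the form {"keys": [...]}."""
    return len(json.loads(body)["keys"])


def missing_entries(solr_body: str, keys_body: str) -> int:
    """Keys present in key listing but not accounted for in the Solr total."""
    return listed_key_count(keys_body) - solr_num_found(solr_body)


# Hand-written sample bodies, not captured from a real cluster.
solr_body = '{"response": {"numFound": 2, "start": 0, "docs": []}}'
keys_body = '{"keys": ["a", "b", "c"]}'
print(solr_count_url("localhost:8093", "my_index"))
print(missing_entries(solr_body, keys_body))
```

Fetching the two bodies is left out on purpose, since the host, port and index name depend on the cluster. Running the comparison periodically during the repair period would show whether the difference keeps growing.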
Steps to reproduce:
Add search = on and anti_entropy.tree.build_limit.per_timespan = 5m to riak.conf on each node, start the nodes and join them into a cluster.
Run riak attach on one of the nodes and enter at the erlang prompt