[error] <0.23928.132>@yz_entropy:get_entropy_data:74 failed to iterate over entropy data due to request exceeding timeout 60000 for filter params
Increasing search.solr.ed_request_timeout seems to work around it, but more consistent tree rebuild times would be the real solution.
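For reference, the workaround amounts to a one-line change in riak.conf. A hedged sketch (the exact duration syntax and default depend on the Riak/Yokozuna version; verify against your config schema):

```
## Hypothetical example: raise the entropy-data request timeout from
## the 60s default (the 60000 ms in the error above) to 5 minutes.
## This only papers over the slow pages; it does not fix them.
search.solr.ed_request_timeout = 5m
```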
Explanation from @fadushin on Slack:
What's going on with the ed timeout is this. Yokozuna keeps its own set of parallel AAE trees, which it uses to periodically compare with the riak_kv AAE trees.
[11:04]
And like riak_kv AAE trees, the yokozuna AAE trees expire and need to be rebuilt.
[11:05]
In order to rebuild a yokozuna AAE tree, Yoko will consult Solr for the list of all keys and (object) hashes it has for a given Riak partition. Basically, give me all the key/hash pairs for partition 8346294948458739739.
The hash values are kept in each Solr document under the _yz_ed field, and you'll notice that this field is indexed, not stored.
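For context, "indexed, not stored" in the Solr schema looks roughly like this (an illustrative sketch, not the exact line from Yokozuna's default schema; the field type and multiValued setting may differ):

```xml
<!-- Indexed, so its terms exist in the inverted index, but not stored,
     so an ordinary document query cannot return its value. -->
<field name="_yz_ed" type="string" indexed="true" stored="false" multiValued="true"/>
```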
[11:12]
So technically, there is no way to query for this field. So what Yokozuna has done is install a Java module in Solr, which handles what we call entropy data queries: https://github.com/basho/yokozuna/blob/develop/java_src/com/basho/yokozuna/handler/EntropyData.java
The problem is this. This handler iterates over all Lucene terms in the index (i.e., the Solr core), even terms associated with documents that are not stored in the partition named in the request. So, for example, if you have 15 vnodes on your Riak node, you are traversing all terms for the index (probably 1:1 with a bucket, unless you have multiple buckets indexing to the same core), and discarding all the terms that don't match the partition being queried. See, in particular, https://github.com/basho/yokozuna/blob/develop/java_src/com/basho/yokozuna/handler/EntropyData.java#L140
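A simplified model of the cost structure described above (illustrative Java, not the actual handler code; the real term encoding and Lucene TermsEnum iteration live in EntropyData.java): every term in the core is visited, and non-matching partitions are skipped, so one page's latency depends on how many foreign terms sit between matches.

```java
import java.util.Iterator;

// Illustrative sketch only: models the filter-and-discard loop described
// above, not the real Lucene iteration in EntropyData.java.
public class EntropyDataModel {

    // Hypothetical term layout "<partition>|<key>|<hash>"; the real
    // encoding of the _yz_ed terms differs in detail.
    static int emitPage(Iterator<String> allTerms, String partition, int pageSize) {
        int emitted = 0;
        while (allTerms.hasNext() && emitted < pageSize) {
            String term = allTerms.next();              // visits EVERY term in the core
            String termPartition = term.substring(0, term.indexOf('|'));
            if (!termPartition.equals(partition)) {
                continue;                               // discarded work; a long run of these
                                                        // is the conjectured timeout "gap"
            }
            emitted++;                                  // key/hash pair added to the response
        }
        return emitted;
    }
}
```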
[11:19]
Note that each query is paged, with a continuation token. So to rebuild an index, you are likely to make multiple EntropyData queries as you page over the key/hash pairs.
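In sketch form, the client side of a rebuild is a loop like the following (hypothetical EdClient interface and request/response shapes; the real caller is yz_entropy on the Erlang side):

```java
import java.util.List;
import java.util.Optional;

// Hypothetical paging loop over EntropyData queries. The failure mode is
// that any single iteration can exceed the 60s timeout, aborting the rebuild.
public class EdPaging {

    record Page(List<String> keyHashPairs, Optional<String> continuation) {}

    interface EdClient {
        // One EntropyData request: partition, page size, continuation token.
        Page fetch(String partition, int pageSize, Optional<String> continuation);
    }

    static void rebuild(EdClient client, String partition) {
        Optional<String> token = Optional.empty();
        Page page;
        do {
            page = client.fetch(partition, 1000, token);         // each iteration is one ED query
            page.keyHashPairs().forEach(kh -> { /* feed AAE tree */ });
            token = page.continuation();                         // absent token means done
        } while (token.isPresent());
    }
}
```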
[11:20]
My conjecture on what is happening (I have seen this in some experiments I ran a while back) is that many of the ED queries run very quickly. In fact, in my experiments, the min latency measurements are far better than if we were to use cursormarks instead.
[11:21]
The problem is that once in a while you will hit a page that takes a loooong time to complete. My guess is that we hit some kind of "gap" in the term iterator, where we are chewing over a bunch of terms that don't match the requested partition. This takes more than the timeout (60s), and the request fails.
[11:23]
The insidious thing is that the tree rebuild fails, and it will be rescheduled to happen again, and it will just fail again for the same reason. So you get these attempts to rebuild Yoko AAE trees, which fail over and over again. I have no idea what the real impact is on Solr when we run these rebuilds.
[11:25]
Here is an experimental branch that uses cursormarks instead of Yokozuna's EntropyData endpoint: https://github.com/basho/yokozuna/compare/develop...feature/fd/yz-cursors-ed
[11:26]
You'll see it requires a schema change, which requires .... (drumroll) reindexing, which is not supported.
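The schema change presumably amounts to making the hash retrievable by ordinary queries, e.g. something like this (a hypothetical edit; see the branch for the real change). Existing documents lack the stored value, hence the reindexing:

```xml
<!-- Hypothetical: store the entropy hash so a normal query (and thus a
     cursorMark scan) can return key/hash pairs without the custom handler. -->
<field name="_yz_ed" type="string" indexed="true" stored="true" multiValued="true"/>
```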
[11:28]
In the experiments I ran, the mean latency was worse with cursormarks, but the outlier percentiles are much better. And it's those outliers that kill us.
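For comparison, a cursorMark-based scan over one partition would look roughly like this in SolrJ (a sketch under assumptions: Solr/SolrJ 4.7+ for cursor support, a SolrClient bound to the core as in SolrJ 5+, a stored hash field per the schema change above, and the standard Yokozuna _yz_pn and _yz_id fields):

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

// Sketch of a cursorMark scan over one Riak partition's key/hash pairs.
public class CursorScan {

    static void scanPartition(SolrClient solr, long partition) throws Exception {
        SolrQuery q = new SolrQuery("_yz_pn:" + partition);   // only docs in the requested partition
        q.setRows(1000);
        q.addSort(SolrQuery.SortClause.asc("_yz_id"));        // cursors require a sort on the uniqueKey
        String cursor = CursorMarkParams.CURSOR_MARK_START;   // "*"
        while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = solr.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                // read _yz_rk and the stored hash field here
            }
            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) break;                    // unchanged cursor => scan complete
            cursor = next;
        }
    }
}
```

The point of the contrast: each cursor page does bounded work against an index-ordered, partition-filtered result set, which trades a somewhat worse mean for far better tail latencies, as noted above.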