basho / yokozuna

Riak + Solr

Top-N and pagination can break across coverage plans [JIRA: RIAK-1551] #355

Open rzezeski opened 10 years ago

rzezeski commented 10 years ago

When Yokozuna services a query it uses Riak Core's coverage builder to build a query plan to send to Solr. This plan tells the local Solr node which shards to query and how to filter their data so that no overlap occurs. Currently, there is a coverage plan per index on each node; a new one is calculated every 2 seconds and cached in mochiglobal. This means the coverage plans may change over time and per node. A user who runs the same query multiple times, or decides to paginate through a result set, may get a different coverage plan per request.

This different-plan-per-request behavior will cause incorrect results when the sort parameter includes fields that can have different values per replica of the same object. For example, the score is a calculated value that depends on neighboring data to determine the IDF (Inverse Document Frequency). Since Riak stores replicas on adjacent partitions (as opposed to the "great repartition", which replicates entire partitions), the neighboring replicas for a given object replica will differ, and thus so will the IDF. Therefore, the score for the exact same object can change across query plans. This can cause the top-N for a query to change even on a quiescent dataset, or cause objects to repeat when paginating into a result set. Another problematic field is _yz_id, which uses the unique combination of type-bucket-key but also tacks on the logical partition number as a suffix. Once again this makes all N replicas of a given object distinct entities, which breaks pagination over a result set.
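
For illustration, the replicas of a single object (hypothetically: type animals, bucket cats, key tom) end up indexed as distinct Solr documents whose _yz_id values differ only in the partition suffix:

1*animals*cats*tom*42
1*animals*cats*tom*43

A sort involving _yz_id can therefore interleave or repeat these replicas depending on which coverage plan serves each request.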

To summarize, the potentially problematic queries are those whose sort includes per-replica fields such as score or _yz_id. Two workarounds exist:

If you are paginating simply to get all keys that match and don't care about score, you can sort on type-bucket-key (HTTP example: _yz_rt+asc,_yz_rb+asc,_yz_rk+asc) to get correct results back (see the sketch after these workarounds).

If you want to sort by score without repeating results, you must set rows >= numFound. This requires having some idea of how many rows will match before running the query.
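
A minimal sketch of both workarounds with the riak-erlang-client; the index name, query, and paging values here are hypothetical:

%% Connect to a local Riak node's protocol buffers port.
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087).

%% Workaround 1: page on the type-bucket-key sort, which is stable
%% across coverage plans because it never consults score or _yz_id.
Sort = <<"_yz_rt asc,_yz_rb asc,_yz_rk asc">>.
{ok, Page} = riakc_pb_socket:search(Pid, <<"famous">>, <<"name_s:tom">>,
                                    [{sort, Sort}, {start, 0}, {rows, 100}]).

%% Workaround 2: sort by score but fetch everything in one request,
%% choosing rows as an upper bound on numFound.
{ok, All} = riakc_pb_socket:search(Pid, <<"famous">>, <<"name_s:tom">>,
                                   [{rows, 10000}]).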

To get the same top-N for the same query over the same data, Yokozuna itself must be patched. There are several ways to fix this.

  1. Implement the "partition factor" stuff that has been discussed for 2.x. Basically this would allow Yokozuna to partition itself separately from the KV ring and act more like ES/SolrCloud. I.e., replicas would be mirrors of shards, not spread across shards (also known to Basho veterans as the "great repartitioning", although it would not require KV changes).
  2. Try the Solr distributed IDF patch. I'm not sure if it would solve the problem, but it would take only a day or two to test. I'm also not sure what the effect on latency would be, or whether the patch is "production ready" yet.
  3. Use the same coverage plan, for a given index, on all nodes, and use a different plan only when nodes go down or are added/removed. Basically, keep the coverage plan as stable as possible.
  4. ???

Number 1 is the best option and is planned work for the future, but it could take months to code and test, and it will change the APIs.

Number 2 is easy to test but has some unknowns around latency and the state of the code. I'm not saying the code is necessarily bad, just that it isn't in an official release yet and hasn't been vetted.

Number 3 still has the issues above but confines them to node down/up and join/leave events. This may still be unacceptable to some people, but it could be a good tradeoff (as long as it is documented for users to see) to buy us more time to implement something like number 1.

That said, number 3 may have some of its own tricky implementation details. All nodes now need to agree on a coverage plan for a given index, and to avoid overworking the same nodes we will want different plans for different indexes, etc. I have not had enough time to think through potential implementations for this solution yet. On the surface, however, it seems like much less work than number 1.
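
One rough sketch of the idea (module and function names here are illustrative, not Yokozuna's actual API): derive the planner's seed deterministically from the index name plus a hash of the current ring, so every node computes the same plan for a given index, different indexes land on different plans, and a plan only changes on ring events.

-module(yz_stable_cover_sketch).
-export([plan_seed/2]).

%% Deterministic coverage-plan seed: identical on every node for the
%% same index and ring, different across indexes (spreading query
%% load), and changed only when the ring changes.
plan_seed(Index, RingHash) ->
    erlang:phash2({Index, RingHash}).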

To summarize one last time:

Yokozuna will have issues sorting by any fields that can have different values per object replica. This includes computed fields like score and static fields like _yz_id. This can be smoothed over by solution 3 but not entirely eliminated without a more fundamental change like number 1.

I'm happy to answer questions and entertain other solutions I have missed.

/cc @jonmeredith @gburd @michellep @coderoshi @bowrocker @kellymclaughlin @jtuple

rzezeski commented 10 years ago

Also tagging @kuenishi as it was his issue (private email, not on GitHub) that caused me to write a test and discover this bug.

reiddraper commented 10 years ago

This is not ideal, but I don't think it should hold up the release. Yokozuna is still a huge improvement over the 1.x Riak Search. Further, there are still plenty of use cases where results are not affected: as I understand it, sorting on any field that is simply part of the data, and not 'computed' from data stored in the same shard, is safe. I think that is still plenty of utility.

michellep-basho commented 10 years ago

Thanks for the detailed write-up, Ryan. I am in agreement with the proposed short-term strategy for 2.0.

kellymclaughlin commented 10 years ago

If we're going with #3 for the 2.0 release and I can be of help wrangling the coverage code please let me know.

rzezeski commented 10 years ago

Pretty sure this was the cause for issues in #309.

rzezeski commented 10 years ago

Given other things that need to be done, and the fact that implementing (3) could be pretty tricky itself, I think it is best for this to be a known issue in 2.0.0. It can be revisited later when there is enough time to give it a proper solution.

@jonmeredith @coderoshi @michellep

jonmeredith commented 10 years ago

Agreed, document and move on.

EvanOxfeld commented 9 years ago

Has anyone had success using a cursor with the type-bucket-key workaround? A sort of mysort+asc,_yz_rt+asc,_yz_rb+asc,_yz_rk+asc with cursorMark set to * results in an HTTP 400 error: "Cursor functionality requires a sort containing a uniqueKey field tie breaker."

Unless I've misunderstood something, _yz_id is the only unique key allowed in a schema, but it can't be used in the sort if I want consistent results.
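
For concreteness, the failing request looks roughly like this (my_index and mysort are placeholders):

http://localhost:8098/search/query/my_index?q=*:*&sort=mysort+asc,_yz_rt+asc,_yz_rb+asc,_yz_rk+asc&cursorMark=*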

zeeshanlakhani commented 9 years ago

@EvanOxfeld I'll have to take a deeper look and try out some things. We sort by type-bucket-key for MR, https://github.com/basho/yokozuna/pull/386/files#diff-2b19f9bd04ffe29aeea0af0c0659909eR136 w/o issue. https://issues.apache.org/jira/browse/SOLR-6277 is the issue to track on the Solr side that may explain things, but isn't yet slated for 5.0 or any release.

emnvn commented 8 years ago

Hello,

I'm currently hitting this problem with Riak 2.1.1. I inserted a new value into a bucket following this guide: http://docs.basho.com/riak/latest/dev/using/search/

example:

T = riakc_obj:new({<<"animals">>, <<"cats">>}, <<"tom">>,
                  <<"{\"name_s\":\"tom\", \"age_i\":2}">>,
                  "application/json").
riakc_pb_socket:put(Pid, T).
riakc_pb_socket:search(Pid, <<"famous">>, <<"name_s:tom">>).

But when I search, I get different results each time:

I have read the discussion above but I don't know how to fix this.

Thanks for any help!

fadushin commented 8 years ago

Hi @emnvn,

This looks normal. The _yz_ids differ by partition number (42 vs. 43), which is the result of running queries over different coverage plans. Riak updates the coverage plan periodically to ensure that queries run on a "correct" set of nodes and partitions, as well as to distribute query load throughout the cluster.

By the way, your _yz_ids are missing the '*' separator, but I think that is because GitHub interpreted your text as markdown.

E.g., I suspect you meant

1*animals*cats*tom*43

emnvn commented 8 years ago

Hello fadushin,

The problem is with the search function of the riak-erlang-client: it returns the wrong results. Sometimes it returns 2 docs, sometimes only one. This causes bugs in my program, because I search for docs by email, and the email is unique, yet sometimes two docs come back.

Why can't it return just one doc? There is only one data object in the DB.

Hope that you can help.

Thanks

fadushin commented 8 years ago

Hi @emnvn,

I see. Your example doesn't show this, so perhaps you are referring to a different use case?

In any event, if your use case does not involve top-N queries and pagination, may I suggest we move this conversation to the riak-users mailing list and/or the #riak channel on irc.freenode.net?

emnvn commented 8 years ago

Hello @fadushin

I described the problem incorrectly. The results above are ones I got directly from the Solr API via HTTP, not via the riak-erlang-client.

I only get one record when I search with the riak-erlang-client.

Sorry for that.

fadushin commented 8 years ago

When you say "Solr api", do you mean the Solr endpoint on a riak node, e.g.,

http://localhost:8093/internal_solr/my_index/select?q=*:*

or the Riak HTTP query interface, e.g.,

http://localhost:8098/search/query/my_index?q=*:*

(where my_index is the name of your index)

If you are using the first HTTP interface, you will never get reliable results, because the query does not use a covering set to generate shard queries with partition filters. If the latter, the results should be the same as with the riak-erlang-client.
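
For illustration, the shard queries Yokozuna generates from a coverage plan carry a filter on the partition-number field, roughly of this shape (partition numbers hypothetical, exact syntax may differ):

fq=_yz_pn:(1 OR 4 OR 7)

The internal_solr endpoint adds no such filter, so every replica of a matching object can appear in the results.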

runesl commented 8 years ago

Option 1 sounds like reimplementing Elasticsearch's distribution code. Why not integrate directly with ES instead of Solr? That would make Riak Search a lot more interesting, and enable things like Kibana.

zeeshanlakhani commented 8 years ago

@rsltrifork I'm not saying an integration with ES can't be a thing. Nonetheless, if you want something like Kibana, we've been working on an integration with Banana: https://github.com/glickbot/riak-banana.