Open wbrown opened 10 years ago
Yes, the improved deep paging will be part of Solr 4.7. My hope is this will come out soon so that it can be integrated into Riak 2.0. If it doesn't, then this support may have to wait a while longer. It's definitely on my radar and I very much would like it in for 2.0. I just can't promise anything at the moment.
Thanks for the answer -- let me know if there's anything I can do to help out in this direction, as this is extremely relevant to my use case.
While working on upgrading to Solr 4.7.0 I discovered that the new
cursor support and Yokozuna don't get along. In my benchmarks I
observed both under and over counts. I believe this has to do with a
combination of the cursor implementation and _yz_id.
The unique id is:
<type>_<bucket>_<key>_<logical_partition>[_<sibling>]
Every object will have N index replicas with 3 different
<logical_partition> values. Given an object on partitions 4, 5, and
6, if the first page of the query ends on _4 but then hits a different
query coverage plan then it might see the same object id but for
partition _5. This is lexicographically later so the same object gets
counted twice. This would explain over count but not under count. In
my tests I was also sorting on score and I wonder if that could have
something to do with it?
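The over-count scenario described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: it models a cursor as "everything that sorts strictly after the last id seen" with results sorted by id alone, which is a simplification of Solr's actual cursor handling. The type `default`, the bucket `b`, the keys `k1` to `k3`, and the helpers `coverage_plan`, `fetch_page`, and `object_of` are hypothetical names invented for illustration; they are not Yokozuna or Solr APIs. The optional sibling suffix is ignored.

```python
from collections import Counter
from typing import List, Optional


def yz_id(btype: str, bucket: str, key: str, partition: int) -> str:
    """Build an id shaped like <type>_<bucket>_<key>_<logical_partition>."""
    return f"{btype}_{bucket}_{key}_{partition}"


def coverage_plan(keys: List[str], partition: int) -> List[str]:
    """Hypothetical plan: every object is read from the same logical partition."""
    return sorted(yz_id("default", "b", k, partition) for k in keys)


def fetch_page(plan: List[str], cursor: Optional[str], rows: int) -> List[str]:
    """Return up to `rows` ids that sort strictly after the cursor."""
    later = [doc_id for doc_id in plan if cursor is None or doc_id > cursor]
    return later[:rows]


def object_of(doc_id: str) -> str:
    """Drop the trailing _<logical_partition> to recover the object identity."""
    return doc_id.rsplit("_", 1)[0]


keys = ["k1", "k2", "k3"]
plan_a = coverage_plan(keys, 4)  # first page is served from partition 4
plan_b = coverage_plan(keys, 5)  # second page hits a plan using partition 5

page1 = fetch_page(plan_a, None, rows=2)
page2 = fetch_page(plan_b, page1[-1], rows=2)

seen = page1 + page2
counts = Counter(object_of(d) for d in seen)
duplicates = sorted(obj for obj, n in counts.items() if n > 1)
print(len(seen), "results for", len(counts), "objects; duplicated:", duplicates)
```

Here the first page ends on `default_b_k2_4`. The second plan contains `default_b_k2_5`, which sorts after that cursor, so `k2` is returned again and four results come back for three objects. Under this simplified model an object is never skipped, which matches the observation that the id scheme explains the over count but not the under count.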
Even if I ignored the over/under issues, I also found the cursor-based pagination to be much slower for smaller result sets. I haven't tested larger result sets yet.
More time is needed to investigate the issues here. There may not be enough time to have it all sorted out before 2.0. My hope is to have Solr 4.7 in 2.0 but cursor-based paging may remain broken for a while. Even if it is fixed, the protocol buffers API will not support cursor-based paging because it does not support the needed fields.
Comment for Jira.
More benchmarking must be done.
Per my discussion with @kesslerm, next steps would be to write a test over various query params and paging across many results and coverage plans.
Are there any plans to resurrect support for this? I too have a use case for deep paging in Solr on top of Riak.
@suddenrushofsushi yep, we are working on it.
Any updates on this?
Any updates on this?
If you didn't already hear about it, Basho went under. Luckily, all the Riak assets were purchased by Bet365 in late 2017. I believe there is work on a new Riak release but I'm not sure if anyone is putting any work into Yokozuna. I have been out of the loop for a long time, but my guess is not many people are inclined to maintain Yokozuna (for various reasons, all of which are moot). If I were in your shoes, I wouldn't hold my breath.
I've been using Yokozuna to search through and retrieve large result sets, but it breaks down at around the 400-500K record mark due to Solr's issues with pagination.
Researching the issue, I stumbled across mentions of an efficient deep-paging patch.
http://searchhub.org/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
Are there any plans to integrate this into Yokozuna once it makes it into a Solr release, or to use the patch independently?