Open wbrown opened 10 years ago
It's a combination of both. The Yokozuna map-reduce functionality was added for parity with current Riak Search, but it has not been benchmarked or tuned (it has a hardcoded page size of 10). The deep paging issues with Solr don't help either. Hopefully, starting at the end of this week, there will be a period dedicated to performance testing. This would be a good issue to look into.
@wbrown: what do you mean when you say you've gotten Yokozuna's mapred_search inputs working in your setup? Can you please provide the key details on how to do it?
@rzezeski: if this is answered in https://github.com/basho/yokozuna/issues/319, would the hardcoded page size limit result sets to batches of 10 items?
@sallespro It's a Python setup, and the key is that I had to modify the call in the Python library from `riak_search` to `yokozuna`.
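For anyone trying to reproduce this, here is a rough sketch of what that change amounts to at the map-reduce job level, assuming the job is submitted as JSON with module/function/arg inputs. The helper name `mapred_search_job`, the index name, the query, and the map phase are illustrative assumptions; the thread does not say where in the Python library the call lives.

```python
import json


def mapred_search_job(index, query, module="yokozuna"):
    """Build a map-reduce job whose inputs come from a search query.

    Hypothetical helper: the only point is that the input module is
    "yokozuna" rather than "riak_search".
    """
    return {
        "inputs": {
            "module": module,              # was "riak_search" for legacy Riak Search
            "function": "mapred_search",
            "arg": [index, query],
        },
        "query": [
            {
                "map": {
                    "language": "erlang",
                    "module": "riak_kv_mapreduce",
                    "function": "map_object_value",
                    "keep": True,
                }
            }
        ],
    }


# Index name and query are made up for illustration.
job = mapred_search_job("famous", "name_s:Lion*")
payload = json.dumps(job)
```

The resulting `payload` would be what gets posted to the map-reduce endpoint; everything else in the job stays the same as with legacy search inputs.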
Also, regarding #319, yes -- the batch size would be set to 10, dramatically slowing things down.
@rzezeski I wrote my own adaptive algorithm for that in Python. I do an initial search of about 100 elements, to get a count -- and to return immediately, if the result set is less than 100. If it's larger than 100, I spin up a bunch of workers and size the search according to the final amount.
I've generally been able to get 4,000 keys a second via this method right until I hit the deep paging issue mentioned elsewhere.
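A minimal sketch of that adaptive approach, not my actual code. It assumes a hypothetical `search_fn(query, start, rows)` callable that returns `(num_found, docs)`; the worker count and the way pages are sized are also assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def adaptive_search(search_fn, query, probe_rows=100, workers=8):
    """Probe with a small search, then fan out pages sized by the total count.

    search_fn is a hypothetical callable: (query, start, rows) -> (num_found, docs).
    """
    # Initial probe: returns the first page and the total hit count.
    num_found, docs = search_fn(query, 0, probe_rows)
    results = list(docs)
    if num_found <= probe_rows:
        return results  # small result set, nothing more to fetch

    # Size the remaining pages so the work divides across the workers.
    remaining = num_found - probe_rows
    page_rows = max(probe_rows, ceil(remaining / workers))
    offsets = range(probe_rows, num_found, page_rows)

    def fetch(start):
        return search_fn(query, start, page_rows)[1]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields pages in offset order, so results stay ordered.
        for page in pool.map(fetch, offsets):
            results.extend(page)
    return results
```

Note that the later pages still use large `start` offsets, so this does nothing to avoid the deep paging cost; it only spreads the requests across workers.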
Moved to 2.0.1 because this doesn't absolutely have to get done for 2.0.0.
Comment for Jira.
I've gotten Yokozuna's `mapred_search` inputs working in my setup. However, it is extraordinarily slow.
Doing a regular Solr-style `search()` yields records at rates of thousands per second. However, doing the same search streamed via map-reduce gives me rates of dozens per second at best. An additional observation is that when I hit deep-paging performance issues in `search()`, I get similar rates.
Is this an issue with how Yokozuna's `mapred_search` feeds search results as inputs, or is the function asking Solr for the entirety of the results and hitting the large-result-set problem?