basho / yokozuna

Riak + Solr

Performance of yz_solr:partition_list/1 [JIRA: RIAK-3221] #720

Open Vorticity-Flux opened 7 years ago

Vorticity-Flux commented 7 years ago

The yz_solr:partition_list/1 function performs its Solr lookup using a facet search: https://github.com/basho/yokozuna/blob/02682312031a1935a5b0dcd51d6cb88e6718d0bd/src/yz_solr.erl#L280

In our setup we observed that this Solr query takes 10 seconds to complete. It is likely that the query is waiting for a commit to complete and/or for a new searcher to finish opening. (Details about the poor Solr performance we observed, and the likely reasons for it, are given in issue https://github.com/basho/yokozuna/issues/719 ).

After consulting the Solr IRC channel, we established that facet.method=enum resolves this aspect of our performance problems. With this parameter yz_solr:partition_list/1 always completes in under 10ms (a 1000 times speed-up!).

For now we have modified solr_config.xml and set the default facet.method to enum. However, as far as I understand, this is not a reliable solution, as solr_config.xml is overwritten in some circumstances(?).

I think it is worthwhile to do one of the following:
a) Add a way to pass facet.method=enum to the Solr partition list facet query. It seems to perform much faster than the default facet method.
b) Turn Solr docValues on for the _yz_pn field. This should make faceting on this field really fast in all cases. This could be added to the default schema.
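
To make option a) concrete, here is a rough Python sketch of what the facet query might look like with facet.method=enum added. It only builds the request URL. The base URL, the index name and the exact parameter set are my assumptions for illustration and are not copied from yz_solr.erl; facet.method, facet.field and facet.mincount are standard Solr faceting parameters.

```python
from urllib.parse import urlencode


def partition_list_url(base_url, core, method="enum"):
    """Build a facet query URL that lists the distinct _yz_pn values.

    base_url and core are hypothetical placeholders; the parameter set is
    an assumption about what the partition list query roughly looks like.
    """
    params = [
        ("q", "*:*"),                # match every document
        ("facet", "on"),             # enable faceting
        ("facet.field", "_yz_pn"),   # facet on the partition number field
        ("facet.mincount", "1"),     # skip partitions with no documents
        ("facet.method", method),    # "enum" is the setting proposed above
        ("wt", "json"),              # JSON response
    ]
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host, port and index name, for illustration only.
url = partition_list_url("http://localhost:8093/internal_solr", "my_index")
```

Timing the same URL with method="fc" and method="enum" should be enough to check whether the difference we see shows up on another cluster.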

kesslerm commented 7 years ago

I was able to reproduce the speedup of the faceted query for yz_solr:partition_list/1. Switching to facet.method=enum yields consistently faster results.

Additional optimisations include not asking for actual query results (which may generate substantial amounts of unused data internally for big documents) and not returning query headers with the result. Every little helps, as they say.
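
As a sketch of those two extra optimisations (in Python, purely for illustration): I am assuming they map to the standard Solr parameters rows=0 (return no documents) and omitHeader=true (drop the response header). The function names and the sample response body are hypothetical, and the parser assumes Solr's default flat JSON facet layout of alternating value and count.

```python
import json


def lean_facet_params(field="_yz_pn"):
    """Facet query parameters that avoid returning documents and headers.

    Assumption: rows=0 and omitHeader=true are the parameters meant by
    "not asking for results" and "not returning query headers".
    """
    return {
        "q": "*:*",
        "rows": "0",             # no documents, only facet counts
        "omitHeader": "true",    # leave out the responseHeader section
        "facet": "on",
        "facet.field": field,
        "facet.method": "enum",
        "facet.mincount": "1",
        "wt": "json",
    }


def partitions_from_response(body, field="_yz_pn"):
    """Extract partition numbers from a Solr JSON facet response.

    Facet fields arrive as a flat list: [value, count, value, count, ...],
    so every even index holds a partition value.
    """
    counts = json.loads(body)["facet_counts"]["facet_fields"][field]
    return [int(value) for value in counts[0::2]]


# Hypothetical response body, for illustration only.
sample = '{"facet_counts": {"facet_fields": {"_yz_pn": ["12", 40, "7", 3]}}}'
partitions = partitions_from_response(sample)
```

Only the facet counts are needed to build the partition list, so nothing is lost by dropping the documents and the header.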