basho / yokozuna

Riak + Solr

Build & Cache 'fq' Params in Query Plan [JIRA: RIAK-1691] #392

Open rzezeski opened 10 years ago

rzezeski commented 10 years ago

Build and cache the fq params in the query plan rather than converting the filter pairs every time.
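A rough sketch of the idea, in Python rather than Yokozuna's Erlang: build the fq string from the filter pairs once and reuse it for later queries that run against the same plan. The pair shape used here, `(partition, "all")` or `(partition, (first_partition, ...))`, and the function names are assumptions made for illustration; this is not Yokozuna's actual code.

```python
from functools import lru_cache


def filter_pair_to_clause(partition, first_partitions):
    """Turn one (hypothetical) filter pair into a Solr clause."""
    if first_partitions == "all":
        # The whole partition is covered, so no inner filter is needed.
        return f"_yz_pn:{partition}"
    inner = " OR ".join(f"_yz_fpn:{fp}" for fp in first_partitions)
    return f"(_yz_pn:{partition} AND ({inner}))"


@lru_cache(maxsize=128)
def build_fq(filter_pairs):
    """Build the fq value for a plan; cached per distinct tuple of pairs.

    filter_pairs must be hashable (a tuple of tuples), so a repeated plan
    is a cache hit instead of a fresh string conversion.
    """
    return " OR ".join(filter_pair_to_clause(p, f) for p, f in filter_pairs)
```

Calling `build_fq(((55, "all"), (40, "all")))` twice converts the pairs only on the first call; `build_fq.cache_info()` shows the second call as a hit. In practice the cached value would probably live alongside the plan itself rather than in a separate cache, but that is a design choice this sketch does not settle.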

DSomogyi commented 9 years ago

Comment for Jira.

ghost commented 9 years ago

Just as an aside on the coverage plan and perf-related issues: we've actually had to disable caching on Solr and/or remove the coverage plan from the fq. Due to high ingest rates, we never seemed to get a cache hit, which was essentially forcing us to do a *:* query/cache save for every targeted search.

I'd be interested in any work done on improving search performance on large buckets with high-volume indexes. Also, we've recently been testing KV get/put latencies and their relationship to searches done through the YZ interface vs. direct to Solr. Seems like we're paying a pretty heavy price for going through the YZ layer at this point... We'll update another issue once we're able to generate some more reliable numbers.

zeeshanlakhani commented 9 years ago

@boardom def. interested in benchmarks you've done. It's interesting that you're not hitting any cache lookups on the Solr side. I'm guessing you've messed w/ the solr config as well as per https://wiki.apache.org/solr/SolrCaching?

We're starting on the path of perf improvements: https://github.com/basho/yokozuna/pull/483 & https://github.com/basho/yokozuna/pull/478 thus far (and an improvement for CRDT DataType-related operations as well in 2.1).

More to come for sure.

ghost commented 9 years ago

Just another data point: this is the same query run directly on a Solr node, first with the vnode filters passed as a filter query (fq), then with them included in the main query. timestamp is a Solr TrieLongField with about 80 million records in the cluster. If YZ had an option to include the filter as an additional AND clause instead of the current filter query, it would solve the problem, for our use case at least.

{
  "responseHeader": {
    "status": 0,
    "QTime": 958,
    "params": {
      "q": "timestamp:[1429579919010 TO 1429579921010]",
      "indent": "true",
      "fq": "_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10",
      "rows": "0",
      "wt": "json"
    }
  },
  "response": {"numFound": 80, "start": 0, "docs": []}
}

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "timestamp:[1429579919010 TO 1429579921010] AND (_yz_pn:55 OR _yz_pn:40 OR _yz_pn:25 OR _yz_pn:10)",
      "indent": "true",
      "rows": "0",
      "wt": "json"
    }
  },
  "response": {"numFound": 80, "start": 0, "docs": []}
}
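For anyone wanting to reproduce the comparison, here is a small Python sketch that builds the two parameter sets shown above: one with the partition filter in fq, one with it AND-ed into q. The function names are made up for illustration, and the option requested above does not exist in YZ as far as this thread shows; the sketch only constructs the Solr request parameters.

```python
from urllib.parse import urlencode


def partition_clause(partitions):
    """OR together the _yz_pn terms for the given partition numbers."""
    return " OR ".join(f"_yz_pn:{p}" for p in partitions)


def params_with_fq(q, partitions):
    """Variant 1: partition filter sent as a separate filter query (fq)."""
    return {"q": q, "fq": partition_clause(partitions), "rows": "0", "wt": "json"}


def params_with_and(q, partitions):
    """Variant 2: partition filter folded into the main query with AND."""
    combined = f"{q} AND ({partition_clause(partitions)})"
    return {"q": combined, "rows": "0", "wt": "json"}


if __name__ == "__main__":
    q = "timestamp:[1429579919010 TO 1429579921010]"
    parts = [55, 40, 25, 10]
    # Query strings to append to a Solr select URL for the index under test.
    print(urlencode(params_with_fq(q, parts)))
    print(urlencode(params_with_and(q, parts)))
```

Both variants should return the same numFound; the only difference is whether Solr evaluates the partition filter through the fq path or as part of the main query.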