apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.6k stars 1.01k forks source link

Specialize 1D dimensional values intersection [LUCENE-6890] #7948

Open asfimport opened 8 years ago

asfimport commented 8 years ago

I tried implementing the same specialization we had before #7939 for the 1D case, but after testing it, I don't think it's worth it.

I'll upload the patch here for posterity (tests pass), but net/net it adds non-trivial code complexity in exchange for minor (5.39 sec -> 5.25 sec for 225 queries) query gains. Maybe in the future someone could improve this so it's more compelling... but I don't think the tradeoff is worth it today.

Furthermore, the optimization 1) requires an API change, and 2) is not even admissible in the current patch, since the query could be a union of multiple disjoint ranges when the optimization assumes it's just a single range.

The gist of the idea is to locate the start leaf block and end leaf block, make an informed estimate of the expected result set size, and then do a linear scan of the leaf blocks, vs the recursion and "grow per leaf block" we do today. I think the conclusion is that this used to be more sizable win, but DocIdSetBuilder has improved so that it is plenty fast without "upfront" growing, which is nice :)

Or maybe my benchmark is bogus ;)

I'll commit the minor code comment / TODOs / test improvements from the patch ...


Migrated from LUCENE-6890 by Michael McCandless (@mikemccand), updated Nov 16 2015 Attachments: LUCENE-6890.patch

asfimport commented 8 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1713684 from @mikemccand in branch 'dev/trunk' https://svn.apache.org/r1713684

LUCENE-6890: update TODOs; remove dead code from test case; improve javadocs

asfimport commented 8 years ago

Daniel Mitterdorfer (migrated from JIRA)

Hi @mikemccand

you've mentioned that you wanted to uploaded the patch anyway but it seems to be missing. Can you please upload it? Thanks!

Daniel

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

you've mentioned that you wanted to uploaded the patch anyway but it seems to be missing.

Gak, you're right! Thanks for noticing, I would have rm'd this checkout soon!

asfimport commented 8 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Patch, that adds too much code complexity, is buggy (because the opto isn't safe for the union-of-disjoint-parts), and too little query performance gains.

asfimport commented 8 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

DocIdSetBuilder has improved so that it is plenty fast without "upfront" growing, which is nice

Great to hear!