apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

pass liveDocs Bits down in scorercontext, instead of Weights pulling from the reader [LUCENE-3474] #4548

Closed asfimport closed 13 years ago

asfimport commented 13 years ago

Spinoff from #2610, this would allow filters to work in a more flexible way (besides just cleaning up)
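To make the idea concrete, here is a minimal sketch, not the attached patch. All names in it (`DocBits`, `ScorerContext`, `matches`) are hypothetical stand-ins rather than Lucene's real classes. It assumes the caller hands the scorer a set of accepted documents, so the scorer never asks the reader for deletions itself, and a filter can be folded into the same bits:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these types are Lucene's actual classes. */
final class AcceptDocsSketch {

    /** Stand-in for a Bits-like view: get(doc) is true when the doc may be returned. */
    interface DocBits {
        boolean get(int doc);
    }

    /** Hypothetical context passed down to the scorer by the caller. */
    static final class ScorerContext {
        final DocBits acceptDocs; // null means "accept every document"

        ScorerContext(DocBits acceptDocs) {
            this.acceptDocs = acceptDocs;
        }
    }

    /** Returns the postings entries that the context accepts. */
    static List<Integer> matches(int[] postings, ScorerContext ctx) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : postings) {
            // The scorer consults only what it was given, never the reader.
            if (ctx.acceptDocs == null || ctx.acceptDocs.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] postings = {0, 1, 2, 3, 4, 5};
        DocBits liveDocs = doc -> doc != 2;  // assume doc 2 was deleted
        DocBits filter = doc -> doc >= 2;    // assume a filter accepting docs 2 and up

        // Deletions only.
        System.out.println(matches(postings, new ScorerContext(liveDocs)));

        // Deletions and filter combined by the caller into one set of bits.
        DocBits both = doc -> liveDocs.get(doc) && filter.get(doc);
        System.out.println(matches(postings, new ScorerContext(both)));
    }
}
```

Because the accepted documents arrive as an argument, the caller is free to pass plain live docs, a filter, or a combination, without the scorer changing.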


Migrated from LUCENE-3474 by Robert Muir (@rmuir), resolved Oct 01 2011 Attachments: LUCENE-3474.patch (versions: 4)

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

By the way, this worked well for us already: the compile break found some sneaky little scorers using their own deletedDocs instead of the acceptDocs.

Ideally before we switch on any filter optimizations, we can hack up AssertingIndexSearcher to randomly use Bits/Filter in different ways to flush out lots of problems in tests.

asfimport commented 13 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

+1, this is much better than anything else! A compile-time error is good here, even if it makes changes less comfortable, i.e. a backwards-compatibility break.

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

+1

And it's awesome this already caught places where we were missing the acceptDocs cutover. Bug averted :)

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

I did a quick review:

Otherwise, +1, we should maybe commit this and then spin off an issue for improving AssertingIndexSearcher?

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Thanks Chris, good find.

> Otherwise, +1, we should maybe commit this and then spin off an issue for improving AssertingIndexSearcher?

Actually I think this will be more easily done in #2610? e.g. if we add the suggested heuristic there, as a boolean protected expert method, subclasses can override the heuristic if they need... and AssertingIndexSearcher could just return random.nextBoolean() :)
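A rough sketch of that shape, under stated assumptions: the class names, the method name `useRandomAccess`, and the 1% density threshold are all invented for illustration. The heuristic actually suggested in #2610 is not reproduced here.

```java
import java.util.Random;

/** Illustrative sketch only; names and the threshold are hypothetical. */
final class FilterStrategySketch {

    /** Stand-in for a searcher with an overridable expert hook. */
    static class Searcher {
        /**
         * Expert: decide whether a filter should be applied as random-access
         * bits (true) or consumed as an iterator (false).
         * Placeholder rule: use bits when more than 1% of documents match.
         */
        protected boolean useRandomAccess(int filterCardinality, int maxDoc) {
            return filterCardinality * 100L > maxDoc;
        }

        /** Reports which strategy the hook selected. */
        final String strategy(int filterCardinality, int maxDoc) {
            return useRandomAccess(filterCardinality, maxDoc) ? "bits" : "iterator";
        }
    }

    /** Test-only subclass that picks a strategy at random to exercise both paths. */
    static class AssertingSearcher extends Searcher {
        private final Random random;

        AssertingSearcher(Random random) {
            this.random = random;
        }

        @Override
        protected boolean useRandomAccess(int filterCardinality, int maxDoc) {
            return random.nextBoolean();
        }
    }

    public static void main(String[] args) {
        Searcher normal = new Searcher();
        System.out.println(normal.strategy(50, 100));  // dense filter under the placeholder rule
        System.out.println(normal.strategy(1, 1000));  // sparse filter under the placeholder rule

        Searcher asserting = new AssertingSearcher(new Random());
        System.out.println(asserting.strategy(50, 100)); // either strategy, chosen at random
    }
}
```

The point of the hook is that tests get both code paths for free, while production callers keep a deterministic default.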

asfimport commented 13 years ago

Chris Male (migrated from JIRA)

Okay, great. Let's commit this then.

asfimport commented 13 years ago

Robert Muir (@rmuir) (migrated from JIRA)

> This means on any addition to the scorer API (eg I've long wanted for caller to declare up front whether they need scores computed vs "only matching", ie MTQWF and CSQ would pass false), we break the API. But I think that's actually fine, even in 3.x: making your own Scorer is very expert.

Just as an FYI: we already have an issue open for that too: #4404

But I don't think we will see real gains from that with StandardCodec/DocsEnum today?

asfimport commented 13 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

> But I don't think we will see real gains from that with StandardCodec/DocsEnum today?

Right, not yet (not until our enum impls are able to [efficiently] separately decode docs and docs+freqs), and so we can wait until then.
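For illustration only, here is a toy model of why the gain depends on the enum implementation. Everything in it is hypothetical (`Postings`, `freqDecodes`, the `needsScores` flag, and the square-root scoring); it simply assumes frequencies can be decoded separately from doc IDs, which is the capability described above as not yet available:

```java
/** Illustrative sketch only; not Lucene's DocsEnum or Scorer API. */
final class NeedsScoresSketch {

    /** Toy postings list that counts how often a frequency is decoded. */
    static final class Postings {
        final int[] docs;
        final int[] freqs;
        int freqDecodes = 0;

        Postings(int[] docs, int[] freqs) {
            this.docs = docs;
            this.freqs = freqs;
        }

        /** Simulates the extra work of decoding a term frequency. */
        int freq(int index) {
            freqDecodes++;
            return freqs[index];
        }
    }

    /**
     * Visits every posting. When needsScores is false (a match-only caller),
     * frequencies are never decoded and every hit gets a constant score.
     */
    static float[] run(Postings postings, boolean needsScores) {
        float[] scores = new float[postings.docs.length];
        for (int i = 0; i < postings.docs.length; i++) {
            scores[i] = needsScores ? (float) Math.sqrt(postings.freq(i)) : 1.0f;
        }
        return scores;
    }

    public static void main(String[] args) {
        Postings postings = new Postings(new int[] {1, 5, 9}, new int[] {4, 1, 9});

        run(postings, false);
        System.out.println("freq decodes, match only: " + postings.freqDecodes);

        run(postings, true);
        System.out.println("freq decodes, with scores: " + postings.freqDecodes);
    }
}
```

If the underlying format always decodes docs and freqs together, the flag saves only the scoring arithmetic, which is consistent with waiting until the enums can decode them separately.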