apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

FieldCacheRewriteMethod.java is in tests [LUCENE-4003] #5076

Open asfimport opened 12 years ago

asfimport commented 12 years ago

I believe FieldCacheRewriteMethod was accidentally moved to the wrong place in r1158697; it should be in src/main, not src/test.

Or is it something that you don't want people to use?


Migrated from LUCENE-4003 by selckin

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

This was added mainly to have a test for DocTermsIndex's terms enum, so I think src/test is the correct place.

asfimport commented 12 years ago

selckin (migrated from JIRA)

I found it because I was looking for a query equivalent to FieldCacheRangeFilter.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

That's a nice point... why do we have FieldCacheRangeFilter and FieldCacheTermsFilter, which only work with certain queries?!

This is a more general version that works with any MultiTermQuery...
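To make the "works with any MultiTermQuery" point concrete, here is a minimal sketch of the ordinal-based idea in plain Java. It does not use Lucene classes: `OrdinalFieldSketch`, `sortedTerms`, `docToOrd`, and the `Predicate<String>` matcher are all hypothetical stand-ins for the field cache and the query's term matching. It also tests every unique term, whereas a real terms enum may be able to seek and skip.

```java
import java.util.BitSet;
import java.util.function.Predicate;

/** Hypothetical stand-in for a per-segment field cache: sorted unique terms plus one ordinal per document. */
final class OrdinalFieldSketch {
    final String[] sortedTerms; // term dictionary; a term's ordinal is its index here
    final int[] docToOrd;       // docToOrd[doc] = ordinal of that doc's term, or -1 if the doc has none

    OrdinalFieldSketch(String[] sortedTerms, int[] docToOrd) {
        this.sortedTerms = sortedTerms;
        this.docToOrd = docToOrd;
    }

    /** Returns the matching doc IDs for an arbitrary term predicate (assumed stand-in for the query). */
    BitSet matchingDocs(Predicate<String> termMatcher) {
        // Step 1: evaluate the matcher once per unique term, not once per document.
        BitSet matchingOrds = new BitSet(sortedTerms.length);
        for (int ord = 0; ord < sortedTerms.length; ord++) {
            if (termMatcher.test(sortedTerms[ord])) {
                matchingOrds.set(ord);
            }
        }
        // Step 2: a document matches if its term ordinal was marked in step 1.
        BitSet docs = new BitSet(docToOrd.length);
        for (int doc = 0; doc < docToOrd.length; doc++) {
            int ord = docToOrd[doc];
            if (ord >= 0 && matchingOrds.get(ord)) {
                docs.set(doc);
            }
        }
        return docs;
    }
}
```

Because the matcher is just a predicate over terms, a range, a prefix, or a fixed term set all go through the same two steps, which is why separate range and terms filters look redundant.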

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I like this idea, too.

I would like to move this rewrite method (slightly refactored, using FixedBitSet instead of OpenBitSet) to core and factor out the inner Filter impl into something like MTQWrapperFilter (maybe include the code in MTQWrapperFilter and only set a boolean in the ctor). The rewrite mode would then be similar to FilterRewrite, just with another boolean.

We can remove FieldCacheRangeFilter.newStringRange() [or at least rewrite it to use this rewrite method], which would be elegant, too. FieldCacheTermsFilter and also the TermsFilter in contrib can go away and we should instead use the sorted terms automaton (I always forget the name, Dawid, sorry) as a new AutomatonQuery subclass.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Feel free to take this one Uwe!

> FieldCacheTermsFilter and also the TermsFilter in contrib can go away and we should instead use the sorted terms automaton (I always forget the name, Dawid, sorry) as a new AutomatonQuery subclass.

That one should maybe be separated out... I could help some. Looking at the description of the thing, I think in some situations it could be a speedup (imagine some of the terms don't exist at all, etc).

Also I wonder about the API. It could still have the current add() API and do the sorting->automaton step at the end (e.g. in rewrite), but imo that's wasteful since the automaton is independent of the reader, so instead maybe it could take Term[] up front.
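A rough sketch of the "take the terms up front" shape, in plain Java. This is not Lucene code: `UpFrontTermSet` is a hypothetical name, and a sorted, de-duplicated array with binary search stands in for the sorted-terms automaton. The only point illustrated is that the sorted structure is built once in the constructor, with no reader involved, and then reused.

```java
import java.util.Arrays;

/** Hypothetical term-set matcher; a sorted array stands in for the automaton. */
final class UpFrontTermSet {
    private final String[] sorted; // built once, independent of any index reader

    UpFrontTermSet(String... terms) {
        // Sort and de-duplicate at construction time rather than on every rewrite.
        this.sorted = Arrays.stream(terms).distinct().sorted().toArray(String[]::new);
    }

    /** True if the given term is one of the terms supplied at construction. */
    boolean accepts(String term) {
        return Arrays.binarySearch(sorted, term) >= 0;
    }

    int size() {
        return sorted.length;
    }
}
```

With an add()-style API the same sorting work would have to wait until the set is known to be complete, which in practice means doing it at rewrite time for each reader.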

Finally, a new expert ctor (maybe just protected) should be exposed on AutomatonQuery. Currently you have:

  public AutomatonQuery(final Term term, Automaton automaton) {
    super(term.field());
    this.term = term;
    this.automaton = automaton;
    this.compiled = new CompiledAutomaton(automaton);
  }

But I would add AutomatonQuery(Term, Automaton, CompiledAutomaton), so that a subclass could pass an already-compiled automaton. This TermsFilter-query would use the alternative CompiledAutomaton ctor, passing true for finite and false for simplify, since we know it's finite and probably large (so simplification in general would hurt).
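The proposed constructor shape could look roughly like the sketch below. Everything here is a hypothetical stand-in, not the Lucene classes: `SketchAutomaton`, `SketchCompiledAutomaton`, `SketchAutomatonQuery`, and `SketchTermsQuery` are made-up names, a plain `String` field replaces `Term`, and `compileCount` exists only to show that compilation runs once. The `(finite, simplify)` argument order follows the description above and is otherwise an assumption.

```java
/** Hypothetical stand-in for an automaton; NOT the Lucene class. */
final class SketchAutomaton {
    final String description;

    SketchAutomaton(String description) {
        this.description = description;
    }
}

/** Hypothetical stand-in for a compiled automaton; NOT the Lucene class. */
final class SketchCompiledAutomaton {
    static int compileCount = 0; // illustration only: counts how often compilation runs
    final SketchAutomaton source;
    final Boolean finite;        // null means "not known in advance"
    final boolean simplified;

    /** Default path: finiteness unknown, simplification enabled. */
    SketchCompiledAutomaton(SketchAutomaton source) {
        this(source, null, true);
    }

    /** Expert path: the caller states finiteness and whether to simplify. */
    SketchCompiledAutomaton(SketchAutomaton source, Boolean finite, boolean simplify) {
        compileCount++;
        this.source = source;
        this.finite = finite;
        this.simplified = simplify;
    }
}

class SketchAutomatonQuery {
    protected final String field;
    protected final SketchAutomaton automaton;
    protected final SketchCompiledAutomaton compiled;

    /** Existing-style ctor: compiles with the defaults. */
    public SketchAutomatonQuery(String field, SketchAutomaton automaton) {
        this(field, automaton, new SketchCompiledAutomaton(automaton));
    }

    /** Proposed expert ctor: a subclass hands in an already-compiled automaton. */
    protected SketchAutomatonQuery(String field, SketchAutomaton automaton,
                                   SketchCompiledAutomaton compiled) {
        this.field = field;
        this.automaton = automaton;
        this.compiled = compiled;
    }
}

/** Hypothetical terms query: known finite, probably large, so skip simplification. */
final class SketchTermsQuery extends SketchAutomatonQuery {
    SketchTermsQuery(String field, SketchAutomaton union) {
        super(field, union, new SketchCompiledAutomaton(union, Boolean.TRUE, false));
    }
}
```

Chaining the public constructor through the protected one keeps a single place where fields are assigned, and the subclass avoids a second, default compilation of an automaton it already knows how to compile.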