apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Move & rename the terms dict, index, abstract postings out of oal.index.codecs.standard [LUCENE-2647] #3721

Closed asfimport closed 14 years ago

asfimport commented 14 years ago

The terms dict components that currently live under the Standard codec (oal.index.codecs.standard.*) are in fact very generic, and in no way particular to the Standard codec. We already have many other codecs (sep, fixed int block, var int block, pulsing, appending) that re-use the terms dict writer/reader components.

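So I'd like to move these out into oal.index.codecs, and rename them:

StandardTermsDictWriter/Reader -> PrefixCodedTermsWriter/Reader
StandardTermsIndexWriter/Reader -> AbstractTermsIndexWriter/Reader
SimpleStandardTermsIndexWriter/Reader -> SimpleTermsIndexWriter/Reader
StandardPostingsWriter/Reader -> AbstractPostingsWriter/Reader
StandardPostingsWriterImpl/ReaderImpl -> StandardPostingsWriter/Reader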

With this move we have a nice reusable terms dict impl. The terms index impl is still well-decoupled, so e.g. we could [in theory] explore a variable gap terms index.

Many codecs, I expect, don't need/want to implement their own terms dict....

There are no code/index format changes here, besides the renaming and fixing all imports/usages of the renamed classes.


Migrated from LUCENE-2647 by Michael McCandless (@mikemccand), resolved Sep 19 2010
Attachments: LUCENE-2647.patch (versions: 2)

asfimport commented 14 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

Mike, I think renaming is a good idea - that might make things slightly easier for folks to play around with codecs.

Here are some comments on the naming:

StandardTermsDictWriter/Reader -> PrefixCodedTermsWriter/Reader

+1

StandardTermsIndexWriter/Reader -> AbstractTermsIndexWriter/Reader

What about TermsIndexWriter/ReaderBase, since we started using that scheme with analyzers and the JDK uses it too? If we remove the abstractness one day, the Abstract* name would be very misleading, but the property of being a base class will likely remain.

SimpleStandardTermsIndexWriter/Reader -> SimpleTermsIndexWriter/Reader

I really don't like Simple*; it's like Smart, which makes me immediately feel itchy all over the place. What differentiates this from others? Is it the default? Maybe DefaultTermsIndexWriter/Reader?

StandardPostingsWriter/Reader -> AbstractPostingsWriter/Reader

Again, what about PostingsWriter/ReaderBase?

StandardPostingsWriterImpl/ReaderImpl -> StandardPostingsWriter/Reader

+1

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

What about TermsIndexWriter/ReaderBase, since we started using that scheme with analyzers and the JDK uses it too?

OK I'll switch from Abstract* -> *Base.

SimpleStandardTermsIndexWriter/Reader -> SimpleTermsIndexWriter/Reader

I really don't like Simple*; it's like Smart, which makes me immediately feel itchy all over the place.

Heh OK.

What differentiates this from others? Is it the default? Maybe DefaultTermsIndexWriter/Reader?

Well... there are no "others" yet! So, it is the default for now, but I don't like baking that into its name...

Let's see... this one uses packed ints to write the "RAM image" required at search time, so that at search time we just slurp in these pre-built images. While the index term selection policy is now "fixed" (every Nth term), I think this may change with time (the policy should be easily separable from how the index terms are written). Though, since we haven't yet done that separation, maybe I simply name it FixedGapTermsIndexWriter/Reader? How's that?
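To make the "fixed gap" idea concrete, here is a minimal sketch in Java of an index term selection policy that picks every Nth term, kept separate from the code that walks the terms. It is an illustration only: IndexTermSelector, FixedGapSelector and selectIndexTerms are hypothetical names invented for this example, not classes from the patch, and the sketch does not model packed ints or the on-disk format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedGapSketch {

    /** Hypothetical policy interface: decides whether a term becomes an index term. */
    interface IndexTermSelector {
        boolean isIndexTerm(long termOrdinal, String term);
    }

    /** Fixed-gap policy: terms at ordinals 0, N, 2N, ... are selected. */
    static final class FixedGapSelector implements IndexTermSelector {
        private final int interval;

        FixedGapSelector(int interval) {
            if (interval < 1) {
                throw new IllegalArgumentException("interval must be >= 1");
            }
            this.interval = interval;
        }

        @Override
        public boolean isIndexTerm(long termOrdinal, String term) {
            // The term text is ignored here; a variable-gap policy could inspect it.
            return termOrdinal % interval == 0;
        }
    }

    /** Walks the terms in sorted order and collects those the policy selects. */
    static List<String> selectIndexTerms(List<String> sortedTerms, IndexTermSelector selector) {
        List<String> indexTerms = new ArrayList<>();
        long ordinal = 0;
        for (String term : sortedTerms) {
            if (selector.isIndexTerm(ordinal, term)) {
                indexTerms.add(term);
            }
            ordinal++;
        }
        return indexTerms;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
            "apple", "banana", "cherry", "date", "elder", "fig", "grape");
        // With an interval of 3, ordinals 0, 3 and 6 are selected.
        System.out.println(selectIndexTerms(terms, new FixedGapSelector(3)));
        // prints [apple, date, grape]
    }
}
```

Because the policy sits behind its own interface in this sketch, a variable-gap selector could be swapped in without touching the loop that collects the index terms, which is the kind of separation described above.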

asfimport commented 14 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

...FixedGapTermsIndexWriter/Reader? How's that?

+1

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

New patch, w/ the names we iterated to above...

asfimport commented 14 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

New patch, w/ the names we iterated to above...

I looked at the patch briefly - looks good to me, Mike!