apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Some house cleaning in addIndexes* [LUCENE-2455] #3529

Closed asfimport closed 14 years ago

asfimport commented 14 years ago

Today, the use of addIndexes and addIndexesNoOptimize is confusing, especially regarding when to invoke each. Also, addIndexes calls optimize() at the beginning, but only on the target index. Its javadoc also includes the following statement, which, as far as I understand the code, is wrong: "After this completes, the index is optimized." In fact, optimize() is called at the beginning and not at the end.

On the other hand, addIndexesNoOptimize does not call optimize(), and relies on the MergeScheduler and MergePolicy to handle the merges.

After a short discussion about that on the list (Thanks Mike for the clarifications!) I understand that there are really two core differences between the two:

This issue proposes the following:

  1. Clear up the documentation of each, spelling out the pros/cons of calling them clearly in the javadocs.
  2. Rename addIndexesNoOptimize to addIndexes
  3. Remove optimize() call from addIndexes(IndexReader...)
  4. Document that clearly in both, with a recommendation to call optimize() beforehand on any of the Directories/Indexes if that is a concern.

That way, we maintain all the flexibility in the API: addIndexes(IndexReader...) allows for using IndexReader extensions, while addIndexes(Directory...) is considered more efficient, because it allows the merges to happen concurrently (depending on the MergeScheduler) and also factors in the MergePolicy. So unless you have an IndexReader extension, addIndexes(Directory...) is really the one you should be using. You have the freedom to call optimize() before each if you care about it, or to skip it if you don't. Either way, incurring the cost of optimize() is entirely in the user's hands.

BTW, addIndexes(IndexReader...) uses neither the MergeScheduler nor the MergePolicy, but rather calls SegmentMerger directly. This might be another place for improvement. I'll look into it, and if it's not too complicated, I may cover it in this issue as well. If you have any hints that can give me a good head start on that, please don't be shy :).


Migrated from LUCENE-2455 by Shai Erera (@shaie), resolved May 27 2010 Attachments: index.31.cfs.zip, index.31.nocfs.zip, LUCENE-2455_3x.patch (versions: 5), LUCENE-2455_trunk.patch

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

OK, I added the indexes from trunk (I didn't know they were there). I've changed CFS to write a version header in the file, which is why I've added a 3.0 index: to make sure it can be read properly by 3.1. What I've added to TestBackwardsCompatibility are tests to ensure that addIndexes works on old indexes (which was good, because after the changes it didn't!).

Maybe simple delete, they are not used.

The testAddIndexes tests were just added, and they use the 3.0 indexes, so I cannot delete them (see my comment above).

By the way the 3.0 index zip file generation code is in the 3.0 branch, have you edited it there?

Nope, it exists in TestBackwardsCompatibility, commented out, with instructions to uncomment it. I've used that code.
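
For readers following the version header change mentioned above, here is a minimal sketch of how a reader might accept both header-less files and files that start with a version header. It is not the code from the patch and does not reproduce Lucene's actual compound file layout: the class name `CfsHeaderSketch`, the marker value `FORMAT_VERSIONED`, and the "marker, then entry count" layout are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class CfsHeaderSketch {
    // Hypothetical marker: negative, so it can never be mistaken for an entry count.
    static final int FORMAT_VERSIONED = -1;

    /** Legacy layout (assumed): the file starts directly with the entry count. */
    static byte[] writeLegacy(int entryCount) {
        return write(false, entryCount);
    }

    /** New layout (assumed): a negative version marker, then the entry count. */
    static byte[] writeVersioned(int entryCount) {
        return write(true, entryCount);
    }

    private static byte[] write(boolean versioned, int entryCount) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (versioned) {
                out.writeInt(FORMAT_VERSIONED);
            }
            out.writeInt(entryCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the entry count from either layout by inspecting the first int. */
    static int readEntryCount(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int first = in.readInt();
            if (first >= 0) {
                return first; // no header: the first int is already the count
            }
            if (first != FORMAT_VERSIONED) {
                throw new IllegalStateException("Unknown format marker: " + first);
            }
            return in.readInt(); // header present: the count follows the marker
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readEntryCount(writeLegacy(3)));
        System.out.println(readEntryCount(writeVersioned(3)));
    }
}
```

The point of the sketch is only that a reader has to branch on the first value it sees, which is why a test against an index written by the older release is useful after such a change.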

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

While porting the code to trunk, I noticed that acquireRead/Write, releaseRead/Write, and upgradeReadToWrite are either not called anymore or are called only in relation to addIndexes. So I think these can be safely removed as well (from 3x and trunk)?

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

So I think these can be safely removed as well (from 3x and trunk)?

I think so!

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Committed revision 948415 (copied the 3.0 indexes from trunk) and removed more unnecessary code from IndexWriter.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Like the 3x patch, except that this one changes IndexFileNames.segmentFileName to take another parameter for custom names and updates some javadocs to match flex (Codecs). I think this is ready to go in.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Should we not add a 3.1 index (created with HEAD 3.x branch) to the TestBackwardsCompatibility? So we can verify that preflex indexes with new CFS header also work?

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Yes! I'll add them and update the tests. I will post a patch after I get more comments.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Hmm ... I've created the indexes using the 3x branch, copied them to trunk and updated TestBackwardsCompatibility to refer to them. All tests pass except for testNumericFields. It fails on both CFS and non-CFS indexes, and so I'm not sure it's related to this issue at all. The failure is this:

junit.framework.AssertionFailedError: wrong number of hits expected:<1> but was:<0>
    at org.apache.lucene.index.TestBackwardsCompatibility.testNumericFields(TestBackwardsCompatibility.java:773)

Can you try to run it on your checkout?

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

For me it passes.

Are you sure that you used the latest checkout of 3x? I added the index generation code yesterday, after your last 3x commit. This code had not been merged to 3x from trunk, as it was added post-flex. This has been done since yesterday:

Author: uschindler
Date: Wed May 26 13:13:10 2010
New Revision: 948420

URL: http://svn.apache.org/viewvc?rev=948420&view=rev
Log:
Merge the 3.0 index backwards tests from trunk (numeric field support). This makes it consistent across all branches.

Modified:
    lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/index/   (props changed)
    lucene/dev/branches/branch_3x/lucene/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java   (contents, props changed)

I attached the generated ZIP files from my 3x checkout.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Yes, after I updated my checkout and re-created the indexes, the test passes. So I will include them with this patch as well.

asfimport commented 14 years ago

Shai Erera (@shaie) (migrated from JIRA)

Committed revision 948861 (trunk).

asfimport commented 13 years ago

Grant Ingersoll (@gsingers) (migrated from JIRA)

Bulk close for 3.1