asfimport closed this issue 17 years ago
Michael Busch (migrated from JIRA)
I'm assuming in your example you meant for reader2 and reader3 to also be SegmentReaders?
Yes that's what I meant. Sorry, I didn't make that clear.
Also, in your example, let's insert the missing "reader1.close()" as the very first close? (Otherwise it will never be closed, because its RC never hits 0.)
Doesn't what you describe change the semantics of MultiReader.close()?
If you do:
IndexReader reader1 = IndexReader.open(index1);
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
multiReader1.close();
then today multiReader1.close() also closes reader1. That's why I consciously omitted reader1.close().
Consequently, if you do
IndexReader reader1 = IndexReader.open(index1);
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
IndexReader multiReader2 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index3)});
multiReader1.close();
then multiReader2 is not usable anymore, because multiReader1.close() closes reader1. But that can be explicitly avoided by the user because it's known that multiReader1 and multiReader2 share the same reader.
Now, with the reopen() code:
IndexReader reader1 = IndexReader.open(index1); // optimized index, reader1 is a SegmentReader
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
... // modify index2
IndexReader multiReader2 = multiReader1.reopen();
// only index2 changed, so multiReader2 uses reader1 and has to increment the refcount of reader1
The user gets a new reader instance from multiReader1.reopen(), but can't tell which of the subreaders has been reopened and which are shared. That's why multiReader1.close() should not close reader1 in this case, and we need refcounting in order to make this work.
So do you suggest that a MultiReader should increment the refcounts when it is opened as well (in the constructor)? I believe we can implement it like this, but as I said it changes the semantics of MultiReader.close() (and ParallelReader.close() is, I believe, the same). A user would then have to close subreaders manually.
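The situation being described can be modeled with a toy refcounted reader (hypothetical names like SubReader and Multi, a minimal sketch and not Lucene's actual classes): the reopened multi-reader shares the unchanged sub-reader, so closing the old multi-reader must not close it out from under the new one.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sub-reader: opened with refCount 1; resources are released
// only when the last holder lets go.
class SubReader {
    int refCount = 1;
    boolean closed;
    void incRef() { refCount++; }
    void decRef() { if (--refCount == 0) closed = true; }
}

class Multi {
    final List<SubReader> subs = new ArrayList<>();
    Multi(SubReader... readers) { for (SubReader r : readers) subs.add(r); }

    // Reopen: reuse unchanged sub-readers (incRef), replace the changed one.
    Multi reopen(SubReader changedReplacement, int changedIndex) {
        SubReader[] fresh = new SubReader[subs.size()];
        for (int i = 0; i < subs.size(); i++) {
            if (i == changedIndex) {
                fresh[i] = changedReplacement;
            } else {
                subs.get(i).incRef();   // shared with the old Multi
                fresh[i] = subs.get(i);
            }
        }
        return new Multi(fresh);
    }

    void close() { for (SubReader r : subs) r.decRef(); }
}
```

In this model, closing the old Multi decrements reader1's count from 2 to 1, so the reopened Multi keeps working; only when both are closed does reader1 actually release its resources.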
Michael McCandless (@mikemccand) (migrated from JIRA)
If you do:
IndexReader reader1 = IndexReader.open(index1);
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
multiReader1.close();
then today multiReader1.close() also closes reader1. That's why I consciously omitted reader1.close().
Ahh, I missed that MultiReader is allowed to close all readers that were passed into it, when it is closed. OK, let's leave reader1.close() out of the example.
It's somewhat "aggressive" of MultiReader/ParallelReader to do that? If you go and use those same sub-readers in other MultiReaders, then the closing of the first MultiReader will break the other ones?
I think we are forced to keep this semantics, for backwards compatibility. But I don't really think MultiReader/ParallelReader should actually be this aggressive. Maybe in the future we can add ctors for MultiReader/ParallelReader that accept a "doClose" boolean to turn this off.
Anyway, it's simple to preserve this semantics with reference counting. It just means that IndexReader / MultiReader do not incref the readers they receive, and, when they are done with those readers, they must call their close(), not decref. Ie they "borrow the reference" that was passed in, rather than incref'ing their own reference, to the child readers.
With that change, plus the change below, your example works fine.
Consequently, if you do
IndexReader reader1 = IndexReader.open(index1);
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
IndexReader multiReader2 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index3)});
multiReader1.close();
then multiReader2 is not usable anymore, because multiReader1.close() closes reader1. But that can be explicitly avoided by the user because it's known that multiReader1 and multiReader2 share the same reader.
This is why I don't like the semantics we have today – I don't think it's right that the multiReader1.close() breaks multiReader2.
Now, with the reopen() code:
IndexReader reader1 = IndexReader.open(index1); // optimized index, reader1 is a SegmentReader
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
... // modify index2
IndexReader multiReader2 = multiReader1.reopen();
// only index2 changed, so multiReader2 uses reader1 and has to increment the refcount of reader1
The user gets a new reader instance from multiReader1.reopen(), but can't tell which of the subreaders has been reopened and which are shared. That's why multiReader1.close() should not close reader1 in this case, and we need refcounting in order to make this work.
Both of these cases are easy to fix with reference counting: we just have to change ensureOpen() to assert that RC > 0 instead of closed==false. Ie, a reader may still be used as long as its RC is still non-zero.
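The suggested ensureOpen() change might look like this (a minimal hypothetical class, not the real IndexReader): validity is tied to refCount > 0 rather than to a closed flag, so a reader that one owner has close()d stays usable while another owner still holds a reference.

```java
// Sketch: close() just releases the caller's reference; ensureOpen()
// throws only once nobody holds a reference anymore.
class RCReader {
    private int refCount = 1;   // held by whoever opened this reader

    synchronized void incRef() {
        if (refCount <= 0) throw new IllegalStateException("refCount is " + refCount);
        refCount++;
    }

    synchronized void decRef() {
        if (refCount <= 0) throw new IllegalStateException("refCount is " + refCount);
        refCount--;
    }

    synchronized void close() { decRef(); }

    // New semantics: usable as long as RC > 0, even after a close() call.
    synchronized void ensureOpen() {
        if (refCount <= 0) throw new IllegalStateException("this reader is closed");
    }
}
```

This is exactly the behavior change noted later in the thread: reader1 remains usable after an explicit reader1.close() as long as some MultiReader still references it.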
Michael McCandless (@mikemccand) (migrated from JIRA)
I think we are forced to keep this semantics, for backwards compatibility. But I don't really think MultiReader/ParallelReader should actually be this aggressive. Maybe in the future we can add ctors for MultiReader/ParallelReader that accept a "doClose" boolean to turn this off.
Actually I retract this: it's no longer necessary as long as we change ensureOpen to assert that RC > 0 instead of closed==false.
I think this is actually a nice unexpected side-effect of using reference counting: it resolves this overly aggressive behavior of MultiReader/ParallelReader.
Michael Busch (migrated from JIRA)
With that change, plus the change below, your example works fine.
Two things:
- MultiReader/ParallelReader must not incref the subreaders on open() as you said. But on reopen() it must incref the subreaders that haven't changed and thus are shared with the old MultiReader/ParallelReader. This further means that the re-opened MultiReader/ParallelReader must remember which of the subreaders to decref on close(), right?
- If we change ensureOpen() like you suggest, then the user might still be able to use reader1 (in my example), even after reader1.close() was explicitly called. Probably not a big deal?
Michael McCandless (@mikemccand) (migrated from JIRA)
- MultiReader/ParallelReader must not incref the subreaders on open() as you said. But on reopen() it must incref the subreaders that haven't changed and thus are shared with the old MultiReader/ParallelReader. This further means that the re-opened MultiReader/ParallelReader must remember which of the subreaders to decref on close(), right?
Hmm, right. MultiReader/ParallelReader must keep track of whether it should call decref() or close() on each of its child readers, when it itself is closed.
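This per-child bookkeeping could be sketched like so (hypothetical names, a toy model rather than the actual patch): the composite remembers, for each sub-reader, whether it "borrowed" the caller's reference at construction time (and must close()) or took its own reference on the reopen path (and must decRef()).

```java
import java.util.ArrayList;
import java.util.List;

class RefReader {
    private int refCount = 1;          // held by whoever opened this reader
    synchronized void incRef() { refCount++; }
    synchronized void decRef() { refCount--; }
    synchronized void close()  { decRef(); }   // give up one reference
    synchronized int refCount() { return refCount; }
}

class CompositeReader {
    private final List<RefReader> children = new ArrayList<>();
    private final List<Boolean> borrowed = new ArrayList<>(); // true -> close(), false -> decRef()

    // Constructor path: we borrow the reference the caller passed in.
    void addBorrowed(RefReader r) {
        children.add(r);
        borrowed.add(true);
    }

    // Reopen path: the unchanged child is shared, so take our own reference.
    void addShared(RefReader r) {
        r.incRef();
        children.add(r);
        borrowed.add(false);
    }

    void close() {
        for (int i = 0; i < children.size(); i++) {
            if (borrowed.get(i)) children.get(i).close();
            else                 children.get(i).decRef();
        }
    }
}
```

Either way each close of a composite releases exactly one reference per child, so a child shared by a reopened composite survives the first close.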
- If we change ensureOpen() like you suggest, then the user might still be able to use reader1 (in my example), even after reader1.close() was explicitly called. Probably not a big deal?
I think this is OK?
Michael Busch (migrated from JIRA)
I think this is OK?
This was essentially the reason why I suggested to use two refcount values: one to control when to close a reader, and one to control when to close its (shared) resources in case of SegmentReader. That approach would not alter the behaviour of IndexReader.close(). But I agree that your approach is simpler and I also think it is okay to change ensureOpen() and accept the slight API change.
So I'll go ahead and implement the refcount approach unless anybody objects.
Oh and Mike, thanks for bearing with me :-)
Yonik Seeley (@yonik) (migrated from JIRA)
What about a new constructor for MultiReader/ParallelReader that implements more sensible semantics (increment refcount on readers passed to it, and decrement on close())?
Michael Busch (migrated from JIRA)
What about a new constructor for MultiReader/ParallelReader that implements more sensible semantics (increment refcount on readers passed to it, and decrement on close())?
Yeah, when reference counting is implemented then such a constructor should be easy to add.
Michael McCandless (@mikemccand) (migrated from JIRA)
Oh and Mike, thanks for bearing with me :-)
Thank you for bearing with me!
What about a new constructor for MultiReader/ParallelReader that implements more sensible semantics (increment refcount on readers passed to it, and decrement on close())?
+1
Michael Busch (migrated from JIRA)
Ok here is the next one :-)...
This patch implements the refCounting as discussed with Mike and Yonik above.
Other changes/improvements/comments:
I also made the changes suggested by Hoss (thanks!):
All unit tests pass.
Michael McCandless (@mikemccand) (migrated from JIRA)
Patch looks great! I'm still working through it but found a few small issues...
It might be good to put an "assert refCount > 0" at various places like decRef(), incRef(), ensureOpen()? That would require changing the constructors to init refCount=1 rather than incRef() it to 1.
I'm seeing a failure in contrib/memory testcase:
[junit] *********** FILE=./NOTICE.txt
[junit] Fatal error at query=Apache, file=./NOTICE.txt, anal=org.apache.lucene.analysis.SimpleAnalyzer@341960
[junit] ------------- ---------------- ---------------
[junit] Testcase: testMany(org.apache.lucene.index.memory.MemoryIndexTest): Caused an ERROR
[junit] this IndexReader is closed
[junit] org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
[junit] at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:158)
[junit] at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:632)
[junit] at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:64)
[junit] at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:143)
[junit] at org.apache.lucene.search.Searcher.search(Searcher.java:118)
[junit] at org.apache.lucene.search.Searcher.search(Searcher.java:97)
[junit] at org.apache.lucene.index.memory.MemoryIndexTest.query(MemoryIndexTest.java:412)
[junit] at org.apache.lucene.index.memory.MemoryIndexTest.run(MemoryIndexTest.java:313)
[junit] at org.apache.lucene.index.memory.MemoryIndexTest.testMany(MemoryIndexTest.java:234)
I think it's because MemoryIndexReader (private class in MemoryIndex.java) calls super(null), i.e. IndexReader.IndexReader(Directory), in its constructor, which does not initialize the refCount to 1? If I insert incRef() into the IndexReader.IndexReader(Directory) constructor, the test passes, but who else is using that constructor (ie will this double-incref in those cases?).
Michael McCandless (@mikemccand) (migrated from JIRA)
OK I think this patch is very close! I finished reviewing it – here's some more feedback:
- In multiple places you catch an IOException and undo the attempted re-open, but shouldn't this be a try/finally instead so you also clean up on hitting any unchecked exceptions?
- I think you need an explicit refCount for the Norm class in SegmentReader.

  Say I've done a chain of 10 re-opens for SegmentReader and each time only the segment's norms have changed. I've closed all but the last SegmentReader. At this point all 10 SegmentReaders are still alive (RC > 0) and holding open all file handles for their copies of the norms. So this will leak file handles/RAM with each reopen?

  To fix this, I think you just need to add refCount into the Norm class & set refCount to 1 in the constructor. Then, each SegmentReader calls Norm.decRef(), not Norm.close(), when it's done. When refCount hits 0 the Norm closes itself. Finally, during re-open you should share a Norm instance (rather than open a new one) if it had not changed from the previous SegmentReader.

  For singleNormStream, I think each reopened SegmentReader should always re-open this descriptor and then we can forcefully close this stream when the SegmentReader is closed (what you are doing now). Ie the SegmentReader fully owns singleNormStream.
- If you have a long series of reopens, then, all SegmentReaders in the chain will remain alive. So this is a [small] memory leak with time. I think if you changed referencedSegmentReader to always be the starting SegmentReader then this chain is broken and after 10 reopens only the original SegmentReader and the most recent one will remain alive (assuming I closed all SegmentReaders but the most recent one).
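The Norm refcounting proposal above can be sketched as follows (simplified and hypothetical; the real Norm also owns an IndexInput for the norms file and the loaded byte[]):

```java
// Sketch: refCount starts at 1 in the constructor; a SegmentReader done
// with a Norm calls decRef(), not close(), and the Norm releases its
// file handle / bytes itself when the count hits zero.
class NormRC {
    private int refCount = 1;
    private boolean open = true;   // stands in for the norms file handle + byte[]

    synchronized void incRef() {
        if (refCount <= 0) throw new IllegalStateException();
        refCount++;
    }

    synchronized void decRef() {
        if (refCount <= 0) throw new IllegalStateException();
        if (--refCount == 0) open = false;   // release file handle / RAM
    }

    synchronized boolean isOpen() { return open; }
}
```

With this, closing 9 of the 10 readers in a reopen chain releases their unshared norms immediately, instead of holding every copy open until the whole chain dies.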
Michael Busch (migrated from JIRA)
Patch looks great! I'm still working through it but found a few small issues...
Thanks Mike! Very good review and feedback!
It might be good to put a "assert refCount > 0" at various places like...
Agreed. I added those asserts to incRef() and decRef(). I didn't add it to ensureOpen(), because it throws an AlreadyClosedException anyway, and some testcases check if this exception is thrown.
I'm seeing a failure in contrib/memory testcase:
Oops, I fixed this already. I changed the (deprecated) ctor IndexReader.IndexReader(Directory) to call this(), which sets the refCount to 1. The test passes then. I made this fix yesterday, I think I just forgot to update the patch file before I submitted it, sorry about this.
- In multiple places you catch an IOException and undo the attempted re-open, but shouldn't this be a try/finally instead so you also clean up on hitting any unchecked exceptions?
Yes of course! Changed it.
- I think you need an explicit refCount for the Norm class in SegmentReader.
OK I see. I made this change as well. I also made the change that there is no chain, but one starting SegmentReader which all re-opened ones reference (see below). Now this starting SegmentReader won't close its norms until all other readers that reference it are closed (RC=0), because only then doClose() is called, which calls closeNorms(). Do you see an easy way how to improve this? Hmm, probably I have to definalize IndexReader.incRef() and decRef() and override them in SegmentReader. Then SegmentReader.incRef() would also incRef the norms, SegmentReader.decRef() would decRef the norms, and somehow a clone that references the reader but not the norms (because they changed) would only incRef the reader itself, but not the norms... Or do you see an easier way?
- If you have a long series of reopens, then, all SegmentReaders in the chain will remain alive. So this is a [small] memory leak with time. I think if you changed referencedSegmentReader to always be the starting SegmentReader then this chain is broken
Good point. Ok I changed this and also the test cases that check the refCount values.
Michael McCandless (@mikemccand) (migrated from JIRA)
Looks great! All tests pass for me.
OK I see. I made this change as well. I also made the change that there is no chain, but one starting SegmentReader which all re-opened ones reference (see below). Now this starting SegmentReader won't close its norms until all other readers that reference it are closed (RC=0), because only then doClose() is called, which calls closeNorms(). Do you see an easy way how to improve this?
How about if SegmentReader.close() always calls Norm.decRef(), immediately, for each Norm it has open? (E.g. you could implement doCloseUnsharedResources in SegmentReader and do it there.) This way, if the SegmentReader has been closed but it shares resources (and not the Norms) with reopened SegmentReaders, then its Norms would all decRef to 0 & be closed.
Also make sure that if a SegmentReader is decRef'd to 0 and close was never called, that also in this case you remember to call Norm.decRef for all open norms.
One more comment: I think you can partially share Norm instances? Eg if I have 2 fields that have norms, but only one of them changed since I opened this SegmentReader, then the reopened SegmentReader could share the Norm instance of the field that didn't change with the old SegmentReader? But right now you're re-loading all the Norms.
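The partial-sharing idea could look roughly like this (a hypothetical helper with assumed names, not the actual patch): on reopen, fields whose norms are unchanged keep the old Norm instance (incRef'd), and only the changed fields get a freshly loaded Norm.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Minimal stand-in for a refcounted Norm.
class NormSketch {
    int refCount = 1;
    void incRef() { refCount++; }
}

class ReopenNorms {
    // Build the reopened reader's norms map: share unchanged Norms,
    // reload (here: construct) only the changed ones.
    static Map<String, NormSketch> reopen(Map<String, NormSketch> old,
                                          Set<String> changedFields) {
        Map<String, NormSketch> result = new HashMap<>();
        for (Map.Entry<String, NormSketch> e : old.entrySet()) {
            if (changedFields.contains(e.getKey())) {
                result.put(e.getKey(), new NormSketch()); // reload from disk in real code
            } else {
                e.getValue().incRef();                    // share with the old reader
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

The unchanged field ends up with the same Norm instance in both readers, which is what makes the later "old Norm == new Norm" cache-copy check in MultiSegmentReader possible.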
Otherwise no more comments!
Michael Busch (migrated from JIRA)
How about if SegmentReader.close() always calls Norm.decRef(), immediately, for each Norm is has open? EG you could implement doCloseUnsharedResources in SegmentReader and do it there). This way,
Hmm, I was thinking about this before (that's actually why I put that method in there). But I don't think this is gonna work. For example, let's say we use a MultiReader that has two SegmentReaders, SR1 and SR2. Now only SR2 changed; we reopen the MR, which increments the refCount on SR1, because it shares that SR. Now we close the old MultiReader, which calls close() on SR1. If SegmentReader.close() now calls Norm.decRef(), then it will close the norms even though they are still used by the new MultiReader.
Michael Busch (migrated from JIRA)
One more comment: I think you can partially share Norm instances? Eg
Good idea! Will make the change.
Michael McCandless (@mikemccand) (migrated from JIRA)
Hmm, I was thinking about this before (that's actually why I put that method in there). But I don't think this is gonna work. For example, let's say we use a MultiReader that has two SegmentReaders, SR1 and SR2. Now only SR2 changed; we reopen the MR, which increments the refCount on SR1, because it shares that SR. Now we close the old MultiReader, which calls close() on SR1. If SegmentReader.close() now calls Norm.decRef(), then it will close the norms even though they are still used by the new MultiReader.
Ugh, you're right. The challenge is sometimes a reference to SR means "I will use norms" (this is when MultiReader incRefs) but other times it means "I will not use norms" (this is when SR incRefs due to reopen).
OK, I like your original proposal: SR overrides incRef() and incrs its RC as well as the RC for each Norm it's using. Then, in SR's reopenSegment, you carefully incRef the "original" SR without incRef'ing its Norms (except for those Norms you will keep).
Likewise, SR overrides decRef() to decr its RC and the RC for each Norm. But, when a reopened SR1.doClose() is called, you must carefully decRef the RC of the original SR but not decRef each of its Norms (except for those you had actually shared).
This way when MR calls SR.incRef/decRef then all Norms and the SR's RC are incr'd/decr'd. But when SR1 shares resources with an original SR it only incr's/decr's the refCount of the SR.
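A sketch of that scheme (hypothetical names, simplified; the real classes do much more): incRef/decRef bump the reader and all its norms, while the "ReaderNotNorms" variants, used on the reopen path, touch only the reader's own count, leaving it to the reopen code to incRef exactly the norms it keeps.

```java
import java.util.ArrayList;
import java.util.List;

class Norm {
    int refCount = 1;
    synchronized void incRef() { refCount++; }
    synchronized void decRef() { refCount--; }
}

class SegReader {
    int refCount = 1;
    final List<Norm> norms = new ArrayList<>();

    // "I will use the norms": e.g. a MultiReader sharing this reader.
    synchronized void incRef() {
        incRefReaderNotNorms();
        for (Norm n : norms) n.incRef();
    }

    synchronized void decRef() {
        decRefReaderNotNorms();
        for (Norm n : norms) n.decRef();
    }

    // "I share resources, not (all) norms": the reopen path. The reopen
    // code separately incRefs only the norms it actually keeps.
    synchronized void incRefReaderNotNorms() { refCount++; }
    synchronized void decRefReaderNotNorms() { refCount--; }
}
```

This captures the two meanings of "a reference to SR" that Mike points out: MultiReader-style sharing moves all the norm counts, reopen-style sharing moves only the reader's.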
Michael Busch (migrated from JIRA)
OK, I think it's finally working now! :-)
SegmentReader now overrides incRef() and increments the reader's RC as well as the RCs of all norms. I further added the private method incRefReaderNotNorms() to SegmentReader; reopenSegment() calls it instead of incRef(), because reopenSegment() itself takes care of incrementing the RCs of the shared norms.
I also added the method doCloseUnsharedResources() to IndexReader, which is a NOOP by default. It is called when a reader is closed, even if its RC > 0. SegmentReader overrides this method and closes (=decRefs) the norms in it. The SegmentReader then remembers that it closed the norms already and won't close them again in doClose(), which is called when its RC finally drops to 0.
I also made the change you suggested, Mike, to only reload the field norms that actually changed. SegmentReader.openNorms() now checks if there is already a norm for a field in the Hashtable, and only loads it if it's not there. reopenSegment() puts all norms that haven't changed into the new SegmentReader.
I added some new tests to verify the norms ref counting. All unit tests pass now. So I think this is ready to commit, but I'd feel more comfortable if you could review it again before I commit it.
Yonik Seeley (@yonik) (migrated from JIRA)
I just did a quick partial review of SegmentReader for thread safety only and ran across some potential issues
There's probably other stuff, but I stopped looking. Since we are sharing things now, every method that was synchronized is now potentially unsafe. Synchronizing on the object being shared is probably a much better strategy now.
This is complex enough that in addition to review, I think we need a good multi-threaded test - 100 or 1000 threads over a ram directory, all changing, querying, retrieving docs, reopening, closing, etc.
Yonik Seeley (@yonik) (migrated from JIRA)
It also looks like Norm.incRef is used in an unsafe manner (unsynchronized, or synchronized on the reader), and also Norm.decRef() is called inside a synchronized(norms) block, but an individual Norm may be shared across multiple Hashtables, right?
I don't think that norms even needs to be a synchronized Hashtable... it could be changed to a HashMap since its contents never change, right?
Michael McCandless (@mikemccand) (migrated from JIRA)
OK, reviewed the latest patch:
In this code:
// singleNormFile means multiple norms share this file
if (fileName.endsWith("." + IndexFileNames.NORMS_EXTENSION)) {
clone.singleNormStream = d.openInput(fileName, readBufferSize);
}
I think the comment should be removed (it doesn't apply), and also won't this incorrectly open the singleNormStream more than once if more than one field does not have separate norms? I think you should init that to null and then only reopen it, once, if it's still null?
In MultiSegmentReader, the logic that copies over unchanged norms from the old norms cache can be simplified. I think you can just look up the old Norm instance & the new Norm instance and if they are == then you can copy bytes over? This would also let you remove "sharedNorms" entirely, which is good because it's not just a boolean thing anymore since some Norm instances are shared and some aren't.
I think you also need to override decRef (and add decRefReaderNotNorms) in SegmentReader? Because now there is a mismatch: incRef incr's the Norm RCs, but decRef does not. So I think norms are not getting closed? I think we should modify the "assertReaderClosed()" in the unit test to verify (when appropriate) that the RC of all Norm instances is also 0 (ie assertTrue(SR.normsClosed())). Then, make sure SR calls referencedSegmentReader.decRefReaderNotNorms instead of decRef. I think you then don't need to track the "closedNorms" boolean at all. You simply always decRef the norms whenever SR.decRef is called. The doCloseUnsharedResources is still needed to close the singleNormStream.
Michael McCandless (@mikemccand) (migrated from JIRA)
This is complex enough that in addition to review, I think we need a good multi-threaded test - 100 or 1000 threads over a ram directory, all changing, querying, retrieving docs, reopening, closing, etc.
+1
We should fix all the synchronization issues you've found, create this unit test, and then iterate from there.
Yonik Seeley (@yonik) (migrated from JIRA)
We should fix all the synchronization issues you've found, create this unit test, and then iterate from there.
Or reverse it... write the test first so we have confidence that it can at least uncover some of these issues. The test should do as little synchronization as possible of its own so it doesn't mask a lack of synchronization in the core. It should be possible to uncover the unsynchronized concurrent use of IndexInput at least, and hopefully some of the refcounting issues too.
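The shape of such a test might be sketched like this (assumed names; the real test would also search, delete documents, and reopen actual indexes over a RAMDirectory): many threads hammer a shared refcounted object with no extra synchronization in the test itself, then the final count is checked.

```java
// Toy stress test: if the refcounting in the "core" class is properly
// synchronized, the count must return to exactly 1 after all threads
// have done matched incRef/decRef pairs.
class RefCounted {
    private int refCount = 1;
    synchronized void incRef() { refCount++; }
    synchronized void decRef() { refCount--; }
    synchronized int refCount() { return refCount; }
}

class RefCountStressTest {
    static int run(int threads, int iterations) {
        RefCounted shared = new RefCounted();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    shared.incRef();   // simulate a reopen sharing the reader
                    shared.decRef();   // simulate closing the old reader
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return shared.refCount();      // must be 1 if the counting is safe
    }
}
```

Dropping the `synchronized` keywords in RefCounted is exactly the kind of bug this style of test can (intermittently) expose, which is why it should run with many threads and iterations.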
Michael McCandless (@mikemccand) (migrated from JIRA)
Or reverse it... write the test first so we have confidence that it can at least uncover some of these issues. The test should do as little synchronization as possible of its own so it doesn't mask a lack of synchronization in the core. It should be possible to uncover the unsynchronized concurrent use of IndexInput at least, and hopefully some of the refcounting issues too.
Excellent, I agree!
Michael Busch (migrated from JIRA)
I just did a quick partial review of SegmentReader for thread safety only and ran across some potential issues
OK, let's scratch my "ready to commit" comment ;)
A question about thread-safety here. I agree that we must fix all possible problems concerning two or more IndexReaders in read-mode, like the FieldsReader issue.
On the other hand: We're saying that performing write operations on a re-opened reader results in undefined behavior. Some of the issues you mentioned, Yonik, should only apply in case one of the shared readers is used to perform index modifications, right? Then the question is: how much sense does it make to make reopen() thread-safe in the write case then?
So I think the multi-threaded testcase should not perform index modifications using readers involved in a reopen()?
Yonik Seeley (@yonik) (migrated from JIRA)
Sorry, I hadn't kept up with this issue wrt what was going to be legal (and we should definitely only test what will be legal in the MT test). So that removes the deletedDocs issue I guess.
Thomas Peuss (migrated from JIRA)
Finding concurrency issues with a unit test is hard to do, because your potential problems lie in the time domain and not in the code domain. ;-)
From my experience, the following things can have an impact on the results of such a test:
And be prepared that one time your test runs through without a problem and on the next run it breaks...
Just my € 0.02
Michael Busch (migrated from JIRA)
Changes in this patch:
Still outstanding:
Michael Busch (migrated from JIRA)
Changes:
The thread-safety test still sometimes fails. The weird thing is that the test verifies that the re-opened readers always return correct results. The only problem is that the refCount value is not always 0 at the end of the test. I'm starting to think that the testcase itself has a problem? Maybe someone else can take a look.
Michael McCandless (@mikemccand) (migrated from JIRA)
I think the cause of the intermittent failure in the test is a missing try/finally in doReopen to properly close/decRef everything on exception.
Because of lockless commits, a commit could be in-process while you are re-opening, in which case you could hit an IOException and you must therefore decRef those norms you had incRef'd (and close, e.g., the newly opened FieldsReader).
Michael Busch (migrated from JIRA)
I think the cause of the intermittent failure in the test is a missing try/finally in doReopen to properly close/decRef everything on exception.
Awesome! Thanks so much for pointing me there, Mike! I was getting a little suicidal here already ... ;)
I should have read the comment in SegmentReader#initialize more carefully:
} finally {
  // With lock-less commits, it's entirely possible (and
  // fine) to hit a FileNotFound exception above. In
  // this case, we want to explicitly close any subset
  // of things that were opened so that we don't have to
  // wait for a GC to do so.
  if (!success) {
    doClose();
  }
}
While debugging, it's easy to miss such an exception, because SegmentInfos.FindSegmentsFile#run() ignores it. But it's good that it logs such an exception, I just have to remember to print out the infoStream next time.
So it seems that this was indeed the cause for the failing test case. I made the change and so far the tests didn't fail anymore (ran it about 10 times so far). I'll run it another few times on a different JVM and submit an updated patch in a short while if it doesn't fail again.
Michael Busch (migrated from JIRA)
OK, all tests pass now, including the thread-safety test. I ran it several times on different JVMs.
Changes:
Michael McCandless (@mikemccand) (migrated from JIRA)
Awesome! Thanks so much for pointing me there, Mike! I was getting a little suicidal here already ...
No problem, I lost some hairs tracking that one down too!!
OK, latest patch looks good! I love the new threaded unit test.
Only two smallish comments:
You should also close fieldsReader when referencedSegmentReader != null, right? (in SegmentReader.doClose)
In the new try/finally in reopenSegment: if you first setup referencedSegmentReader, then can't that finally clause just be clone.decRef() instead of duplicating code for decRef'ing norms, closeNorms(), etc.?
Yonik Seeley (@yonik) (migrated from JIRA)
So how about a public IndexReader.flush() method so that one could also reopen readers that were used for changes?
Usecase:
reader.deleteDocument()
reader.flush()
writer = new IndexWriter()
writer.addDocument()
writer.close()
reader.reopen()
reader.deleteDocument()
Michael Busch (migrated from JIRA)
- You should also close fieldsReader when referencedSegmentReader != null, right? (in SegmentReader.doClose)
Yes, will do!
- In the new try/finally in reopenSegment: if you first setup referencedSegmentReader, then can't that finally clause just be clone.decRef() instead of duplicating code for decRef'ing norms, closeNorms(), etc.?
Hmm, what if an exception is then thrown in clone.close(), from FieldsReader.close() or singleNormStream.close()? In that case it would not decRef the referenced reader.
Hmm but actually we could change the order in close() so that referencedSegmentReader.decRefReaderNotNorms() is done first even if the following close() operations don't succeed?
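That reordering could be sketched like this (hypothetical, heavily simplified classes): the shared reference is released before the reader's own unshared resources are closed, so an exception from close() cannot leak the referenced reader's refcount.

```java
import java.io.Closeable;
import java.io.IOException;

// Stands in for the original SegmentReader whose state the clone shares.
class SharedReaderState {
    int refCount = 2;                  // held by the original and the reopened reader
    void decRefReaderNotNorms() { refCount--; }
}

class ReopenedReader {
    final SharedReaderState referenced;
    final Closeable fieldsReader;      // unshared resource owned by this reader

    ReopenedReader(SharedReaderState referenced, Closeable fieldsReader) {
        this.referenced = referenced;
        this.fieldsReader = fieldsReader;
    }

    // Release the shared reference first; even if closing our own
    // resources throws, the referenced reader's count is not leaked.
    void doClose() throws IOException {
        referenced.decRefReaderNotNorms();
        fieldsReader.close();          // may throw; the decRef already happened
    }
}
```

A try/finally with the decRef in the finally clause would give the same guarantee; doing the decRef first is simply the cheaper variant of the same fix.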
Michael Busch (migrated from JIRA)
So how about a public IndexReader.flush() method
Since our goal is to make IndexReader read-only in the future (#2106), do you really think we need to add this?
Michael McCandless (@mikemccand) (migrated from JIRA)
Hmm but actually we could change the order in close() so that referencedSegmentReader.decRefReaderNotNorms() is done first even if the following close() operations don't succeed?
+1
Michael McCandless (@mikemccand) (migrated from JIRA)
So how about a public IndexReader.flush() method
I think also if we do decide to do this we should open a new issue for it?
Yonik Seeley (@yonik) (migrated from JIRA)
Since our goal is to make IndexReader read-only in the future (#2106), do you really think we need to add this?
flush() would make reopen() useful in more cases, and #2106 is further off (not Lucene 2.3, right?) Anyway, flush() would be considered a write operation like setNorm() & deleteDocument() and could be deprecated along with them in the future if that's how we decide to go.
I think also if we do decide to do this we should open a new issue for it?
Yes, that's fine.
Michael Busch (migrated from JIRA)
I think also if we do decide to do this we should open a new issue for it?
+1
I'll open a new issue.
Michael Busch (migrated from JIRA)
Changes:
Michael McCandless (@mikemccand) (migrated from JIRA)
Patch looks good. Only thing I found was this leftover System.out.println, in SegmentReader.java:
System.out.println("refCount " + getRefCount());
Michael Busch (migrated from JIRA)
Thanks for the review, Mike! I'll remove the println.
Ok, I think this patch has been reviewed a bunch of times and should be ready to commit now. I'll wait another day and commit it then if nobody objects.
Michael Busch (migrated from JIRA)
Changes:
I'm going to commit this soon!
Michael Busch (migrated from JIRA)
Committed! Phew!!!
This is Robert Engels' implementation of IndexReader.reopen() functionality, as a set of 3 new classes (this was easier for him to implement, but should probably be folded into the core, if this looks good).
Migrated from LUCENE-743 by Otis Gospodnetic (@otisg), 3 votes, resolved Nov 17 2007 Attachments: IndexReaderUtils.java, lucene-743.patch (versions: 3), lucene-743-take10.patch, lucene-743-take2.patch, lucene-743-take3.patch, lucene-743-take4.patch, lucene-743-take5.patch, lucene-743-take6.patch, lucene-743-take7.patch, lucene-743-take8.patch, lucene-743-take9.patch, MyMultiReader.java, MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch Linked issues:
2106
1906
2062