apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.67k stars 1.03k forks source link

OutOfMemoryError should not be wrapped in an IllegalStateException, as it is misleading for fault-tolerant programs [LUCENE-2511] #3585

Open asfimport opened 14 years ago

asfimport commented 14 years ago

I have a program, which does explicit commits. On one occasion, I saw the following exception thrown:

java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:4061) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:4136) at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:4114)

In our program, we treat all errors as fatal and terminate the program (and restart). Runtime exceptions are sometimes handled differently, since they are usually indicative of a programming bug that might be recoverable. in some situations.

I think the OutOfMemoryError should not be wrapped as a runtime exception.. as this can mask a serious issue from a fault-tolerant application.


Migrated from LUCENE-2511 by David Sitsky, updated Nov 30 2013

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Lucene refuses to commit after OOME has been hit for defensive reasons – that is a dangerous exception and it can possibly corrupt IndexWriter's buffered state (even though we try hard to prevent that).

Also, your app would have already hit an OOME (eg while adding a document), so that should already have tripped your "terminate program & restart" logic.

What exception should we throw instead? I suppose we could throw a new OOME, though that's sort of a lie... IllegalStateException is a good match in that the state of IW has become possibly corrupted by OOME tearing through the stack at one point.

asfimport commented 14 years ago

David Sitsky (migrated from JIRA)

No question this is the right behaviour to not commit when there is an OOME. My issue is we should not be catching the OOME in Lucene. We should let it bubble out to the application so the right action can be done.

This is probably leading to a philosophical discussion, but I believe any instance of "catch Throwable" (or catch Error) should be removed from the code, as its potentially dangerous to do-so, especially for a library like Lucene.. unless we catch it and rethrow it.

If this OOM happened on another thread, then I think we should just re-throw it, or wrap it in another OOM, otherwise it will be confusing to users of this library. IllegalStateException I personally think means quite a different thing.. the program is in some state that should have not occurred (usually due to a programming bug or due to some invalid input), where-as OOM is an unrecoverable error.

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

As Mike said, your indexing code should have got an OOM and then it knows there is something wrong and shutdown. To not corrupt the index, we need to detect OOMs. The OOM is rethrowed after marking IW as "bad", so all is fine, we don't change the exception flow, we just plug into the exception bubble-up. The Exception you are talking about is an exception coming later after the OOM was already thrown, to indicate, that you used IW incorrect (calling commit after OOM).

The OOM causes IndexWriter to get into an illegal state, so the exception is correct. You are informed about this illegal state when you call commit(). As you should have got an OOM before, you know, that any further usage of IW is an error, so the ISE is correct. This is just to prevent incorrect code that calls commit() after an OOM was catched and corrupt the on-disk index.

asfimport commented 14 years ago

David Sitsky (migrated from JIRA)

Uwe, that all sounds good, except I never received the original OOM, otherwise our application would have terminated. I am not sure if this matters, but I am using Lucene 1.4.1, only have a single thread doing Lucene index writer work, and explicitly call commit() at certain intervals (co-ordinate commits with ActiveMQ), It was on this commit call that I got this IAE that wrapped the OOM, but I never saw the original OOM..

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

You may look into this issue: #2268

I dont think that the OOM cannot be seen. Maybe 2.4.x had a bug about that, but I dont exspect this. You should update to at least 2.9.3 and try again, 2.4 is quite old.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Did you mean Lucene 2.4.1?

If you never saw the OOME then that's really a bug. We try hard to throw the original OOME... and then record in IW that an OOME was hit so that we don't try to commit.

Note that we do not wrap/suppress an OOME during commit; rather, we check IW to see if any ops against it had hit OOME, and then throw the ISE. If you really hit an OOME during commit, Lucene should throw that back to you too.

What ops is your single thread invoking on IW? We can scrutinize those methods to see if somewhere they may suppress OOME, but we try not to do that.

asfimport commented 14 years ago

David Sitsky (migrated from JIRA)

Thanks - we'll update to 2.9.3 for our latest version. Hopefully all will be well then.

asfimport commented 10 years ago

Erick Erickson (@ErickErickson) (migrated from JIRA)

2013 Old JIRA cleanup