apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.37k stars 940 forks source link

DirectoryIndexReader finalize() holding TermInfosReader longer than necessary [LUCENE-1715] #2789

Closed asfimport closed 14 years ago

asfimport commented 14 years ago

DirectoryIndexReader has a finalize method, which causes the JDK to keep a reference to the object until it can be finalized. SegmentReader and MultiSegmentReader are subclasses that contain references to, potentially, hundreds of megabytes of cached data in a TermInfosReader.

Some options would be removing finalize() from DirectoryIndexReader (it releases a write lock at the moment) or possibly nulling out references in various close() and doClose() methods throughout the class hierarchy so that the finalizable object doesn't references the Term arrays.

Original mailing list message: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200906.mbox/%3C7A5CB4A7BBCE0C40B81C5145C326C31301A62971@NUMEVP06.na.imtn.com%3E


Migrated from LUCENE-1715 by Brian Groose, resolved Jun 25 2009 Environment:

Sun JDK 6 update 12 64-bit, Debian Lenny

Attachments: LUCENE-1715.patch

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I'd be inclined to remove the finalizer at this point. Apps should not be relying on GC to release the write lock, and if they are, at best they get very buggy behavior (eg we don't flush changes to disk, other writers will sometimes to acquire the write lock because GC didn't finalize, etc.),

If we remove the finalizer then the closed SegmentReader should be GC'd promptly.

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

+1 I suggest removing finalize() from IndexWriter too.

If someone forgets to close IR/IW - that's his personal problem. (Which, in the abscence of finalizer he'll going to notice and fix pretty soon)

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I suggest removing finalize() from IndexWriter too.

+1

At best you get buggy behavior if somehow you're relying on these finalizers...

I plan to commit shortly.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks Brian!

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I would still additionally null the references Brian notes for even faster release by GC?

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Does that really help GC? (I thought not)

Had we left finalize in place, I agree nulling would help, because stuff can sit in the finalize queue for a while and nulling would have severed it. But now since all of DirectoryReader will be GCd "at once", I didn't think nulling would help?

Though, I suppose what it would help is when someone closes the reader but then still hangs onto it, so perhaps we should do it?

asfimport commented 14 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

That was my intention. :)

asfimport commented 14 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

Does that really help GC? (I thought not)

It really depends on the VM implementation. In a dalvik VM people start nulling stuff all the time because it helps GC. In that case where a lot of memory can be collected I guess its worth to null the references. I generally do not null references but in this case I would really to it.

so +1 from my side.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Reopening for nulling at least termInfosReader; we should probably null other memory intensive things like deleted docs & norms.

asfimport commented 14 years ago

Robert Newson (@rnewson) (migrated from JIRA)

I wonder if it's also worth examining the (very few) other places that have finalize()?

The mere presence of a finalize() method triggers different handling by the garbage collector. Since all remaining finalize() methods appear to close resources that should have been closed explicitly, the same principle applies for those as for the resolution of LUCENE-1715?

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

SimpleFSDirectory.FSIndexInput's finalize() protects you from running out of descriptors, if you fail to close your reader; NativeFSLockFactory protects you if you forget to release the lock (close your writer).

I agree, I think we should remove both of these.

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

I object nulling references in attempt to speed up GC. It's totally useless on any decent JVM implementation and if someone uses indecent JVM, I doubt he's concerned with his app efficiency.

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

And support removing finalizers everywhere if their only point is to guard against forgotten close().

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I agree nulling is not a good practice to make GC faster.

But... for freeing up memory even if the app still holds a reference to the reader after closing it, I think this is in fact worthwhile.

asfimport commented 14 years ago

Simon Willnauer (@s1monw) (migrated from JIRA)

I is def. not good practice and I agree that a decent VM should not care. In some environments you don't have a choice (mobile phones for instance) and if selected pieces of "nulling" code can speed things up we should do it. I will run a benchmark on a dalivk VM (Android) to show the difference with the change. I might not have time today or tomorrow though. This change is not visible to anybody using lucene so to me its not that much of a deal. To be honest I'm not a fan of doing that at all but in this case it "could" be useful in some corner cases but does not harm anybody.

asfimport commented 14 years ago

Earwin Burrfoot (migrated from JIRA)

There's in fact one case where nulling harms. I'm going to try making as much of IR as possible immutable and final. Load everything upfront on creation/reopen (or don't load if IR is created for, say, merging). Unlike nulling references, making frequently accessed fields final does have an impact under adequate JVMs.

Well, nulling can be added now and removed when/if I finish my IR stuff.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

OK, attached patch nulling just a few things... I plan to commit in a day or two.

asfimport commented 14 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks Brian!