apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.49k stars 988 forks source link

[PATCH] TermInfosReader, SegmentTermEnum Out Of Memory Exception [LUCENE-436] #1514

Closed asfimport closed 17 years ago

asfimport commented 18 years ago

We've been experiencing terrible memory problems on our production search server, running lucene (1.4.3).

Our live app regularly opens new indexes and, in doing so, releases old IndexReaders for garbage collection.

But...there appears to be a memory leak in org.apache.lucene.index.TermInfosReader.java. Under certain conditions (possibly related to JVM version, although I've personally observed it under both linux JVM 1.4.2_06, and 1.5.0_03, and SUNOS JVM 1.4.1) the ThreadLocal member variable, "enumerators" doesn't get garbage-collected when the TermInfosReader object is gc-ed.

Looking at the code in TermInfosReader.java, there's no reason why it shouldn't be gc-ed, so I can only presume (and I've seen this suggested elsewhere) that there could be a bug in the garbage collector of some JVMs.

I've seen this problem briefly discussed; in particular at the following URL: http://java2.5341.com/msg/85821.html The patch that Doug recommended, which is included in lucene-1.4.3 doesn't work in our particular circumstances. Doug's patch only clears the ThreadLocal variable for the thread running the finalizer (my knowledge of java breaks down here - I'm not sure which thread actually runs the finalizer). In our situation, the TermInfosReader is (potentially) used by more than one thread, meaning that Doug's patch doesn't allow the affected JVMs to correctly collect garbage.

So...I've devised a simple patch which, from my observations on linux JVMs 1.4.2_06, and 1.5.0_03, fixes this problem.

Kieran PS Thanks to daniel naber for pointing me to jira/lucene

@@ -19,6 +19,7 @@ import java.io.IOException;

import org.apache.lucene.store.Directory; +import java.util.Hashtable;

/** This stores a monotonically increasing set of <Term, TermInfo> pairs in a

@@ -60,10 +61,10 @@ }

private SegmentTermEnum getEnum() {

TermInfosReader.java, full source:

package org.apache.lucene.index;

/**

import java.io.IOException;

import org.apache.lucene.store.Directory; import java.util.Hashtable;

/** This stores a monotonically increasing set of <Term, TermInfo> pairs in a

final class TermInfosReader { private Directory directory; private String segment; private FieldInfos fieldInfos;

private final Hashtable enumeratorsByThread = new Hashtable(); private SegmentTermEnum origEnum; private long size;

TermInfosReader(Directory dir, String seg, FieldInfos fis) throws IOException { directory = dir; segment = seg; fieldInfos = fis;

origEnum = new SegmentTermEnum(directory.openFile(segment + ".tis"),
                               fieldInfos, false);
size = origEnum.size;
readIndex();

}

public int getSkipInterval() { return origEnum.skipInterval; }

final void close() throws IOException { if (origEnum != null) origEnum.close(); }

/** Returns the number of term/value pairs in the set. */ final long size() { return size; }

private SegmentTermEnum getEnum() { SegmentTermEnum termEnum = (SegmentTermEnum)enumeratorsByThread.get(Thread.currentThread()); if (termEnum == null) { termEnum = terms(); enumeratorsByThread.put(Thread.currentThread(), termEnum); } return termEnum; }

Term[] indexTerms = null; TermInfo[] indexInfos; long[] indexPointers;

private final void readIndex() throws IOException { SegmentTermEnum indexEnum = new SegmentTermEnum(directory.openFile(segment + ".tii"), fieldInfos, true); try { int indexSize = (int)indexEnum.size;

  indexTerms = new Term[indexSize];
  indexInfos = new TermInfo[indexSize];
  indexPointers = new long[indexSize];

  for (int i = 0; indexEnum.next(); i++) {
indexTerms[i] = indexEnum.term();
indexInfos[i] = indexEnum.termInfo();
indexPointers[i] = indexEnum.indexPointer;
  }
} finally {
  indexEnum.close();
}

}

/** Returns the offset of the greatest index entry which is less than or equal to term.*/ private final int getIndexOffset(Term term) throws IOException { int lo = 0; // binary search indexTerms[] int hi = indexTerms.length - 1;

while (hi >= lo) {
  int mid = (lo + hi) >> 1;
  int delta = term.compareTo(indexTerms[mid]);
  if (delta &lt;0)
hi = mid - 1;
  else if (delta &gt; 0)
lo = mid + 1;
  else
return mid;
}
return hi;

}

private final void seekEnum(int indexOffset) throws IOException { getEnum().seek(indexPointers[indexOffset], (indexOffset * getEnum().indexInterval) - 1, indexTerms[indexOffset], indexInfos[indexOffset]); }

/** Returns the TermInfo for a Term in the set, or null. */ TermInfo get(Term term) throws IOException { if (size == 0) return null;

// optimize sequential access: first try scanning cached enum w/o seeking
SegmentTermEnum enumerator = getEnum();
if (enumerator.term() != null                 // term is at or past current
&& ((enumerator.prev != null && term.compareTo(enumerator.prev) > 0)
    || term.compareTo(enumerator.term()) >= 0)) {
  int enumOffset = (int)(enumerator.position/enumerator.indexInterval)+1;
  if (indexTerms.length == enumOffset     // but before end of block
  || term.compareTo(indexTerms[enumOffset]) &lt;0)
return scanEnum(term);            // no need to seek
}

// random-access: must seek
seekEnum(getIndexOffset(term));
return scanEnum(term);

}

/* Scans within block for matching term. / private final TermInfo scanEnum(Term term) throws IOException { SegmentTermEnum enumerator = getEnum(); while (term.compareTo(enumerator.term()) > 0 && enumerator.next()) {} if (enumerator.term() != null && term.compareTo(enumerator.term()) == 0) return enumerator.termInfo(); else return null; }

/** Returns the nth term in the set. */ final Term get(int position) throws IOException { if (size == 0) return null;

SegmentTermEnum enumerator = getEnum();
if (enumerator ![= null && enumerator.term() ](= null && enumerator.term() )= null &&
    position >= enumerator.position &&
position &lt;(enumerator.position + enumerator.indexInterval))
  return scanEnum(position);          // can avoid seek

seekEnum(position / enumerator.indexInterval); // must seek
return scanEnum(position);

}

private final Term scanEnum(int position) throws IOException { SegmentTermEnum enumerator = getEnum(); while(enumerator.position < position) if (!enumerator.next()) return null;

return enumerator.term();

}

/* Returns the position of a Term in the set or -1. / final long getPosition(Term term) throws IOException { if (size == 0) return -1;

int indexOffset = getIndexOffset(term);
seekEnum(indexOffset);

SegmentTermEnum enumerator = getEnum();
while(term.compareTo(enumerator.term()) &gt; 0 && enumerator.next()) {}

if (term.compareTo(enumerator.term()) == 0)
  return enumerator.position;
else
  return -1;

}

/** Returns an enumeration of all the Terms and TermInfos in the set. */ public SegmentTermEnum terms() { return (SegmentTermEnum)origEnum.clone(); }

/** Returns an enumeration of terms starting at or after the named term. */ public SegmentTermEnum terms(Term term) throws IOException { get(term); return (SegmentTermEnum)getEnum().clone(); }

/* some jvms might have trouble gc-ing enumeratorsByThread */ protected void finalize() throws Throwable { try { // make sure gc can clear up. enumeratorsByThread.clear(); } finally { super.finalize(); } } }


Migrated from LUCENE-436 by kieran, 4 votes, resolved Dec 20 2006 Environment:

Solaris JVM 1.4.1
Linux JVM 1.4.2/1.5.0
Windows not tested

Attachments: FixedThreadLocal.java, lucene-1.9.1.patch, LUCENE-436.patch, Lucene-436-TestCase.tar.gz, TermInfosReader.java, ThreadLocalTest.java

asfimport commented 18 years ago

Fernando Martins (migrated from JIRA)

I'm also having memory leaking when doing simple Searches using lucene CVS checkout from 2006-01-10 I'm using jrockit-jdk1.5.0_03.

I've tried with this patch, and my simple search test STOPPED leaking. Maybe it does depend on the JVM version. I'll try on our application this patch to see if it stops leaking.

asfimport commented 18 years ago

Nicholaus Shupe (migrated from JIRA)

I will be trying this patch on Lucene 1.9.1 to see if it fixes my production memory leak with Tomcat 5.5.15 / Redhat / Java 1.5.0_06.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

The bug is invalid, as was pointed out on the lucene-dev list.

When the thread dies, all thread locals will be cleaned-up.

If the thread does not die, the thread-locals are still cleaned-up PERIODICALLY. And the value is always cleaned up - since it is held in a WeakReference, the only things that "hangs around" is the key - which consumes VERYL LITTLE memory.

The submitter should be able to provide a testcase that demonstrate the ThreadLocal bug. The one provided with bug 529 is invalid (as was pointed out on the dev email thread).

IMHO, there is NO BUG HERE. The users are ding something else that is causing a OOM.

This bug should be closed as invalid.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

The finalize() method should be removed from the Segment/Term readers.

In most cases this will actually make things worse, as the objects will need to stay around until the finalizer thread can run.

asfimport commented 18 years ago

kieran (migrated from JIRA)

I have created a test that exhibits the above-discussed problem. Please note, I don't think it should necessarily be called a "Lucene" bug, as the lucene source is written in a way that /should/ work. It's more likely to be a JVM bug, I think. Please note, I have seen this bug in SUN Hotspot JVM 1.4.2 on Linux. I will shortly be trying some other JVMs and platforms, but - for the time being - this test should only be run against such an environment. While developing this test, it also became apparent that the bug only exhibits itself when using a RAMDirectory as the source for an IndexSearcher. I will shortly attach the test to this issue.

asfimport commented 18 years ago

kieran (migrated from JIRA)

test case for recreating this issue on (at least) sun jvm hotspot 1.4.2 on linux

asfimport commented 18 years ago

Nicholaus Shupe (migrated from JIRA)

I agree with robert engels about the finalize method being taken out of the class. In fact, the finalize method is run by the finalizer thread, and by setting the value to null in that thread, it effectively creates one more entry in the Thread threadLocals map. This can only hurt the class and performance. I'm still investigating to why the memory leak occurs though.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

fwiw, we have done EXTENSIVE memory profiling of the low-level Lucene (and JVM) routines.

It is my opinion that there are no memory leaks in either.

In every case that we "found" a memory leak, it always ended up tied back to something we were doing wrong on our client code.

asfimport commented 18 years ago

Nicholaus Shupe (migrated from JIRA)

robert engels is incorrect in that there isn't a memory leak with Lucene. The test case made by kieran demonstrates the memory leak quite well on my machine for the following setup:

Lucene 1.9.1 Sun JDK 1.5.0_06 JVM Args: -Xmx4M (low memory to get the unit test to fail faster) Windows XP Pro

The test case will fail on my machine with i = 2006 and useRamDirectory set to true (as indicated by kieran, this point is important).

Furthermore, by changing the enumerators declaration to:

private Map enumerators = Collections.synchronizedMap(new WeakHashMap());

and the getEnum method to:

private SegmentTermEnum getEnum() { Thread currentThread = Thread.currentThread(); SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get(currentThread); if (termEnum == null) { termEnum = terms(); enumerators.put(currentThread, termEnum); } return termEnum; }

(note that I'm not suggesting that my changes be used for the official patch)

The problem dissapears, and the unit test completes. I vote that this bug be bumped to critical in priority and a fix be included for the Lucene 1.9.2 release and perhaps a patch be made for the 1.4.3 release for those who need it.

For further reading on the perils of ThreadLocal I suggest the following reading:

http://www.me.umn.edu/\~shivane/blogs/cafefeed/2004/06/of-non-static-threadlocals-and-memory.html http://www.szegedi.org/articles/memleak.html http://crazybob.org/2006/02/threadlocal-memory-leak.html http://opensource.atlassian.com/confluence/spring/pages/viewpage.action?pageId=2669

asfimport commented 18 years ago

robert engels (migrated from JIRA)

I agree that the test case does fail, so I was wrong. There is "something" broken in Lucene or the JDK.

What is the JDK bug that was fixed that allows it to work under 1.5? The proposed patch is nearly exactly the implementation of ThreadLocal, so I am not sure how it would fix the problem?

When I ran it under the profiler, it seems to work correctly. I still it may have something to do with finalizers preventing GC when it needs it.

asfimport commented 18 years ago

Nicholaus Shupe (migrated from JIRA)

The problem is apparent on my machine with JDK 1.5, so basically this bug shows up, regardless of JDK version.

I've looked at the source for ThreadLocal, and there's a custom implementation of a map being used which is a bit scarry. However, you'd figure they tested the hell out of it, including instance count verification with the garbage collector.

If we assume the code for ThreadLocal is correct, then the only idea I have is that the value, a SegmentTermEnum has a reference chain back to the ThreadLocal in someway, which would prevent the ThreadLocal key in the inner map from being collected. However, I can't seem to show that being true. I've even tried not using the .clone method thinking there might be something fishy with that, but nothing came out of that.

asfimport commented 18 years ago

Nicholaus Shupe (migrated from JIRA)

This type of ThreadLocal (anti?)pattern also seems to be present in SegmentReader.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

It works fine for me under 1.5.0_06.

I have determined the problem with the ThreadLocal code in 1.4.2. The entries in the ThreadLocalMap extend WeakReference (holding a weak ref to the key which is the ThreadLocal), but also contain a HARD reference to the value (the thread local value - which in this case can be quite large).

In JDK 1.4.2, the entries are only cleaned-up if a hit occurs and the ThreadLocalMap determines that the entry is stale (the key has been reclaimed), and at this point the entry is reused for the new ThreadLocal. If no hits ever occur (or it keeps hiiting the same entry), the number of entries in the table will become unbounded.

In JDK 1.5, the ThreadLocal code periodically clean-ups SOME stale ThreadLocals during each invocation - using cleanSomeSlots().

This is why using the proposed patch works - it uses a standard WeakHashMap which expunges ALL stale entries on nearly every operation.

I will look into creating a derived ThreadLocal that works properly. Or we can just package the proposed patch into a FixedThreadLocal. I don't know what the Lucene dev policies are on "fixing" JDK problems. Seems to me a FixedThreadLocal is the best solution.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

ThreadLocal replacement that avoids memory leak

asfimport commented 18 years ago

robert engels (migrated from JIRA)

I posted FixedThreadLocal.java. Changed Lucene to use this class instead of ThreadLocal. Test case runs fine under 1.4.

The performance difference appears almost non existent. I think this is because in typical Lucene usage the number of invocations is quite small.

A better multi-threaded version is also possible, using a WeakHashMap for each thread, containing the FixedThreadLocals for that thread. This would minimize the length of the synchronization required.

In either case, the finalize methods should be removed. This contributes greatly to the problems with the current code. In fact, it does nothing, since if the object was finalizable, then its reference would have already been reclaimed from the ThreadLocalMap, so having the finalize method actually slows this process. (I removed the finalized methods from Lucene - test case runs fine).

BUT, this problem is not as severe as it appears.

The main reason that this becomes an issue with JDK 1.4 and the testcase, is that the size of the objects being put into the ThreadLocal, since the RAMDirectory has a complete copy of the index. In most uses this will not be the case, and the objects will be reclaimed eventually, as the ThreadLocalMap size stabalizes. The bug in ThreadLocal allows a certain number of stale entries, once the index size gets to 300k, a max of 30 entries can be stale without running out of memory.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

demonstrate thread local memory leak

asfimport commented 18 years ago

robert engels (migrated from JIRA)

I posted ThreadLocalTest.java that clearly demonstrates the "problem" with JDK 1.4.

BUT, it also demonstrates the problem with the test case.

If you run the test with -Xmx4m it will fail quickly under JDK 1.4.2. If you run the test as is under JDK 1.5 with the same memory constraints it will run forever.

With the same memory constraints, if you change the test to use FixedThreadLocal it will run forever.

BUT, if you run the test with -Xmx64m under JDK 1.4.2 it will run forever! This is because the ThreadLocalMap size stablizes.

Thus, unless you are creating and accessing large RAMDirectories you will never experience the problem.

I think this bug should be closed as invalid. BUT, the finalizer() methods should be removed from the Lucene code since it does no good, and actually slows the system by delaying the GC.

Hopefully this ends the issue, and we can move on to other things.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

Just for my own "pat on the back", it goes back to what I said in the earliest comment.

There is NOT a memory leak in Lucene or JDK 1.4.2 or Lucene.

JDK 1.4.2 made a performance vs. memory size decision that works in most cases.

In the submitters case their deisgn decision does not work so well (combination of size of ThreadLocal values and maximum heap size).

With JDK 1.5, they chose an alternative implementation that is SLOWER, but the memory requirements in the general case are smaller, and deterministic.

There will ALWAYS be design decisions like this in ALL software development.

asfimport commented 18 years ago

Andy Hind (migrated from JIRA)

I agree this is not strictly an issue with lucene....but .....

Lucene has an unusual use pattern for thread locals (instance vs static member variables). There are issues with ThreadLocal, discussed here and elsewhere, including potential instability. You should expect others to use thread locals - maybe in the same way.

I fixed the issue reported in #1607 with memory sensitive caching using SoftReferences to the values held as ThreadLocals. Before an out of memory error, the values are cleared and hard refs to the soft reference wrapper class remain. This pattern is used by some classes in the JVM. This limits the memory overhead with thread local use. There will always be some overhead.

I am happy with the alternative fix using WeakHashMap Note: stale entries are always scanned and removed (each call to put, get, size) in contrast to thread locals. This is what I want - it must be stable!

I spent as much time as I could trying to come up with a clear, simple test case that applied regardless of memory constraints. A clear failure case must be possible, but I did not have time to investigate the criteria for ThreadLocal instability. In any case, I would send this to Sun.

A test case is one thing, knowning, understanding and fixing/working around an issue is another. In all the simple cases I tried I got stability but with higher memory use and gc activity than with the "fixed" version. However, I did also remove the pointless finalize() method, which could very well explain the growth of the thread local table.

We have had problems with 1.5.0_06. The issue is caused by the pattern of thread local use and garbage collection producing instability in the size of the thread local table. Your single test case does not imply that the issue does not exist for other JVMs and use cases. I have had the issue without using RAMDirectory - it seems it is just more likely with it.

By the way, cause and effect is sufficient for me. I have a problem with using the 1.4.3 code, this change fixes it.

Regards

Andy

asfimport commented 18 years ago

robert engels (migrated from JIRA)

To restate:

It is NOT a bug. It is a design decision in the JDK between performance and memory use. This decisions changed a bit in 1.5. 1.5 still does not clear the entire ThreadLocal table - it uses hueristics to clear MOST of the entries - a balance between memory and speed. The WeakHashMap clears ALL entries on every invocation. It is slower than a ThreadLocal, BUT, it's memory use is more deterministic.

Lucene will work FINE in all cases except when loading large directories into a RAMDirectory, and closing and reopening said directories multiple times.

Since this may be a common use case for Lucene, I suggest using the FixedThreadLocal class - trading performance for deterministic memory use.

In all cases, the current finalize() methods that "clear the ThreadLocal value" should be removed. It is incorrect, and causes performance problems. Since the finalize() is for the ThreadLocal value, it will never be finalized until the ThreadLocal has already cleared the entry - which is based on the key (the ThreadLocal), since the ThreadLocal entry maintains a hard ref to the value.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

Oops. Last comment was not quite correct.

The reason the finalize() methods are worthless for clearing the ThreadLocal entries, is that they are clearing the ThreadLocal entry for the "finalize" thread !

So they do nothing but slow down finalization/GC.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

The best solution is this,

move the

enumerators.set(null);

to the TermInfosReader close(), and remove the finalize().

Everything will work fine.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

Actually, the last simple "fix" only works well for single threaded applications, so it is not much of a "fix".

Use FixedthreadLocal - works best.

asfimport commented 18 years ago

Nicholaus Shupe (migrated from JIRA)

However, this problem is classified is eventually irrelevant to me. Bug in the JDK, bug in Lucene, not a bug, it's all the same. What is relevant is that I have a production app using Lucene 1.9.1 that will produce an OutOfMemoryError within 2 weeks of use with a huge 128MB of heap. The configuration is Sun's JDK 1.5.0_06 on Linux / Tomcat 5.5.15. With a patch, I am NOT getting the memory link, and my customers are NOT being denied service because of LUCENE-436.

For what it's worth, the use case for my app is opening up one IndexReader and one IndexSearcher, and using the IndexSearcher in an multi-threaded environment. If this isn't a primary use case for Lucene, it should be.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

Are you always loading your indexes using RAMDirectories?

I use multi-hundred megabytes indexes, with 196 max heap, and it runs for months.

I can guarentee that if your index grows to 100+ megabytes you will get an OOM no matter what you do (loading into a RAMDirectory with a 128 max heap), SO, THE ISSUE IS COMPLETELY DEPENDENT on YOUR configuration.

You may find that using a 256 max heap allows your index to run forever with NO code CHANGES.

I am not saying this issue should not be addressed, but it is not BUG. It is a performance vs. memory consumption design tradeoff. If you knew anything about software, you would understand this.

It may be the common case that Lucene should favor lower memory consumption vs. performance, but it is not a ABSOLUTE.

There are conceivably many users of Lucene where the change does not make sense, the performance degradation is not worth it (because they have plenty of memory available based on index size, and/or do not load the indexes into memory).

asfimport commented 18 years ago

robert engels (migrated from JIRA)

better version of TermInfos reader that avoid multiple ThreadLocal.get() calls

asfimport commented 18 years ago

robert engels (migrated from JIRA)

I attached a better version of TermInfosReader that avoids multiple ThreadLocal.get() during get(Term r) which can be expensive.

This is especially important if ThreadLocal is changed to FixedThreadLocal due to the lower performance of get().

I also removed some unused package level methods.

asfimport commented 18 years ago

Nicholaus Shupe (migrated from JIRA)

I am not using a RAMDirectory. I never indicated as such. Here's the the way I load my index:

reader = IndexReader.open(dictionaryIndexDirectory);

dictionaryIndexDirectory is a java.io.File.

My index is 30 megabytes. This configuration should be perfectly fine in 128 MB of heap, even 64 sounds reasonable. I can't simply add a 128 MB heap to my configuration without additional cost. I can't just throw hardware at the problem. Also, any performance arguments are irrelevant unless proven with real numbers. If you knew anything about software, you would understand this as well.

asfimport commented 18 years ago

robert engels (migrated from JIRA)

Then you have some other problem.

The ONLY way the ThreadLocal issue is an ISSUE is if VERY LARGE objects are referenced by the ThreadLocal. Since your indexes are not being held in ram, there is some other problem. You even provided that test case that DEMONSTRATES the "memory leak" issue is not a problem unless a RAMDirectory is used.

As for what is a valid heap size - this depends completely on the other code loaded into the JVM, and what memory it uses.

I use a max heap of 196 with 400mb index and it works fine.

asfimport commented 18 years ago

kieran (migrated from JIRA)

Robert Engels description of the ThreadLocal issue in JDK 1.4.2 provides a very plausible explanation for why this is not a bug, either in Lucene, or in the JDK.

It's not a memory "leak", in the traditional sense (i.e. an unbounded increase in heap memory usage); however, it can - in certain circumstances - lead to a lot more memory being assigned than is necessary, which (and this is pivotal) isn't necessarily considered for GC-ing once it's no longer (hard-)referenced.

Personally, now this behaviour of ThreadLocal has been pointed out, I find it very difficult to envisage circumstances in which I'll even consider using it - until it is altered. (I won't say never, because, as Robert points out, there IS a performance gain to this behaviour).

My personal experience is that this "memory leak" / "larger than necessary memory footprint" can cause extreme problems in production apps. We've been running out of memory even with 700MB assigned to each JRE. My patch has resolved the issue, and so we will apply it to each future release of Lucene, before we use it. Maybe ours is an unusual setup: we periodically close our IndexSearcher, reload an index from the filesystem into a RAMDirectory, and create a new IndexSearcher from it. These indexes are only a few MB in size.

The reason I posted this patch and test case was to help developers who might be in a similar situation to me, and save them from the having to perform the days of investigation that I did - not because I wanted to force an unwanted change into the Lucene source code. It would be very useful to me if a variant of this patch WAS incorporated into Lucene but I recognize that other users may well have different priorities vis-a-vis performance vs memory ueage.

(btw, Robert, I think you might have mistakenly ascribed the test case to Nicholaus! It was me who suggested that it was dependent on using RAMDirectories.)

Kind regards.

asfimport commented 18 years ago

Erik Hatcher (@erikhatcher) (migrated from JIRA)

Please, everyone, let's keep this discussion technical and factual and avoid making degrading statements to one another. It doesn't help the situation to have such negative tones being used. The discussion aspect of this should be moved to java-dev anyway, and leave JIRA comments for details on patches attached and other technical details related directly towards resolving this issue.

asfimport commented 18 years ago

Fernando Padilla (migrated from JIRA)

Here are our facts. It looks like this bug is also affecting us!!

We are also running into memory issues on our production servers and running out of memory.

Tomcat 5, Lucene1.9.1, JDK 1.5, RHEL, RAMDirectory

We are constantly updating the index ( 2400 documents smoothly reindexed every 30 hour ).

Index size of about 5M.

We are doing ordered searches against the index.

We are constantly creating new IndexReader/Writer/Searcher and closing as needed.

In QA we have Xmx500M, in production we have Xmx1500M.

asfimport commented 18 years ago

Fernando Padilla (migrated from JIRA)

So I went ahead and made our own version of lucene-1.9.1 from the sources along with the proposed patch. And yes it does help our memory leak. If you refer to the previous comment, you can see what our environment looks like. I will go ahead and attach our version of the patch now.

asfimport commented 18 years ago

Fernando Padilla (migrated from JIRA)

alright then. Attached is the patch that I just created. I couldn't resist modifying the FixedThreadLocal to suit my tastes. :)

and it goes against the "1.9.1" src that I downloaded off of their website. (though it was marked as version "1.9.2-dev", which then I re-christened "1.9.2-protrade" ).

And yes it seems to have helped with the memory leak.

thank you all.

asfimport commented 17 years ago

Antony Scerri (migrated from JIRA)

Would it not be easier to simply make the objects stored in the ThreadLocal a WeakReference. So in the case of TermInfosReader store the SegmentTermEnum within the WeakReference, and then place that into the enumerators varaible. This will maintain the cloned term enumerator object for the lifetime of the thread using it, but also allow the objects to be cleaned up by the GC when the index owning the TermInfosReader is no longer referenced (at which point no thread should be using the index).

asfimport commented 17 years ago

Antony Scerri (migrated from JIRA)

Lets forget that last comment I made, I was testing out a simple setup and forgot there wasn't going to be a hard reference to the cloned term enum object anywhere, so the weak reference would get cleared away immediately upon GC. A SoftReference could be used instead, but would be subject to GC if free memory was low, which under heavy load could mean it would again get cleared away by GC before the thread had finished.

asfimport commented 17 years ago

Otis Gospodnetic (@otisg) (migrated from JIRA)

4 months later, I think I see the same problem here. I'm using JDK 1.6 (I saw the same problem under 1.5.0_0(8,9,10)) and Lucene from HEAD (2.1-dev). I'm running out of 2GB heap in under 1 day on a production system that searches tens of thousands of indexes, where a few hundred of them have IndexSearchers open to them at any one time, with unused IndexSearchers getting closed after some period of inactivity.

I'm periodically dumping the heap with jconsole and noticing the continuously increasing number of:

org.apache.lucene.index.TermInfo org.apache.lucene.index.CompoundFileReader$CSIndexInput org.apache.lucene.index.Term org.apache.lucene.index.SegmentTermEnum ...

There was a LOT of back and forth here.

What is the final solution? I see a complete new copy of TermInfosReader, but there are a lot of formatting changes in there, it's hard to tell what was actually changed, even with diff -bB --expand-tabs.

I also see FixedThreadLocal, but I see no references to it from TermInfosReader...?

asfimport commented 17 years ago

robert engels (migrated from JIRA)

I would doubt the ThreadLocal "issue" that was in 1.4, changed in 1.5, would be reintroduced in 1.6.

I do not use Lucene 2.1 so I can't say for certain that a new memory bug hasn't been introduced.

I suggest attaching a good profiler (like JProfiler) and figure our the cause of the memory leak (the root references).

I use 1.9 based Lucene and can say unequivocally there are no inherent memory issues (especially when running under 1.5+).

There may also be new issues introduced in JDK 6 - we have not tested with it, only 1.4 and 1.5.

asfimport commented 17 years ago

Otis Gospodnetic (@otisg) (migrated from JIRA)

My leak ended up being caused by the patch in #1726 and is fixed by #1829.

The only thing left to do in this case is: 1) remove finalize() calls in SegmentReader and TermInfosReader 2) call enumerators.remove() in TermInfosReader's close()

I'll do that in the next few days.

asfimport commented 17 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

The more finalizers we can get rid of, the better. They are too hard to use correctly and cause performance problems. http://devresource.hp.com/drc/resources/jmemmodel/index.jsp

asfimport commented 17 years ago

Otis Gospodnetic (@otisg) (migrated from JIRA)

This patch removes finalize() in TermInfosReader and SegmentReader, and it adds the enumerators.remove() call in TermInfosReader close() method.

asfimport commented 17 years ago

Otis Gospodnetic (@otisg) (migrated from JIRA)

Applied and committed the LUCENE-436.patch (is JIRA smart enough not to hyperlink this?) - all unit tests still pass.