apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.59k stars 1.01k forks source link

fix most memory-hungry tests [LUCENE-9241] #10281

Open asfimport opened 4 years ago

asfimport commented 4 years ago

Currently each test jvm has Xmx of 512M. With a modern macbook pro this is 4GB which is pretty crazy.

On the other hand, if we fix a few edge cases, tests can work with lower heaps such as 128M. This can save many gigabytes (also it finds interesting memory waste/issues).


Migrated from LUCENE-9241 by Robert Muir (@rmuir), updated Mar 06 2020 Attachments: LUCENE-9241.patch

asfimport commented 4 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Attached patch. @dweiss I didn't yet modify the gradle build, I figured lets just clean up the memory hungry tests first. It is almost possible to run with 64MB heap with the patch, but we'd need to use OfflineSorter for the kuromoji/nori dictionary builds, which is more involved.

asfimport commented 4 years ago

Robert Muir (@rmuir) (migrated from JIRA)

There are a few "real" code changes here to review:

asfimport commented 4 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Nightlies would still require a larger heap (because of increased iteration counts)?

There is a runtime difference to this:

-    InputStream is = TestJapaneseTokenizer.class.getResourceAsStream("userdict.txt");
-    if (is == null) {
+    URL resource = TestJapaneseTokenizer.class.getResource("userdict.txt");
+    if (resource == null) {
       throw new RuntimeException("Cannot find userdict.txt in test classpath!");
     }

the URL based version causes jar to be locked on Windows (if I recall right). I don't see the benefit of switching to URL here?

If there are tests that really require large amount of ram we could create a group for these and then create a separate test run for these... Or assume the nightlies have a bumped heap amount?

asfimport commented 4 years ago

Robert Muir (@rmuir) (migrated from JIRA)

the URL based version causes jar to be locked on Windows (if I recall right). I don't see the benefit of switching to URL here?

Not true.

the existing getResourceAsStream is simply getResource + openStream. this way the exc handling is simpler.

Look here: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ClassLoader.java#L1723-L1731

(Sorry, I have to call such things out, lest we have shitty code based on rumors or other wrong reasons)

asfimport commented 4 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Not true... anymore. Because I definitely struggled with this at some point of time (java 8?) and there used to be a difference. Thanks for clarifying though.

asfimport commented 4 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

It is actually (and sadly) still true. You're looking at parent class but getResourceAsStream is overriden in URLClassLoader; the behavioral difference is still in there, here:

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/URLClassLoader.java#L291-L305

asfimport commented 4 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I remember now. When you're using a dynamic class loader (URLClassLoader or its subclass) then resources opened on the URL directly will lock the jar. When you use getResourcesAsStream it registers the jar as closeable (as in the code above) and closing the class loader releases the lock on the file as well.

asfimport commented 4 years ago

Robert Muir (@rmuir) (migrated from JIRA)

This isn't a URLClassloader here. The standard one is not URLClassLoader anymore.

asfimport commented 4 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I'm really indifferent about it - I was just pointing out the fact that such pattern (opening an url, not the stream) was (and is) a problem sometimes.

Which classloader is going to load tests is often beyond our control; windows is typically the evil here – it has limited subprocess command argument line so gradle may (and I think this is coming in the next version) try to avoid the problem by forking a launcher which loads JARs in a separate classloader (arguments from file) rather than using system classpath option.

asfimport commented 4 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Dawid, if you think Class.getResource has some crazy behavior like this on windows, then I think we should really open a bug with the JDK. If it is such a problem, shouldnt existing usages be removed, and it added to forbidden APIs, until the bug is fixed?

https://github.com/apache/lucene-solr/search?q=getResource%28&unscoped_q=getResource%28

I merely tried to simplify the tests... that is really all this patch is about.

asfimport commented 4 years ago

Bruno Roustant (@bruno-roustant) (migrated from JIRA)

I reviewed the patch. Sounds good to me.

I was wondering if the change in ByteRunAutomaton could impact perf. I don't think so, but I'm going to run a couple of benchmarks.

Similarly in #10285 I'll change AutomatonTermsEnum to reduce memory in case of large automaton.

asfimport commented 4 years ago

Bruno Roustant (@bruno-roustant) (migrated from JIRA)

As expected I saw no noticeable impact in the luceneutil benchmarks.

asfimport commented 4 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I have reviewed it as well. :) Except for the things I mentioned I didn't think anything else was worth mentioning. Direct memory allocation may be misleading in that it is still allocation but escapes the heap... but I don't have an opinion on that (whether it's a good thing or not) so I'll just leave it up to you.

asfimport commented 4 years ago

Robert Muir (@rmuir) (migrated from JIRA)

@dweiss I saw a recent URLclassloader windows leak thread on the jdk list and it reminded me of this issue.

I'll remove the use of getResource (please keep in mind there are many of these elsewhere in the codebase if you are actually concerned about this).

Instead, if the user screws up here in their test, they'll get a NullPointerException and they can follow the stack trace. Soon the default NPE from the JDK will actually be more helpful than such custom messages like this anyway.

asfimport commented 4 years ago

ASF subversion and git services (migrated from JIRA)

Commit 9cfdf17b2895866877668002d443277a46cd04e8 in lucene-solr's branch refs/heads/master from Robert Muir https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9cfdf17

LUCENE-9241: fix tests to pass with -Xmx128m

asfimport commented 4 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I wasn't really that much concerned; just pointing out the (sad) fact of how it's implemented for Windows.