apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.73k stars 1.04k forks source link

build mr-jar and use some java 9 methods if available [LUCENE-7966] #9015

Closed asfimport closed 6 years ago

asfimport commented 7 years ago

See background: http://openjdk.java.net/jeps/238

It would be nice to use some of the newer array methods and range checking methods in java 9 for example, without waiting for lucene 10 or something. If we build an MR-jar, we can start migrating our code to use java 9 methods right now, it will use optimized methods from java 9 when thats available, otherwise fall back to java 8 code.

This patch adds:

Objects.checkIndex(int,int)
Objects.checkFromToIndex(int,int,int)
Objects.checkFromIndexSize(int,int,int)
Arrays.mismatch(byte[],int,int,byte[],int,int)
Arrays.compareUnsigned(byte[],int,int,byte[],int,int)
Arrays.equal(byte[],int,int,byte[],int,int)
// did not add char/int/long/short/etc but of course its possible if needed

It sets these up in org.apache.lucene.future as 1-1 mappings to java methods. This way, we can simply directly replace call sites with java 9 methods when java 9 is a minimum. Simple 1-1 mappings mean also that we only have to worry about testing that our java 8 fallback methods work.

I found that many of the current byte array methods today are willy-nilly and very lenient for example, passing invalid offsets at times and relying on compare methods not throwing exceptions, etc. I fixed all the instances in core/codecs but have not looked at the problems with AnalyzingSuggester. Also SimpleText still uses a silly method in ArrayUtil in similar crazy way, have not removed that one yet.


Migrated from LUCENE-7966 by Robert Muir (@rmuir), resolved Feb 08 2018 Attachments: LUCENE-7966.patch (versions: 5), LUCENE-7966-7x-backwards.patch (versions: 2), LUCENE-7966-v2.patch Linked issues:

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Here's my current patch. Need to look into the AnalyzingSuggester abuse of comparator, but other than that I think its ok as a start. See the TODO's in lucene/core/build.xml related to where to go from here.

As far as code I did some cleanup and plugged new stuff in some obvious places, but there is a lot more that could be done.

Not sure if I have the time to see it through, i didnt realize how much of our byte[] stuff alone was a mess...

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Updated patch. I fixed the fallout better (don't encode wasteful bytes for the first term, no bytes need to be written) from detecting out-of-order comparisons in prefix-coding methods, this annoyingly was always related to the empty string: happened with bytesDifference([], [])/sortKeyLength([], []).

To me it makes sense to detect the malfunction (-1 return from mismatch) rather than just leniently doing something strange for duplicate terms, it may detect bugs.

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi Robert, this is wonderful. I am happy that our discussion about this helped. Nice that you used the example on the ANT webpage: https://ant.apache.org/manual/Tasks/jar.html#jep238-example

The problem is that you currently need Java 9 to compile, too. I'd make it optional. My alternative idea would be: Make a stub clone of the Objects/Arrays classes without code from the JDK and use them as bootclasspath. I can try to work with that.

asfimport commented 7 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

  • <!-- TODO: on windows does executable need .exe suffix? -->

Not if you fork=true because then the task is executed via cmd which resolves file extensions automatically.

asfimport commented 7 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Overall, pretty neat and an uncharted territory (I didn't see anybody using this JEP yet!). One thing that we lose here is the verbosity of exception message (field names, doc ids); for example:

-    if (dimension < 0 || dimension >= type.pointDimensionCount()/2) {
-      throw new IllegalArgumentException("dimension request (" + dimension +
-          ") out of bounds for field (name=" + name + " dimensions=" + type.pointDimensionCount()/2 + "). ");
-    }
+    FutureObjects.checkIndex(dimension, type.pointDimensionCount()/2);

Bit since these are exceptional cases I don't think they'd be particularly useful for diagnostics unless you can repeat the problem (and then you can add/debug what you like).

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

One thing that we lose here is the verbosity of exception message (field names, doc ids)

Yes, but there is a reason to change to the new methods! Robert did not add that to the issue description. I directed him on the weekend during a private chat to the new methods (which was caused by my comment in a former issue with the missing check).

To make it short: Those methods are highly optimized by Hotspot. In fact they replaced the bounds checks in ByteBuffers and other places throughout the JDK to call those methods instead of hardcoded if/then/else statements. I also talked with Rémi Forax about it this summer to get more information. He strongly recommends to use the new methods from Objects and Arrays instead of manually crafted if/then/else checks. The reason is: Those methods are intrinsics and are more or less replaced by highly optimized bounds check code. In addition, if you add several levels of bounds checks more magic is happening: a high level method checks bounds and call lower level method, which does bounds checks again (e.g., a ByteBuffer in the JDK), and this lower level method again accesses a Java array (that implcitely does bounds checks, too), - then Hotspot adds all calls of those Objects' index check methods to the internal AST and it can then safely eliminate all of them except one. And if it sees that a variable is unsigned it will also remove negative checks ... and so on. This was done, because they had very hard problems in eliminating those checks everywhere when somebody creates an if statement (there are tons of ways to do the same check with if statements!), the OpenJDK developers argued to use a intrinsic replacement method instead of hardcoded bounds checks with hundreds of variants. By that they only have to optimize a single "if statement" and can replace it by assembly code and remove duplicates.

Even shorter: If you use the Objects methods instead of if statements, you can add more safety to your code, as duplicate checks are eliminated. So we can even start to add checks into BytesRef. Or we can remove the stupid try/catch blocks in ByteBufferIndexInput.

asfimport commented 7 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Uwe: I understand the reason for this patch. My point was just that you do lose some verbosity in the exception message, that's all.

As for performance I wonder what the actual gain will be - I bet you a beer the performance will be within the noise levels on larger tasks. :)

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

As for performance I wonder what the actual gain will be - I bet you a beer the performance will be within the noise levels on larger tasks.

Of course, we have to test this how much it gains! We should compare:

But the main reason is to make our code "safer". We omitted bounds checks at many places, because the overhead was immense (e.g. BytesRef, ByteBufferIndexInput,...) and sometimes not even added asserts. If we make that code safer, it will not be slower in Java 9. So we can say: We still work with Java 8, but you should really use Java 9 for optimal performance.

Robert just started with the important stuff, but there is room for improvements.

asfimport commented 7 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I wholeheartedly agree – it's fascinating stuff and thanks to Robert and you for starting this.

We should compare:

Those bound checks will cost something – the fact they're intrinsic methods doesn't mean they're free. There's another dimension to your list: the CPU architecture hotspot generates code for... I didn't look at the code, but those intrinsics will vary depending on what the CPU has to offer. So regardless of easier ideal graph optimizations (which is great!) there will be some assembly injected to make those bound checks work, especially on CPUs with less complex instruction set.

Like I said: very interesting stuff to work on and benchmark!

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Dawid: i didn't really add new checks in the patch: I converted a bunch of existing checks to the new Objects.checkXXX methods. I think any difference in exception messages is not a big deal, actually i like the consistency and like the newer exception messages (see the unit tests for those: i emulated what java 9 returns in the java 8 methods).

The only place where there are new "checks" is where we use Arrays.compare/equals/mismatch. Those methods are defined in the jdk as safe, of course, and as I mentioned here, the existing comparator/etc methods we have are not very safe. See especially the prefix-coding corner cases I ran into here :) For those methods, the hope is that its still net/net drowned out: and hopefully definitely compensated by the fact that these methods no longer go one-byte-at-a-time.

In all cases none of this stuff should really be a hotspot for lucene. IMO we should be encouraging more safety/checks/reliability, and thats the goal here.

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

There's another dimension to your list: the CPU architecture hotspot generates code for... I didn't look at the code, but those intrinsics will vary depending on what the CPU has to offer.

No there isn't actually, the transformation is platform independent:

http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/b756e7a2ec33/src/share/vm/opto/library_call.cpp#l1160

Please in the future, lets actually look at the code first, before getting too paranoid about such things. Otherwise we make decisions based on FUD instead of facts and end out with the sloppy array code like we have today :(

asfimport commented 7 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I thought it's an intrinsic reaching all the way down to an inlined assembly (macro), not just ideal graph transformation/ optimization. Apologies for not verifying this.

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I opened #9017 for the suggester case failing here, this looks seriously buggy since it has a negative array length, just hidden by the fact BytesRef.compareTo silently returns false today. We should fix it separately.

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Same patch, just folding in the suggester bugfix (#9017) so that all tests pass. I think this is good reason to look at CharsRef/IntsRef/LongsRef comparators and so on, maybe we can find more bugs.

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I updated the patch with more cleanups: nuked the older ArrayUtil methods, cleaned up PrefixCodedTerms, cutover CharsRef,IntsRef,LongsRef.

So this adds compare()/equal() for char[]/int[]/long[]. It also adds mismatch() for char[] to implement the UTF16-in-UTF8-order comparator.

Didnt find any new bugs, so seems like more than enough for now. I will think about how we can do some validation of MRJAR consistency in smoketester/build/something: that's really needed or we can't ensure stuff is correct. And also I will think about Uwe's idea about the bootclasspath hack so maybe folks don't need an actual java9 compiler.

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Updated patch implementing LZ4.commonBytes with mismatch.

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

@jpountz Maybe you can experiment with the LZ4 change if you get bored, since it was your idea :)

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Adrien also mentioned the change to BytesRef.compareTo might impact building the OrdinalMap (#8954), that would be an interesting thing to try to bench as well.

asfimport commented 7 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

I did some tests with the Calgary corpus that can be found at http://corpus.canterbury.ac.nz/descriptions/ (lower is better):

File Time to compress without patch Time to compress with the patch Difference
bib 971702 904173 -6.9%
book1 7479794 7073712 -5.4%
book2 4990347 4574486 -8.3%
geo 1600972 1574435 -1.7%
news 3394833 3222113 -5.1%
obj1 169516 166673 -1.7%
obj2 1869442 1769302 -5.4%
paper1 385900 357472 -7.4%
pic 1528354 1314336 -14%
progc 279295 261445 -6.4%
progl 410565 376898 -8.2%
progp 245654 222230 -9.5%
trans 517571 470134 -9.2%

As expected the improvement is better on files that have long repetitions like source code and the bitmap picture. The speedup is constantly reproducible.

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Adrien, thanks for benchmarking!

asfimport commented 7 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I tested performance on a large OrdinalMap:

java 8 no patch:
  done init top-level VE state; took 149.411s; 1062.73 MB RAM; 94244084 total unique families
  done init top-level VE state; took 148.977s; 1062.73 MB RAM; 94244084 total unique families

java 8 patch:
  done init top-level VE state; took 150.789s; 1062.73 MB RAM; 94244084 total unique families
  done init top-level VE state; took 150.509s; 1062.73 MB RAM; 94244084 total unique families

java 9 no patch:
  done init top-level VE state; took 151.309s; 1062.73 MB RAM; 94244084 total unique families
  done init top-level VE state; took 149.814s; 1062.73 MB RAM; 94244084 total unique families

java 9 patch:
  done init top-level VE state; took 158.535s; 1062.73 MB RAM; 94244084 total unique families
  done init top-level VE state; took 156.207s; 1062.73 MB RAM; 94244084 total unique families

I'm running on 7x and the patch had one non-test conflict which was simple to resolve.

I'm not sure what's going on; does the patch even change code related to OrdinalMap? (I haven't looked closely).

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

does the patch even change code related to OrdinalMap?

it changes BytesRef.compareTo, which normally isnt that interesting, but its the comparison function used when constructing the ordinal map. Previous comments on #8954 suggested that this might be a bottleneck there.

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Mike, is it possible the benchmark didn't warm up here (or maybe something happened with the 7.x backport?). The results are a bit noisy and don't make sense.

When I modify the JMH bench here https://richardstartin.com/2017/07/16/new-methods-in-java-9-math-fma-and-arrays-mismatch/ to use shorter strings right above the threshold where the intrinsic kicks in (8, 9, 10), i don't see any regressions. So I don't think it should ever get slower, regardless ofany wierdness about your data: instead maybe just how the bench was done?

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi, I took Robert's latest patch, committed it to my Github fork and developed it a bit more: I added a stub generator (that can only be used with Java 9) to generate stub classes that only have the Java 9 signatures, but no private stuff or method bodies. This is done by a groovy script, that can be executed (requires Java 9 as JAVA_HOME): ant generate-java9-stubs. The files generated by this are committed, so nobody needs java 9 to compile lucene. The license should be no problem, as no code is involved. This is mentioned in the README.txt in the folder. The stub classes are compiled with class version of Java 8.

When compiling the MR-JAR, the standard Java compiler is used, but we prepend our stub classes to the bootclasspath of javac. This allows us to compile the MR-JAR file also with Java 8. If Java 9 is detected, we don't add anything to bootclasspath.

The branch is here: https://github.com/apache/lucene-solr/compare/master...uschindler:jira/LUCENE-7966

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

This is just my patch on top of Robert's: https://github.com/apache/lucene-solr/commit/e69d77023fbfc60bb0baf16dc5102ab76419cf03

asfimport commented 7 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

> Mike, is it possible the benchmark didn't warm up here (or maybe something happened with the 7.x backport?).

I'm not sure what happened ... the bench should have been "hot": plenty of RAM on the box, and I ran each case twice. I did also run in a virtual env (EC2), i3.16xlarge instance; maybe a noisy neighbor impacted results? I don't think we should let this block committing; the change otherwise seems awesome.

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

@mikemccand: A stupid question: Did you do the benchmark on Java 9 using the JAR file? If you did it with the class-files only classpath, it won't use any Java 9 features, so you won't see any speed improvement. MR-JAR files require to use them as JAR files. Just placing the files in META-INF subdirectories of a file-only classpath won't use them!

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

When thinking last night about the whole thing a bit more, I had a cool idea: Currently we use ASM to generate the stub files to compile against (see my Github repo). On top of these stubs we use a "wrapper class" that just delegates all methods to the Java 9 one. IMHO, this is not nice for the optimizer (although it can handle that). But the oal.future.FutureObjects/FutureArrays classes just contain the same signatures as their Java 9 variants would contain. So my idea is to use ASM to patch all classes:

The good thing:

What do you think? I will try this variant a bit later today. We can use the same approach for other Java 9 classes, too! Maybe this also helps with the issues Mike has seen (I am not happy to have the degelator class).

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Here is the class remapper: https://paste.apache.org/bAzx

Basically it rewrites all references to oal.future.FutureXxxx to the Java 9 type java.util.Xxxx. All files that contain in the Java 8 code in build/classes/java references to our own FutureXxx classes (the remapper sets remapped=true for those) are saved to in rewritten formto a separate directory build/classes/java9 in parallel to the original and are packaged into the multirelease part of the JAR. All classes that have no references to our FutureXxx backports are kept out.

This can be done as a general task and may be applied to all Lucene/Solr modules.

I will update my branch later.

asfimport commented 7 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Michael McCandless: A stupid question: Did you do the benchmark on Java 9 using the JAR file? If you did it with the class-files only classpath, it won't use any Java 9 features, so you won't see any speed improvement. MR-JAR files require to use them as JAR files. Just placing the files in META-INF subdirectories of a file-only classpath won't use them!

That was a great idea @uschindler but alas I was using Lucene via JAR files.

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I implemented by latest idea based again on Robert's patch: https://github.com/apache/lucene-solr/compare/master...uschindler:jira/LUCENE-7966-v2

This approach is much more clean: We compile against Robert's replacement classes FutureObjects and FutureArrays (that have to contain the same method signatures as the Java 9 original, but we can add a test for this later with smoketester) as usual with Java 8. Before packaging the JAR file we read all class files and patch all FutureObjects/FutureArrays references to refer to the Java 9 class. The patched output is sent to a separate folder build/classes/java9. The JAR file is then packaged to take both variants, placing the patched ones in the Java 9 MultiRelease part.

Currently only the lucene-core.jar file uses the patched stuff, so stuff outside lucene-core (e.g., codecs) does not yet automatically add Java 9 variants, instead it will use Robert's classes. If this is the way to go, I will move the patcher to the global tools directory and we can apply patching to all JAR files of the distribution. WARNING: We cannot support Maven builds here, Maven always builds a Java8-only JAR file!

@mikemccand, @jpountz: Could you build a lucene-core.jar file with the above branch on Github and do your tests again? The main difference here is that the JAR file no longer contains a delegator class. Instead all class files that were originally compiled with FutureObjects/FutureArrays (for Java 8 support) are patched to directly use the Java 9 Arrays/Objects methods, without using a delegator class. Keep in mind: This currently only support lucene-core.jar, the codecs JAR file is not yet Multirelease with this patch.

When building with ant jar inside lucene/core you should see output like this:

     [compile shit...................]
     [copy] Copying 3 files to C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\build\core\classes\java

-mrjar-classes-uptodate:

resolve-groovy:
[ivy:cachepath] :: resolving dependencies :: org.codehaus.groovy#groovy-all-caller;working
[ivy:cachepath]         confs: [default]
[ivy:cachepath]         found org.codehaus.groovy#groovy-all;2.4.8 in public
[ivy:cachepath] :: resolution report :: resolve 170ms :: artifacts dl 5ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
        ---------------------------------------------------------------------

patch-mrjar-classes:
[ivy:cachepath] :: resolving dependencies :: org.ow2.asm#asm-commons-caller;working
[ivy:cachepath]         confs: [default]
[ivy:cachepath]         found org.ow2.asm#asm-commons;5.1 in public
[ivy:cachepath]         found org.ow2.asm#asm-tree;5.1 in public
[ivy:cachepath]         found org.ow2.asm#asm;5.1 in public
[ivy:cachepath] :: resolution report :: resolve 701ms :: artifacts dl 8ms
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
        ---------------------------------------------------------------------
   [groovy] Remapped: org/apache/lucene/analysis/tokenattributes/CharTermAttributeImpl
   [groovy] Remapped: org/apache/lucene/codecs/compressing/LZ4
   [groovy] Remapped: org/apache/lucene/document/BinaryPoint$2
   [groovy] Remapped: org/apache/lucene/document/DoubleRange
   [groovy] Remapped: org/apache/lucene/document/FloatRange
   [groovy] Remapped: org/apache/lucene/document/IntRange
   [groovy] Remapped: org/apache/lucene/document/LongRange
   [groovy] Remapped: org/apache/lucene/index/BitsSlice
   [groovy] Remapped: org/apache/lucene/index/CodecReader
   [groovy] Remapped: org/apache/lucene/index/MergeReaderWrapper
   [groovy] Remapped: org/apache/lucene/search/BooleanScorer$TailPriorityQueue
   [groovy] Remapped: org/apache/lucene/util/BytesRef
   [groovy] Remapped: org/apache/lucene/util/BytesRefArray
   [groovy] Remapped: org/apache/lucene/util/CharsRef$UTF16SortedAsUTF8Comparator
   [groovy] Remapped: org/apache/lucene/util/CharsRef
   [groovy] Remapped: org/apache/lucene/util/IntsRef
   [groovy] Remapped: org/apache/lucene/util/LongsRef
   [groovy] Remapped: org/apache/lucene/util/StringHelper
   [groovy] Remapped: org/apache/lucene/util/automaton/Automaton$Builder
   [groovy] Remapped: org/apache/lucene/util/automaton/Automaton
   [groovy] Remapped 20 class files for Java 9 to: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\build\core\classes\java9
    [touch] Creating C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\build\core\patch-mrjar.stamp

jar-core:
      [jar] Building jar: C:\Users\Uwe Schindler\Projects\lucene\trunk-lusolr1\lucene\build\core\lucene-core-8.0.0-SNAPSHOT.jar

jar:

BUILD SUCCESSFUL
Total time: 31 seconds

This patch also adds uptodate support, so the pathing is only done if the original class files change.

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

WARNING: We cannot support Maven builds here, Maven always builds a Java8-only JAR file!

Why exactly is this the case? with my patch maven should work. Maven just uses the jars produced by ant. The smoketester validates they are exactly the same.

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Why exactly is this the case? with my patch maven should work. Maven just uses the jars produced by ant. The smoketester validates they are exactly the same.

Maven works when you build the JAR files for Maven with our ANT targets. What does not work is the pom.xml build generated by sarowes maven build. That one will build JAR files only with the FutureXxx stuff, but no multirelease stuff. Sorry for being imprecise.

asfimport commented 7 years ago

Robert Muir (@rmuir) (migrated from JIRA)

thanks for the clarification. yes, that's no change from my patch.

asfimport commented 7 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks @uschindler; I'll retest from your branch!

asfimport commented 7 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I retested w/ Uwe's patch:

java 8 w/o uwe's patch
  done init top-level VE state; took 150.849s; 1062.73 MB RAM; 94244084 total unique families
  done init top-level VE state; took 151.488s; 1062.73 MB RAM; 94244084 total unique families

java 8 w/ uwe's patch
  done init top-level VE state; took 156.910s; 1062.73 MB RAM; 94244084 total unique families
  done init top-level VE state; took 154.836s; 1062.73 MB RAM; 94244084 total unique families

java 9 w/o uwe's patch
  done init top-level VE state; took 151.883s; 1062.73 MB RAM; 94244084 total unique families
  done init top-level VE state; took 151.123s; 1062.73 MB RAM; 94244084 total unique families

java 9 w/ uwe's patch
  done init top-level VE state; took 159.383s; 1062.73 MB RAM; 94244084 total unique families
  done init top-level VE state; took 156.662s; 1062.73 MB RAM; 94244084 total unique families

Still not sure what's going on ...

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

No idea either!

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi @mikemccand, I did some more testing and did not figure out a bug in the MR jar that could explain this. I did the following:

I created a simple JAVA class file:

import org.apache.lucene.util.BytesRef;

public class Test {
  public static void main(String... args) {
    BytesRef r1 = new BytesRef(new byte[20], 0, 30);
    BytesRef r2 = new BytesRef(new byte[20], 1, 10);
    r1.compareTo(r2);
  }
}

Of course this code is buggy, but that was the idea here. If I run this with Java 8, I get following output:

$ java -cp lucene-core-8.0.0-SNAPSHOT.jar;. Test
Exception in thread "main" java.lang.IndexOutOfBoundsException: Range [0, 30) out-of-bounds for length 20
        at org.apache.lucene.future.FutureArrays.checkFromToIndex(FutureArrays.java:35)
        at org.apache.lucene.future.FutureArrays.compareUnsigned(FutureArrays.java:62)
        at org.apache.lucene.util.BytesRef.compareTo(BytesRef.java:165)
        at Test.main(Test.java:7)

If I run the same with Java 9, I get the following output:

$ java -cp lucene-core-8.0.0-SNAPSHOT.jar;. Test
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 30
        at java.base/java.util.Arrays.rangeCheck(Arrays.java:122)
        at java.base/java.util.Arrays.compareUnsigned(Arrays.java:6101)
        at org.apache.lucene.util.BytesRef.compareTo(BytesRef.java:165)
        at Test.main(Test.java:7)

As we see, BytesRef class is using Java9's native java.util.Arrays class instead our own FutureArrays.

You can also verify this with javap:

$ javap --multi-release 9 -cp lucene-core-8.0.0-SNAPSHOT.jar -p -c org.apache.lucene.util.BytesRef  | grep Arrays
      34: invokestatic  `#76`                 // Method java/util/Arrays.equals:([BII[BII)Z
      34: invokestatic  `#136`                // Method java/util/Arrays.compareUnsigned:([BII[BII)I
      26: invokestatic  `#143`                // Method java/util/Arrays.copyOfRange:([BII)[B

$ javap --multi-release 8 -cp lucene-core-8.0.0-SNAPSHOT.jar -p -c org.apache.lucene.util.BytesRef  | grep Arrays
      34: invokestatic  `#15`                 // Method org/apache/lucene/future/FutureArrays.equals:([BII[BII)Z
      34: invokestatic  `#29`                 // Method org/apache/lucene/future/FutureArrays.compareUnsigned:([BII[BII)I
      26: invokestatic  `#31`                 // Method java/util/Arrays.copyOfRange:([BII)[B

So the setup is correct.

Can you maybe also check with the above "Test" program that the JAR file you are using behaves correctly?

Nevertheless and unrelated to Mike's problems seing an improvement here, we need to improve the whole thing:

I am still waiting for @jpountz to see his benchmark results with the new MR JAR.

asfimport commented 7 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Sorry Uwe, I was on vacation while you pinged me and did not notice you were asking me something. I'll run the benchmark again today or tomorrow.

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

No problem!

asfimport commented 7 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Here are the results:

File Time to compress without patch Time to compress with the patch Difference
bib 976672 935724 -4.2%
book1 7487574 7300402 -2.5%
book2 4999043 4719148 -5.6%
geo 1598629 1648771 3.1%
news 3398425 3289187 -3.2%
obj1 169515 167649 -1.1%
obj2 1873220 1804144 -3.7%
paper1 386047 364133 -5.7%
paper2 726156 695761 -4.2%
paper3 375937 359946 -4.3%
paper4 111643 108169 -3.1%
paper5 99465 96254 -3.2%
paper6 277771 263810 -5.0%
pic 1528434 1324520 -13.3%
progc 279360 265820 -4.8%
progl 411285 385095 -6.4%
progp 246035 226135 -8.1%
trans 517765 485948 -6.1%

Results are slightly less good than previously but it could well be noise. It's still an improvement compared to master.

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Thanks Adrien!

Results are slightly less good than previously but it could well be noise.

This can only be noise, because the code we are running is the same as Robert's. I just removed the additional indirection in the Java 9 code path by patching the class files directly.

I hope you are still ready: Can you also run this comparison with Java 8? I assume the current numbers are with Java 9. Just to be sure that our "emulation layer" does not slowdown Java 8.

asfimport commented 7 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Sure, let me run that.

asfimport commented 7 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

Now with Java 8

File Time to compress without patch Time to compress with the patch Difference
bib 1033357 1027901 -0.5%
book1 7906551 7961422 +0.7%
book2 5335458 5348388 +0.2%
geo 1629253 1644770 +1.0%
news 3569115 3609734 +1.1%
obj1 176601 178634 +1.2%
obj2 1970078 1993204 +1.2%
paper1 412884 409665 -0.8%
paper2 772086 772052 -0.0%
paper3 401949 397526 -1.1%
paper4 118664 117747 -0.8%
paper5 104995 104855 -0.1%
paper6 296315 295813 -0.2%
pic 1619648 1698345 +4.9%
progc 299270 298341 -0.3%
progl 440063 443714 +0.8%
progp 265607 266388 +0.3%
trans 554452 555335 +0.2%

The slowdown on pic (the most compressible file) is reproducible but other files show very similar performance with and without the patch.

asfimport commented 7 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I keep thinking: is this static replacement of those delegation methods actually visible in performance benchmarks? I'd think once those methods go hot they'd be inlined when their enclosing methods are compiled without a trace of performance degradation? Just wondering.

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

@dweiss: The patched class files are actually easier to maintain, as we do not need Java 9 to compile, no duplicate class files in source folder, or some fake Java 9 signature files (with questionable license) on bootclasspath (see my previous branch). This was the main reason to rewrite the class files instead of maintaining multiple source files. It's just a nice side-effect to no longer need the delegation methods. So I personally like the patching approach much more, as it's well tested by the ASM maintainers (we just use their code, no custom impl). It would be horrible if we'd instead require all committers to have both Java 8 and Java 9 installed!

The question here was just for confirmation and comparison of both approaches, if they have some side effects.

The slowdown on pic (the most compressible file) is reproducible

@jpountz: The one with biggest slowdown on Java 8 is the one with biggest speedup in Java 9. The reason is quite clear: The Java 8 implementation by Robert does more checks than the "old" LZ4 implementation (for safety and to be compatible with new Java 9 impl). But on Java 9 the new method used is an intrinsic, so we have a huge perf win!

asfimport commented 7 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Right, thanks Uwe.

asfimport commented 7 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

I had a closer look at the branch and like the patching approach. Should we modify the smoke tester at the same time to enforce that both Java 8 and 9 are tested?

asfimport commented 7 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I had a closer look at the branch and like the patching approach. Should we modify the smoke tester at the same time to enforce that both Java 8 and 9 are tested?

I think so - that was my plan! In general the testing is automatically done. As soon as you start Lucene Demo or Solr with Java 9 it will test the JAR file. But a separate test might be good (like the Exception test I posted before), to see if stack trace looks as expected.

I will work soon on changing the patching mechanism to be globally (not only in root module). I would also like to remove the @Deprecated from the Future classes (because at this time, they are the only way) and instead add `@lucene.internal`. We should add a separate issue about renoving Future classes, once we swicth to Java 9.

Are there any other tests we should do. I talked with Robert - we both don't understand Mike's findings. I don't trust them unless we have a reproducible benchmark using BytesRefHash and similar. The improvements in LZ4 are phantastic, I would have expected the same from BytesRefHash.

asfimport commented 6 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Sorry for silence on this. I wanted to improve the patch tomorrow.

@mikemccand:

I retested w/ Uwe's patch:

By reading the Java mailing lists, I have an idea what could cause Mike's differences: Are you sure that you used the same garbage collector in Java 8 and Java 9? By default Java 9 uses G1GC, which Java 8 uses the good old ParallelGC by default. On the mailinglist jdk-dev there was a discussion that suddenly computing-intensive stuff was significantly slower with Java 9. The reason was the Garbage Collector! So be sure to use the same one (give it explicit on command line). G1GC adds additional barriers and because of them the number of free CPU registers is lowered by 1. This causes some algorithms to behave worse as missing CPU registers don't allow to do everything in the CPU (@dweiss has seen a sigicficant slowdown for some of his code). More than 10% slowdown is possible! This also affects code that has no garbage collection and barriers because of G1. It's just limited resources causing this. Interestingly the person complaining was talking about LZ4 compression: http://openjdk.5641.n7.nabble.com/Reduced-performance-in-Java-9-0-1-vs-8u152-td322825.html

Could you compare Java 8 and Java 9 with same garbage collector? (a) G1GC on both and (b) ParallelGC and/or CMS ? Uwe