eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Scala code compiled differently between OpenJDK11 and Semeru Openj9, causing VerifyError #14054

Open · Sidong-Wei opened this issue 2 years ago

Sidong-Wei commented 2 years ago

Java -version output

openjdk version "11.0.13" 2021-10-19
IBM Semeru Runtime Open Edition 11.0.13.0 (build 11.0.13+8)
Eclipse OpenJ9 VM 11.0.13.0 (build openj9-0.29.0, JRE 11 Linux s390x-64-Bit Compressed References 20211021_227 (JIT enabled, AOT enabled)
OpenJ9   - e1e72c497
OMR      - 299b6a2d2
JCL      - 2d83aa3b76 based on jdk-11.0.13+8)

Summary of problem

I built the Apache Kafka project (version 3.0.0) with Semeru JDK 11.0.13 on an s390x machine, and when I started the service I hit the following VerifyError:

java.lang.VerifyError: JVMVRFY021 thrown object not throwable; class=kafka/log/AbstractIndex, method=<init>(Ljava/io/File;JIZ)V, pc=274
Exception Details:
  Location:
    kafka/log/AbstractIndex.<init>(Ljava/io/File;JIZ)V @274: JBathrow
  Reason:
    Type 'kafka/log/AbstractIndex' (current frame, stack[0]) is not assignable to 'java/lang/Throwable'
  Current Frame:
    bci: @274
    flags: { }
    locals: { 'kafka/log/AbstractIndex', 'java/io/File', long, long_2nd, integer, integer, integer, 'java/io/RandomAccessFile' }
    stack: { 'kafka/log/AbstractIndex' }
  Exception Handler Table:
    bci [85, 249] => handler: 249
  Stackmap Table:
    full_frame(@69,{Object[#1],Object[#60],long,integer,integer,integer},{Object[#1]})
    full_frame(@83,{Object[#1],Object[#60],long,integer,integer,integer},{Object[#1],Object[#144]})
    full_frame(@131,{Object[#1],Object[#60],long,integer,integer,integer,Object[#144]},{Object[#1]})
    same_locals_1_stack_item_frame(@151,Object[#1])
    same_locals_1_stack_item_frame(@184,Object[#1])
    full_frame(@200,{Object[#1],Object[#60],long,integer,integer,integer,Object[#144]},{Object[#1],Object[#154]})
    full_frame(@217,{Object[#1],Object[#60],long,integer,integer,integer,Object[#144],Object[#154]},{Object[#1]})
    same_locals_1_stack_item_frame(@244,Object[#1])
    full_frame(@249,{Object[#1],Object[#60],long,integer,integer,integer,Object[#144]},{Object[#27]})
    full_frame(@275,{Object[#1],Object[#60],long,integer,integer,integer,Object[#144],Object[#154]},{Object[#1],Object[#154]})
    at kafka.log.LazyIndex$.$anonfun$forOffset$1(LazyIndex.scala:106)
    at kafka.log.LazyIndex$$$Lambda$1053/0x0000000000000000.apply(Unknown Source)
    at kafka.log.LazyIndex.$anonfun$get$1(LazyIndex.scala:63)
    at kafka.log.LazyIndex.get(LazyIndex.scala:60)
    at kafka.log.LogSegment.offsetIndex(LogSegment.scala:64)
    at kafka.log.LogSegment.readNextOffset(LogSegment.scala:456)
    at kafka.log.Log.$anonfun$recoverLog$6(Log.scala:921)
    at kafka.log.Log$$Lambda$1055/0x0000000000000000.apply$mcJ$sp(Unknown Source)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
    at scala.Option.getOrElse(Option.scala:201)
    at kafka.log.Log.recoverLog(Log.scala:921)
    at kafka.log.Log.$anonfun$loadSegments$3(Log.scala:801)
    at kafka.log.Log$$Lambda$1050/0x0000000000000000.apply$mcJ$sp(Unknown Source)
    at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.scala:17)
    at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2465)
    at kafka.log.Log.loadSegments(Log.scala:801)
    at kafka.log.Log.<init>(Log.scala:328)
    at kafka.log.Log$.apply(Log.scala:2601)
    at kafka.log.LogManager.$anonfun$getOrCreateLog$1(LogManager.scala:830)
    at kafka.log.LogManager$$Lambda$1022/0x0000000000000000.apply(Unknown Source)
    at scala.Option.getOrElse(Option.scala:201)
    at kafka.log.LogManager.getOrCreateLog(LogManager.scala:783)
    at kafka.cluster.Partition.createLog(Partition.scala:344)
    at kafka.cluster.Partition.createLogIfNotExists(Partition.scala:324)
    at kafka.cluster.Partition.$anonfun$makeLeader$1(Partition.scala:564)
    at kafka.cluster.Partition.makeLeader(Partition.scala:548)
    at kafka.server.ReplicaManager.$anonfun$makeLeaders$5(ReplicaManager.scala:1568)
    at kafka.server.ReplicaManager$$Lambda$1013/0x0000000000000000.apply(Unknown Source)
    at kafka.utils.Implicits$MapExtensionMethods$.$anonfun$forKeyValue$1(Implicits.scala:62)
    at kafka.server.ReplicaManager$$Lambda$1014/0x0000000000000000.apply(Unknown Source)
    at scala.collection.mutable.HashMap$Node.foreachEntry(HashMap.scala:633)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:499)
    at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:1566)
    at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:1411)
    at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:258)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:171)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:74)
    at java.base/java.lang.Thread.run(Thread.java:836)

The same problem is not observed with Temurin JDK 11.0.13, or with Semeru JDK 11.0.13 on an amd64 machine.

Diagnostic files

As the error message suggested, I examined the compiled Java bytecode in the different environments. I found that the issue lies within this source file: the compiled code for this file differs across JDKs and architectures, while the other files are consistent. Here is part of the difference:

Semeru JDK 11.0.13 on s390x:

  public kafka.log.AbstractIndex(java.io.File, long, int, boolean);
    descriptor: (Ljava/io/File;JIZ)V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=7, locals=12, args_size=5
         0: aload_0
         1: aload_1
         2: putfield      #104                // Field _file:Ljava/io/File;
...
       240: invokevirtual #566                // Method java/nio/MappedByteBuffer.position:(I)Ljava/nio/Buffer;
       243: pop
       244: aload         8
       246: goto          275
       249: pop
       250: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       253: aload         7
       255: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
       260: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
       263: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       266: pop
       267: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
       270: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
       273: aload_0
       274: athrow
       275: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       278: aload         7
       280: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
       285: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
       288: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       291: pop
       292: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
       295: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
       298: putfield      #196                // Field mmap:Ljava/nio/MappedByteBuffer;
       301: aload_0
       302: aload_0
       303: invokevirtual #364                // Method mmap:()Ljava/nio/MappedByteBuffer;
       306: invokevirtual #538                // Method java/nio/MappedByteBuffer.limit:()I
       309: aload_0
       310: invokevirtual #190                // Method entrySize:()I
       313: idiv
       314: putfield      #207                // Field _maxEntries:I
       317: aload_0
       318: aload_0
       319: invokevirtual #364                // Method mmap:()Ljava/nio/MappedByteBuffer;
...
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
          202      44     8   idx   Ljava/nio/MappedByteBuffer;
           47     251     6 newlyCreated   Z
           85     213     7   raf   Ljava/io/RandomAccessFile;
            0     334     0  this   Lkafka/log/AbstractIndex;
            0     334     1 _file   Ljava/io/File;
            0     334     2 baseOffset   J
            0     334     4 maxIndexSize   I
            0     334     5 writable   Z
          139       8     9 roundDownToExactMultiple_factor   I
          232       8    10 roundDownToExactMultiple_number   I
          232       8    11 roundDownToExactMultiple_factor   I
    MethodParameters:
      Name                           Flags
      _file                          final
      baseOffset                     final
      maxIndexSize                   final
      writable                       final

Attaching the full text here: semeru_disassembled.txt

Temurin JDK 11.0.13 on s390x (Semeru JDK 11.0.13 on amd64 is similar):

  public kafka.log.AbstractIndex(java.io.File, long, int, boolean);
    descriptor: (Ljava/io/File;JIZ)V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=7, locals=13, args_size=5
         0: aload_0
         1: aload_1
...
       240: invokevirtual #566                // Method java/nio/MappedByteBuffer.position:(I)Ljava/nio/Buffer;
       243: pop
       244: aload         8
       246: goto          277
       249: astore        9
       251: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       254: aload         7
       256: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
       261: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
       264: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       267: pop
       268: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
       271: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
       274: aload         9
       276: athrow
       277: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       280: aload         7
       282: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
       287: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
       290: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       293: pop
       294: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
       297: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
       300: putfield      #196                // Field mmap:Ljava/nio/MappedByteBuffer;
       303: aload_0
       304: aload_0
       305: invokevirtual #364                // Method mmap:()Ljava/nio/MappedByteBuffer;
       308: invokevirtual #538                // Method java/nio/MappedByteBuffer.limit:()I
       311: aload_0
       312: invokevirtual #190                // Method entrySize:()I
       315: idiv
       316: putfield      #207                // Field _maxEntries:I
       319: aload_0
       320: aload_0
       321: invokevirtual #364                // Method mmap:()Ljava/nio/MappedByteBuffer;
...
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
          202      44     8   idx   Ljava/nio/MappedByteBuffer;
           47     253     6 newlyCreated   Z
           85     215     7   raf   Ljava/io/RandomAccessFile;
            0     336     0  this   Lkafka/log/AbstractIndex;
            0     336     1 _file   Ljava/io/File;
            0     336     2 baseOffset   J
            0     336     4 maxIndexSize   I
            0     336     5 writable   Z
          139       8    10 roundDownToExactMultiple_factor   I
          232       8    11 roundDownToExactMultiple_number   I
          232       8    12 roundDownToExactMultiple_factor   I
    MethodParameters:
      Name                           Flags
      _file                          final
      baseOffset                     final
      maxIndexSize                   final
      writable                       final

Attaching the full text here: temurin_disassembled.txt

From the comparison, we can tell that local variable 9 on the Temurin machine is missing on the Semeru machine. There are two operations involving this variable, 249: astore 9 and 274: aload 9, and the latter was mistakenly compiled to 273: aload_0 on the Semeru s390x machine, which leads the following operation, 274: athrow, to throw the class object itself (while it is actually supposed to throw local variable 9). This matches what the error message says (kafka/log/AbstractIndex is not throwable, because it was never supposed to be thrown).

I compared the source code with the compiled Java bytecode, and I believe the snippet above is compiled from the following block of the source file (lines 109 - 138 of the file):

  protected var mmap: MappedByteBuffer = {
    val newlyCreated = file.createNewFile()
    val raf = if (writable) new RandomAccessFile(file, "rw") else new RandomAccessFile(file, "r")
    try {
      /* pre-allocate the file if necessary */
      if(newlyCreated) {
        if(maxIndexSize < entrySize)
          throw new IllegalArgumentException("Invalid max index size: " + maxIndexSize)
        raf.setLength(roundDownToExactMultiple(maxIndexSize, entrySize))
      }

      /* memory-map the file */
      _length = raf.length()
      val idx = {
        if (writable)
          raf.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, _length)
        else
          raf.getChannel.map(FileChannel.MapMode.READ_ONLY, 0, _length)
      }
      /* set the position in the index for the next entry */
      if(newlyCreated)
        idx.position(0)
      else
        // if this is a pre-existing index, assume it is valid and set position to last entry
        idx.position(roundDownToExactMultiple(idx.limit(), entrySize))
      idx
    } finally {
      CoreUtils.swallow(raf.close(), AbstractIndex)
    }
  }

Apparently, there is some bug when compiling the code block above with Semeru on s390x, while the bug is not present when compiling on amd64 or with the Temurin JDK. The most obvious sign of the bug is the missing local variable 9, which unfortunately does not show up in the local variable table either. I am guessing that variable 9 may relate to the try block, as it is supposed to be thrown at a certain point, but I cannot get further information on that, and I also could not come up with a simpler reproduction test case, as the issue seems to be pretty subtle and tricky.
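For reference, an astore/aload pair around a rethrow is exactly how scalac (like javac) normally compiles the exceptional path of a try/finally: the in-flight throwable is stored into a fresh local slot, the finally body runs, and the throwable is reloaded and rethrown. Below is a minimal Scala sketch of the same shape (hypothetical names, not taken from Kafka), just to illustrate where that temporary slot comes from:

  import java.io.RandomAccessFile

  object TryFinallySketch {
    // Same shape as the mmap initializer above: the try body produces a value
    // and the finally block must run on both the normal and exceptional paths.
    def lengthOf(path: String): Long = {
      val raf = new RandomAccessFile(path, "r")
      try {
        raf.length() // normal path: the result survives past the finally block
      } finally {
        // exceptional path in bytecode: astore <slot>; <finally body>; aload <slot>; athrow
        raf.close()
      }
    }
  }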

Another thing that might help with debugging this issue is the Scala compilation options. In the environment above, I have been using Kafka's default Scala compilation flags, which are:

2021-12-01T14:13:43.183+0000 [DEBUG] [org.gradle.api.internal.tasks.scala.ZincScalaCompilerFactory] [zinc] The Scala compiler is invoked with:
    -deprecation
    -unchecked
    -deprecation
    -unchecked
    -encoding
    utf8
    -Xlog-reflective-calls
    -feature
    -language:postfixOps
    -language:implicitConversions
    -language:existentials
    -Xlint:constant
    -Xlint:delayedinit-select
    -Xlint:doc-detached
    -Xlint:missing-interpolator
    -Xlint:nullary-unit
    -Xlint:option-implicit
    -Xlint:package-object-classes
    -Xlint:poly-implicit-overload
    -Xlint:private-shadow
    -Xlint:stars-align
    -Xlint:type-parameter-shadow
    -Xlint:unused
    -opt:l:inline
    -opt-inline-from:org.apache.kafka.**
    -opt-inline-from:kafka.**
    -opt-warnings
    -Xlint:strict-unsealed-patmat
    -Xfatal-warnings
    -release
    8
    -bootclasspath
    /home/sidong/.gradle/caches/modules-2/files-2.1/org.scala-lang/scala-library/2.13.6/ed7a2f528c7389ea65746c22a01031613d98ab3d/scala-library-2.13.6.jar
    -classpath
    /home/sidong/src/kafka/server-common/build/classes/java/main:/home/sidong/src/kafka/clients/build/classes/java/main:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.dataformat/jackson-dataformat-csv/2.12.3/eeaa685648b9eec8e31e74e076bc822d719ff63f/jackson-dataformat-csv-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.datatype/jackson-datatype-jdk8/2.12.3/77424ea087313312e308dae5ff8445608aabb5e1/jackson-datatype-jdk8-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.core/jackson-databind/2.12.3/d6153f8fc60c479ab0f9efb35c034526436a4953/jackson-databind-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.core/jackson-annotations/2.12.3/7275513412694a1aafd08c0287f48469fa0e6e17/jackson-annotations-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.core/jackson-core/2.12.3/deb23fe2a7f2b773e18ced2b50d4acc1df8fa366/jackson-core-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.module/jackson-module-scala_2.13/2.12.3/59f934e4fa34ad07afae4e80e9057c89bea9b52c/jackson-module-scala_2.13-2.12.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.scala-lang.modules/scala-collection-compat_2.13/2.4.4/9cb262981ba1fac1f25c0e7e2b285d536ec8923b/scala-collection-compat_2.13-2.4.4.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.scala-lang.modules/scala-java8-compat_2.13/1.0.0/8ffac68615b438933fe27506e755d790a68b8bab/scala-java8-compat_2.13-1.0.0.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.typesafe.scala-logging/scala-logging_2.13/3.9.3/d97c3bcdcf6179e110af8ad2a64ca276843395c1/scala-logging_2.13-3.9.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.scala-lang/scala-reflect/2.13.6/bb523e56c63746a5752dece28fcd702c54fd3a11/scala-reflect-2.13.6.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.scala-lang/scala-library/2.13.6/ed7a2f528c7389ea65746c22a01031613d98ab3d/scala-library-2.13.6.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/log4j/log4j/1.2.17/5af35056b4d257e4b64b9e8069c0746e8b08629f/log4j-1.2.17.jar:/home/sidong/src/kafka/metadata/build/classes/java/main:/home/sidong/src/kafka/raft/build/classes/java/main:/home/sidong/src/kafka/storage/build/classes/java/main:/home/sidong/.gradle/caches/modules-2/files-2.1/net.sourceforge.argparse4j/argparse4j/0.7.0/6f0621d0c3888de39e0f06d01f37ba53a798e657/argparse4j-0.7.0.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/net.sf.jopt-simple/jopt-simple/5.0.4/4fdac2fbe92dfad86aa6e9301736f6b4342a3f5c/jopt-simple-5.0.4.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.yammer.metrics/metrics-core/2.2.0/f82c035cfa786d3cbec362c38c22a5f5b1bc8724/metrics-core-2.2.0.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.dropwizard.metrics/metrics-core/4.1.12.1/cb2f351bf4463751201f43bb99865235d5ba07ca/metrics-core-4.1.12.1.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.apache.zookeeper/zookeeper/3.6.3/a6e74f826db85ff8c51c15ef0fa2ea0b462aef25/zookeeper-3.6.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.slf4j/slf4j-api/1.7.30/b5a4b6d16ab13e34a88fae84c35cd5d68cac922c/slf4j-api-1.7.30.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/commons-cli/commons-cli/1.4/c51c00206bb913cd8612b24abd9fa98ae89719b1/commons-cli-1.4.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/com.thoughtworks.paranamer/paranamer/2.8/619eba74c19ccf1da8ebec97a2d7f8ba05773dd6/paranamer-2.8.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.apache.zookeeper/zookeeper
-jute/3.6.3/8990d19ec3db01f45f82d4011a11b085db66de05/zookeeper-jute-3.6.3.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/org.apache.yetus/audience-annotations/0.5.0/55762d3191a8d6610ef46d11e8cb70c7667342a3/audience-annotations-0.5.0.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-handler/4.1.62.Final/78bdd07b607bf1e3f7d74702fd8e56fd4dceb9a6/netty-handler-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-transport-native-epoll/4.1.62.Final/d5ad8ffa470d4579da6939f452ec2c85708974f2/netty-transport-native-epoll-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-codec/4.1.62.Final/922a898d93002486fb20a82ac4fbca1e42b87ee0/netty-codec-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-transport-native-unix-common/4.1.62.Final/d6bca9220b0594a3ad85ddc07e4aadf6562e78a2/netty-transport-native-unix-common-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-transport/4.1.62.Final/e1c0d805e9b28dd6c4075eb8c08be650314f317/netty-transport-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-resolver/4.1.62.Final/6f5ed2b21695e3d4c678e250799afe8e6a1dbe1b/netty-resolver-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-buffer/4.1.62.Final/6b030204124386f3b96e563b173b3239d633356c/netty-buffer-4.1.62.Final.jar:/home/sidong/.gradle/caches/modules-2/files-2.1/io.netty/netty-common/4.1.62.Final/56d494a54633ee63d49248a78d2a6dc83d0c4e0e/netty-common-4.1.62.Final.jar:/home/sidong/src/kafka/core/build/classes/java/main:/home/sidong/src/kafka/core/build/classes/scala/main

And I actually found that if I turn off the Scala compiler's inliner optimization, it leads to different bytecode and the issue no longer occurs on s390x with the Semeru JDK. Please see this attached file: semeru_no_inliner_disassembled.txt. As mentioned, turning off the inliner leads to a different bytecode file, but for this specific class kafka/log/AbstractIndex, local variable 9 and the two corresponding operations are compiled correctly, so the VerifyError does not emerge. So it seems the bug only happens on s390x and only while the inliner is turned on.
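For context, -opt:l:inline enables the Scala 2.13 optimizer's inliner and -opt-inline-from:kafka.** allows it to inline methods defined in Kafka's own packages; the roundDownToExactMultiple_number/_factor entries in the local variable tables above look like the result of exactly this kind of inlining. A minimal sketch of the transformation (hypothetical names, not Kafka's actual code):

  object Helpers {
    // With -opt:l:inline and a matching -opt-inline-from pattern, scalac may
    // copy this method's body into each call site instead of emitting a call.
    @inline final def roundDown(number: Int, factor: Int): Int = factor * (number / factor)
  }

  object Demo {
    def nextPosition(limit: Int, entrySize: Int): Int =
      Helpers.roundDown(limit, entrySize) // after inlining: entrySize * (limit / entrySize)
  }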

It would be really appreciated if someone could look into this error and find out why a local variable goes missing when compiling Scala code with Semeru JDK 11. Thank you.

pshipton commented 2 years ago

@tajila

tajila commented 2 years ago

@ChengJin01 Please take a look at this

ChengJin01 commented 2 years ago

Just talked to @Sidong-Wei offline. The reason the VerifyError was raised during runtime verification (which is correct, as AbstractIndex is not throwable) is that the Scala source was already mistakenly compiled to the wrong bytecode on s390x by the Gradle compiler, which is out of VM scope given that the bytecode is statically generated into the .jar file before the bytecode is run. @Sidong-Wei might need to investigate why the bytecode wasn't compiled correctly as expected on s390x, e.g. whether there is any connection to the Java compiler (javac) or something else specific to s390x.

Sidong-Wei commented 2 years ago

Hi @ChengJin01, thanks a lot for your input. I think you made a good point in stating that the issue has nothing to do with the verifier or the JVM, since the bytecode was compiled statically. I further looked into how the Kafka project uses the Java and Scala compilers to compile its source code. I found that it uses the Gradle Scala plugin, which uses the zinc compiler to compile its hybrid Java and Scala code. I checked the source code of the compiler and tested with different JDKs, and I did not find any evidence that it contains platform-specific or JVM-specific code, which leads to the assumption that the issue I experienced may still be within the JDK scope (based on seeing different bytecode on the same machine with different JDKs). I understand the difference between OpenJ9 and HotSpot is mainly on the JVM side, but I believe there are still differences between these variants in terms of libraries and javac implementation (e.g. I remember the ThreadMXBean.getThreadAllocatedBytes method from the com.sun.management package is implemented in HotSpot but not in OpenJ9), so I wonder if there is someone in the OpenJ9 community who could help with this type of issue (with javac or a shared library)? Since the issue is both platform-relevant and JDK-relevant, I suspect the issue may be with the specific javac implementation on s390x in the OpenJ9 repo, and any help to further narrow the issue down would be greatly appreciated, thanks.

ChengJin01 commented 2 years ago

@Sidong-Wei, the Java compiler (javac) is implemented in OpenJDK, which doesn't belong to OpenJ9.

The question is whether the zinc compiler depends on javac to do the compilation. Even so, I still doubt the issue was triggered by javac. So it is likely there is intermediate source code generated by zinc prior to the bytecode, and that intermediate code differs across platforms.

ChengJin01 commented 2 years ago

I double-checked this project at https://github.com/apache/kafka/tree/3.0 and noticed that kafka/core/src/main/scala/kafka/log/AbstractIndex.scala was compiled to kafka/core/build/classes/scala/main/kafka/log/AbstractIndex.class with the command ./gradlew core:compileScala, which launched at least 3 Java processes during the build process, as follows:

[1] The background GradleDaemon: jdk11_openj9_semeru/bin/java -Xss4m -XX:+UseParallelGC --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.lang.invoke=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.prefs/java.util.prefs=ALL-UNNAMED --add-opens java.base/java.nio.charset=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED -Xmx2g -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant -cp /home/jenkins/.gradle/wrapper/dists/gradle-7.1.1-all/1wqbeia0d49252rmlcr6o8lbl/gradle-7.1.1/lib/gradle-launcher-7.1.1.jar org.gradle.launcher.daemon.bootstrap.GradleDaemon 7.1.1

[2] The GradleWrapperMain (command line parser): jdk11_openj9_semeru/bin/java -Xint -Xmx64m -Xms64m -Dorg.gradle.appname=gradlew -classpath /home/jenkins/jchau_ffi/14054_apache_kafka_jdk11/kafka/gradle/wrapper/gradle-wrapper.jar org.gradle.wrapper.GradleWrapperMain core:compileScala

[3] The Gradle Worker Daemon: e.g. jdk11_openj9_semeru/bin/java -Xss4m -XX:+UseParallelGC @/home/jenkins/.gradle/.tmp/gradle-worker-classpath2336945854620526368txt -Xmx2048m -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant worker.org.gradle.process.internal.worker.GradleWorkerMain 'Gradle Worker Daemon 1'

By modifying kafka/build.gradle to disable the JIT on the Gradle Worker Daemon as follows:

  defaultJvmArgs = ["-Xint", "-Xss4m", "-XX:+UseParallelGC"]
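  // "-Xint" forces this worker JVM (which hosts the zinc/scalac compilation) to run interpreted only, i.e. with the JIT disabled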

the issue disappeared (tried many times) and the build ended up with the correct bytecode:

  public kafka.log.AbstractIndex(java.io.File, long, int, boolean);
    descriptor: (Ljava/io/File;JIZ)V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=7, locals=13, args_size=5
...
       244: aload         8
       246: goto          277
       249: astore        9  <----- store the thrown value into local variable #9
       251: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       254: aload         7
       256: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
       261: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
       264: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
       267: pop
       268: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
       271: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
       274: aload         9  <----- load the thrown value from local variable #9
       276: athrow
...
      Exception table:
         from    to  target type
            85   249   249   any

My understanding is that the Scala source must be loaded/compiled directly to bytecode in memory, and that bytecode was most likely optimized there by the JIT before being output to the class file, but it is still unclear where the optimization occurred.

So we need to get the JIT team involved in the investigation to see how & why this happened in this case.

FYI: @0xdaryl

DanHeidinga commented 2 years ago

@0xdaryl can you update this for the 0.30 release?

0xdaryl commented 2 years ago

The JIT team hasn't had a chance to look at the failure yet. Thanks @Sidong-Wei for your thorough investigation upon opening this issue. Given the apparent Z-specific nature of the problem, I'll ask @joransiu to assign someone for JIT investigation based on @ChengJin01's helpful analysis so far.

This seems unlikely to be resolved for 0.30.

pshipton commented 2 years ago

I'll move it out.

jonathan-albrecht-ibm commented 2 years ago

I'm wondering if I could get an update on where this issue stands? Is there a plan to fix it in a particular release? I retested this against the current Kafka 3.2.0 source release and the issue still occurs. I tested with IBM Semeru Runtime Certified Edition 11.0.15.0 (build 11.0.15+10).

pshipton commented 2 years ago

@joransiu it seems the question is for you.

joransiu commented 2 years ago

Apologies for the delay. I missed the original tag, so unfortunately there are no updates to report. Will have someone look into this from a JIT perspective.

VermaSh commented 2 years ago

Hey @Sidong-Wei, I am going to be looking into this failure. What commands did you use to set up and run? Is this it?

curl -kn0OL https://archive.apache.org/dist/kafka/3.0.0/kafka-3.0.0-src.tgz
tar -xf kafka-3.0.0-src.tgz
cd kafka-3.0.0-src/

bin/zookeeper-server-start.sh config/zookeeper.properties

# In a separate console
bin/kafka-server-start.sh config/server.properties

jonathan-albrecht-ibm commented 2 years ago

Hi @VermaSh, I can help you with this issue. You can reproduce it from the unit tests, which may be easier.

It's actually easier to set up with Kafka 3.2.0 now, which has better support for s390x in its jar dependencies.

mkdir src
curl -kn0OL https://archive.apache.org/dist/kafka/3.2.0/kafka-3.2.0-src.tgz
cd src/
tar xf ../kafka-3.2.0-src.tgz
cd kafka-3.2.0-src/
# I'm using IBM Semeru Runtime Certified Edition 11.0.15.0 (build 11.0.15+10)
export JAVA_HOME=/opt/jdk-11.0.15+10
./gradlew jar
# At this point you should be able to run through the quickstart guide but I haven't tried it.
# Several of the tests in the core module will fail. This is one of them:
./gradlew :core:test --tests FetchRequestBetweenDifferentIbpTest.testControllerOldToNewIBP

You should see an error that starts with:

FetchRequestBetweenDifferentIbpTest > testControllerOldToNewIBP() FAILED
    java.lang.VerifyError: JVMVRFY021 thrown object not throwable; class=kafka/log/AbstractIndex, method=<init>(Ljava/io/File;JIZ)V, pc=274
    Exception Details:
      Location:
        kafka/log/AbstractIndex.<init>(Ljava/io/File;JIZ)V @274: JBathrow
      Reason:
        Type 'kafka/log/AbstractIndex' (current frame, stack[0]) is not assignable to 'java/lang/Throwable'
      Current Frame:
        bci: @274
        flags: { }
        locals: { 'kafka/log/AbstractIndex', 'java/io/File', long, long_2nd, integer, integer, integer, 'java/io/RandomAccessFile' }
        stack: { 'kafka/log/AbstractIndex' }
      Exception Handler Table:
        bci [85, 249] => handler: 249
      Stackmap Table:
...

Let me know if I can help

jonathan-albrecht-ibm commented 1 year ago

Hi @VermaSh, just wondering if there is any update on this issue?

VermaSh commented 1 year ago

Hi @jonathan-albrecht-ibm, sorry, I got swamped with some other work so I don't have an update right now. But I'll try to have one over the next few days.

VermaSh commented 1 year ago

Quick update: disabling asynchronous JIT compilation for the Gradle worker daemon fixes the failures. I'm doing a binary search now to find the failing JIT method.

jonathan-albrecht-ibm commented 1 year ago

Hi @VermaSh, just wanted to check if you were able to find the failing JIT method.

VermaSh commented 1 year ago

Hey @jonathan-albrecht-ibm, it fails when scala/tools/asm/tree/analysis/Analyzer.analyze(Ljava/lang/String;Lscala/tools/asm/tree/MethodNode;)[Lscala/tools/asm/tree/analysis/Frame; is compiled at opt level hot. Limiting the method to opt level warm or below results in correct bytecode being generated for AbstractIndex: -Xjit:'{scala/tools/asm/tree/analysis/Analyzer.analyze(Ljava/lang/String*}(traceFull,optLevel=warm,log=trace.log)'

I am going through the method log now, so I will have another update for you soon.

VermaSh commented 1 year ago

I was able to narrow it down further to the class hierarchy table. Running with disableCHOpts fixes the failures. The next step here is to figure out what exactly about the CH opts causes the failure.

jonathan-albrecht-ibm commented 1 year ago

Hi @VermaSh, just wanted to check in and see if you had any new info on this issue.

VermaSh commented 1 year ago

Hi @jonathan-albrecht-ibm, unfortunately I haven't had a chance to investigate the failure further. I am hoping to have some free bandwidth later this week as things are cooling down on the other project.

VermaSh commented 1 year ago

I haven't been able to make much progress here, as I had to investigate a blocker on zLinux. I'll post an update as soon as I make progress on this failure.

jonathan-albrecht-ibm commented 1 year ago

Hi @VermaSh, just wanted to check in and see if you had any new info on this issue.

jonathan-albrecht-ibm commented 7 months ago

Hi @VermaSh, just wanted to check in again on this issue

VermaSh commented 7 months ago

@jonathan-albrecht-ibm sorry, haven't had the bandwidth to look into this.

r30shah commented 7 months ago

@Spencer-Comin, can you take a look at this issue given that it is seen with Kafka? For some reason, I cannot assign this one to you.

Spencer-Comin commented 7 months ago

RE: https://github.com/eclipse-openj9/openj9/issues/14054#issuecomment-1170200239 — I was able to reproduce this error with Kafka 3.2.0 and Semeru 11.0.21+9. As a side note, with the latest Kafka (3.6.1) I do not see the error.

jonathan-albrecht-ibm commented 6 months ago

Hi @Spencer-Comin, just wanted to check if you are planning on working on this issue. It's currently unassigned.

Spencer-Comin commented 6 months ago

@jonathan-albrecht-ibm I’m off for the next two weeks; once I’m back I’ll continue working on this issue.

Spencer-Comin commented 3 months ago

I've poked around at this for the last few weeks. The issue disappears when I use Semeru 11.0.22+7. I've confirmed that the correct bytecode is produced.

      249: astore        9
      251: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
      254: aload         7
      256: invokedynamic #643,  0            // InvokeDynamic #1:apply$mcV$sp:(Ljava/io/RandomAccessFile;)Lscala/runtime/java8/JFunction0$mcV$sp;
      261: getstatic     #51                 // Field kafka/log/AbstractIndex$.MODULE$:Lkafka/log/AbstractIndex$;
      264: getstatic     #238                // Field kafka/utils/CoreUtils$.MODULE$:Lkafka/utils/CoreUtils$;
      267: pop
      268: getstatic     #590                // Field org/slf4j/event/Level.WARN:Lorg/slf4j/event/Level;
      271: invokevirtual #594                // Method kafka/utils/CoreUtils$.swallow:(Lscala/Function0;Lkafka/utils/Logging;Lorg/slf4j/event/Level;)V
      274: aload         9
      276: athrow

Full disassembly here: AbstractIndex.11.0.22+7.disassembly.txt

jonathan-albrecht-ibm commented 3 months ago

Thanks @Spencer-Comin, that looks good to me. I'm okay to close this issue now.