eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 720 forks source link

OpenJDK method handles, crash trying to start testing on win32 #18751

Open pshipton opened 9 months ago

pshipton commented 9 months ago

Internal build - head stream with OpenJDK method handles enabled win22x86-svl-rt4-1

Diagnostics were created but not archived.

19:13:32  + cd ./aqa-tests/TKG
19:13:32  + make genParallelList TEST=sanity.functional TRSS_URL=http://trssrtp1.fyre.ibm.com/ NUM_MACHINES=2 TEST_TIME=
19:13:33  JAVA_HOME is set to C:/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/jdkbinary/j2sdk-image
19:13:33  /bin/sh: -f1: command not found
19:13:33  uname: write error: Broken pipe
19:13:33  rm -f -r /cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/aqa-tests/TKG/../TKG/output_compilation; \
19:13:33  mkdir -p /cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/aqa-tests/TKG/../TKG/output_compilation; \
19:13:33  (ant -f ./scripts/build_tools.xml -DTEST_JDK_HOME=C:/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/jdkbinary/j2sdk-image -DTEST_ROOT=/cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/aqa-tests/TKG/.. -DLIB_DIR=C:/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/../../testDependency/lib 2>&1; echo $? ) | tee "/cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/aqa-tests/TKG/../TKG/output_compilation/compilation.log"; \
19:13:33  if [ -z "$(tail -1 /cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/aqa-tests/TKG/../TKG/output_compilation/compilation.log | grep 0)" ]; then perl scripts/moveDmp.pl --compileLogPath="/cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/aqa-tests/TKG/../TKG/output_compilation/compilation.log" --testRoot="/cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/aqa-tests/TKG/.."; false; else rm -f -r "/cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/aqa-tests/TKG/../TKG/output_compilation"; fi
19:13:37  Buildfile: C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\scripts\build_tools.xml
19:13:38  Unhandled exception
19:13:38  Type=Segmentation error vmState=0x00000000
19:13:38  J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=01E0B628 ContextFlags=0001007f
19:13:38  Handler1=7196D140 Handler2=718A54D0 DEPAddress=01E0B628
19:13:38  EDI=22DE088C ESI=21B83C88 EAX=21BC62B8 EBX=01E0B628
19:13:38  ECX=01E01148 EDX=0196F670
19:13:38  EIP=01E0B628 ESP=22F60710 EBP=22790C00 EFLAGS=00210206
19:13:38  GS=002B FS=0053 ES=002B DS=002B
19:13:38  Module=
19:13:38  Module_base_address=01D10000 Offset_in_DLL=000fb628
19:13:38  Target=2_90_20240111_3 (Windows Server 2022 10.0 build 20348)
19:13:38  CPU=x86 (4 logical CPUs) (0x1fff78000 RAM)
19:13:38  ----------- Stack Backtrace -----------
19:13:38  (0x01E0B628)
19:13:38  (0x01E01148)
19:13:38  (0x21B166D0)
19:13:38  (0x21B166D0)
19:13:38  (0x28572C84)
19:13:38  ---------------------------------------
19:13:38  JVMDUMP039I Processing dump event "gpf", detail "" at 2024/01/11 16:13:38 - please wait.
19:13:38  JVMDUMP032I JVM requested System dump using 'C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\core.20240111.161338.736.0001.dmp' in response to an event
19:13:40  JVMDUMP010I System dump written to C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\core.20240111.161338.736.0001.dmp
19:13:40  JVMDUMP032I JVM requested Java dump using 'C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\javacore.20240111.161338.736.0002.txt' in response to an event
19:13:41  JVMDUMP010I Java dump written to C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\javacore.20240111.161338.736.0002.txt
19:13:41  JVMDUMP032I JVM requested Snap dump using 'C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\Snap.20240111.161338.736.0003.trc' in response to an event
19:13:41  JVMDUMP010I Snap dump written to C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\Snap.20240111.161338.736.0003.trc
19:13:41  JVMDUMP032I JVM requested JIT dump using 'C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\jitdump.20240111.161338.736.0004.dmp' in response to an event
19:13:41  JVMDUMP051I JIT dump occurred in 'main' thread 0x22790C00
19:13:41  JVMDUMP010I JIT dump written to C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\jitdump.20240111.161338.736.0004.dmp
19:13:41  JVMDUMP013I Processed dump event "gpf", detail "".
19:13:41  127
19:13:43  20240111-16:13:42 - Successfully moved core file core.20240111.161338.736.0001.dmp to C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\output_compilation
19:13:43  Missing braces on \o{} in regex; marked by <-- HERE in m/C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\o <-- HERE utput_compilation
19:13:43  / at scripts/moveDmp.pl line 155, <$Log> line 1.
19:13:43  make: *** [makefile:107: compileTools] Error 1
pshipton commented 9 months ago

@babsingh @ThanHenderson

ThanHenderson commented 8 months ago

This failure occurs during the following command:

  C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\jdkbinary\j2sdk-image\jre\bin\java.exe
  -classpath C:/apache-ant/apache-ant-1.10.5/lib/ant-launcher.jar
  -Dant.home=C:/apache-ant/apache-ant-1.10.5
  -Dant.library.dir=C:/apache-ant/apache-ant-1.10.5/lib
  -Dcygwin.user.home=C:/cygwin64/home/jenkins org.apache.tools.ant.launch.Launcher
  -cp "" -f ./scripts/build_tools.xml
  -DTEST_JDK_HOME=C:/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/jdkbinary/j2sdk-image
  -DTEST_ROOT=/cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/aqa-tests/TKG/..
  -DLIB_DIR=C:/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/../../testDependency/lib

Using jdmpview we see the exception is from here:

> !snapformat -t 0x221c0f00
              ⋮
0x221C0F00 j9vm.682             Entry      >WalkStackFrames - walkThread=0x221c0f00 flags=0x2c0000 sp=0x228f19f4 a0=0x228f1a08 pc=0x3 literals=0x0 els=0x13cfd4c j2i=0x0
0x221C0F00 j9vm.683             Exit       <WalkStackFrames - walkThread=0x221c0f00, rc=0
0x221C0F00 j9shr.2334           Exception  * CC changePartialPageProtection: Returning without changing page protection for address 0x25201540 to read-write
0x221C0F00 j9shr.2334           Exception  * CC changePartialPageProtection: Returning without changing page protection for address 0x24829ee8 to read-write
0x221C0F00 omrport.256          Exception  * omrfile_open returns failure, filename = C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\jdkbinary\j2sdk-image\jre\bin\default\java_en_US.properties, os errno = 2, portable errno = -108
0x221C0F00 omrport.256          Exception  * omrfile_open returns failure, filename = C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\jdkbinary\j2sdk-image\jre\bin\default\java_en.properties, os errno = 2, portable errno = -108
0x221C0F00 j9dmp.9              Event       Preparing for dump, filename=C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\core.20240116.071021.2576.0001.dmp

However, this does not occur when either -Xint or -Xshareclasses:none -Xnoaot are passed to the above command. So, this appears to be a caching and JIT issue.

@hangshao0 and @dsouzai could you take a look?

hangshao0 commented 8 months ago

The trace point j9shr.2334 is there in many cores. In many cases, it more like an event rather than an exception. e.g. By default during the startup phase, we don't change memory protection, so this trace point will always be fired. Can you check the value of javaVM->phase ?

ThanHenderson commented 8 months ago

javaVM->phase is J9VM_PHASE_STARTUP

hangshao0 commented 8 months ago

javaVM->phase is J9VM_PHASE_STARTUP

We will always see j9shr.2334 during startup, which is expected behaviour.

dsouzai commented 8 months ago

I took a look at this today. It looks like the crash will happen even with -Xshareclasses:none "-Xaot:loadExclude={*}", so it isn't an AOT issue. However, I spent some time seeing if I could get any more details out of the issue for the next person.

When I also add -Xjit:exclude={*} the crash doesn't happen anymore. So it seems like it's JIT related. On the other hand, the stack of the crashing thread is always just:

3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/util/stream/Collectors$$Lambda$16/0x23ea5ee8.get$Lambda(Bytecode PC:10)

even when viewing it in jdmpview, and this method isn't compiled, so it doesn't seem like it's JIT related. To further add to the confusion, Running with -Xshareclasses:none "-Xaot:loadExclude={*}" -Xjit:verbose,vlog=limitFile,disableSuffixLogs will result in a crash, but then running with -Xshareclasses:none "-Xaot:loadExclude={*}" "-Xjit:limitFile=(limitfile,1,nnn)" where nnn is the total number of lines in the limit file will not result in a crash, even though we're allowing the compilation of all methods from the crashing run.

fyi @hzongaro

ThanHenderson commented 8 months ago

Just to add a bit more information: running solely with -Xjit:optlevel=hot or veryHot or scorching does not crash.

But noOpt or cold or warm does.

ThanHenderson commented 7 months ago

Briefly looking at this again, it could be a problem with memory protection attributes of a JIT'd region leading to a Data Execution Prevention (DEP) issue. I'll check if I can get a machine with DEP disabled temporarily and see if it can be reproduced then.

... On the other hand, the stack of the crashing thread is always just:

3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at java/util/stream/Collectors$$Lambda$16/0x23ea5ee8.get$Lambda(Bytecode PC:10)

This isn't always the case. I've observed others, but they are always a get$Lambda(Bytecode PC:10) which is a return1 bytecode.

pshipton commented 7 months ago

The previous description makes me think of the following. The generic return bytecode is overwritten with a more specific version. In the shared cache, which is read only, these bytecodes are updated in advance when the class is written.

ThanHenderson commented 7 months ago

If that were the case, I'd expect it to build with -Xshareclasses:none, but it still fails with the same error.

ThanHenderson commented 7 months ago

I reproduced on a machine with DEP disabled and still see the crash:

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=028F802D ContextFlags=0001007f
Handler1=7142D870 Handler2=747354B0 InaccessibleWriteAddress=222E1000
EDI=2372128C ESI=226F1420 EAX=222E1000 EBX=028F8028
ECX=028ED920 EDX=00CAF880
EIP=028F802D ESP=239BF318 EBP=230F0D00 EFLAGS=00210206
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=027F0000 Offset_in_DLL=0010802d
Target=2_90_20240209_12 (Windows Server 2022 10.0 build 20348)
CPU=x86 (4 logical CPUs) (0x1fff78000 RAM)
----------- Stack Backtrace -----------
(0x028F802D)
(0x028ED920)
(0x226F1400)
(0x226F1458)
(0x23F78C84)

But note, with DEP disabled, we see an InaccessibleWriteAddress rather than a DEPAddress (https://github.com/eclipse/omr/blob/3f8a0e7134a9a7e2e9f607963f79674535ffe87a/port/win32/omrsignal.c#L620-L631) in the exception output.

Running with DEP disbabled and -Xshareclasses:none still crashes but no longer has a InaccessibleWriteAddress in the output:

Unhandled exception
Type=Illegal instruction vmState=0x00000000
J9Generic_Signal_Number=00000010 ExceptionCode=c000001d ExceptionAddress=0344C6A0 ContextFlags=0001007f
Handler1=7142D870 Handler2=747354B0
EDI=242FB78C ESI=24567600 EAX=23233500 EBX=0344C690
ECX=0344C670 EDX=017BF8D0
EIP=0344C6A0 ESP=246FFB54 EBP=23CD2000 EFLAGS=00210246
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=033D0000 Offset_in_DLL=0007c6a0
Target=2_90_20240209_12 (Windows Server 2022 10.0 build 20348)
CPU=x86 (4 logical CPUs) (0x1fff78000 RAM)
----------- Stack Backtrace -----------
(0x0344C6A0)
(0x246FFB58)
(0x0344C670)
(0x23229E28)
(0x06AC9171)
(0x27F3BAE4)

Update: The ExceptionCode=c0000005 is EXCEPTION_ACCESS_VIOLATION and ExceptionCode=c000001d is ERROR_WRITE_FAULT.

ThanHenderson commented 7 months ago

From the following (also reported above):

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=028F802D ContextFlags=0001007f
Handler1=7142D870 Handler2=747354B0 InaccessibleWriteAddress=222E1000
EDI=2372128C ESI=226F1420 EAX=222E1000 EBX=028F8028
ECX=028ED920 EDX=00CAF880
EIP=028F802D ESP=239BF318 EBP=230F0D00 EFLAGS=00210206
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=027F0000 Offset_in_DLL=0010802d
Target=2_90_20240209_12 (Windows Server 2022 10.0 build 20348)
CPU=x86 (4 logical CPUs) (0x1fff78000 RAM)
----------- Stack Backtrace -----------
(0x028F802D) <--offset 5 within java/lang/invoke/MemberName instance @ 0x28f8028
(0x028ED920) <--java/lang/invoke/DirectMethodHandle
(0x226F1400) <--java/util/Vector$VectorSpliterator
(0x226F1458) <--java/util/stream/Collectors$CollectorImp
(0x23F78C84) <--(JIT'd) java/lang/invoke/LambdaForm$DMH/0x239cefe0::invokeStatic_L3_L(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
> !gpinfo
Failing Thread: !j9vmthread 0x230f0d00
Failing Thread ID: 0x1b24 (6948)
gpInfo:
J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=028F802D ContextFlags=0001007f
Handler1=7142D870 Handler2=747354B0 InaccessibleWriteAddress=222E1000
EDI=2372128C ESI=226F1420 EAX=222E1000 EBX=028F8028
ECX=028ED920 EDX=00CAF880
EIP=028F802D ESP=239BF318 EBP=230F0D00 EFLAGS=00210206
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=027F0000 Offset_in_DLL=0010802d

> !whatis 0x027F0000
Found 0x027F0000 as !void 0x27f0000: !j9javavm 0x88ea28->omrRuntime->_vmList->_vmThreadList->heapBaseForBarrierRange0
Match found
Searched 3720 fields to a depth of 6 in 958 ms

> info mmap 0x027F0000
Start Address   End Address     Size            Size            Read/Write/Execute
0x027f0000      0x02deffff      0x00600000      (6,291,456)     RW
Name:   Image section @ 27f0000 (6291456 bytes)
        allocationBase=0x027F0000       allocationProtect=PAGE_NOACCESS
        allocationProtect_flags=0x1     executable=false
        protect=PAGE_READWRITE          protect_flags=0x4
        readable=true                   size=0x00600000
        state=MEM_COMMIT                state_flags=0x1000
        type=MEM_PRIVATE                type_flags=0x1
        writable=true

> info heap *

         Heap #1:  Generational@230dd288
          Section #1:  Heap extent at 0x27f0000 (0x600000 bytes)
           Size:        6291456 bytes
           Shared:      false
           Executable:  true
           Read Only:   false
          Section #2:  Heap extent at 0x22530000 (0x200000 bytes)
           Size:        2097152 bytes
           Shared:      false
           Executable:  true
           Read Only:   false
          Section #3:  Heap extent at 0x22730000 (0xc0000 bytes)
           Size:        786432 bytes
           Shared:      false
           Executable:  true
           Read Only:   false
> info mmap 0x222E1000
Start Address   End Address     Size            Size            Read/Write/Execute
0x02df0000      0x2252ffff      0x1f740000      (527,695,872)
Name:   Image section @ 2df0000 (527695872 bytes)
        allocationBase=0x027F0000       allocationProtect=PAGE_NOACCESS
        allocationProtect_flags=0x1     protect=
        protect_flags=0x0               size=0x1F740000
        state=MEM_RESERVE               state_flags=0x2000
        type=MEM_PRIVATE                type_flags=0x1
babsingh commented 5 days ago

This is the last functional issue, which prevents us to enable OJDK MHs in Java 8 for Win 32 platform. @0xdaryl @jdmpapin @nbhuiyan Can the JIT team help with this issue?

ThanHenderson commented 5 days ago

I had looked a bit more into this and reproduced the failure with simple test files that just call the MethodHandle::invoke variants many times.

I was only seeing failures related to the JIT'd java/lang/invoke/LambdaForm$DMH/0x<addr>::invoke... methods (failed reliably with -Xjit:limit set to only compile those methods; reliably completed successfully with -Xjit:exclude of those methods).

However, I was seeing a couple different failures. Most of the time, it failed similarly to the above report. But sometimes, it would fail during the compilation of those methods rather than during execution of the compiled method.

I also tested disabling many (as many as I could) optimizations via the -Xjit:disable... options. I wasn't able to pinpoint blame to any optimization; any compilation of the method resulted in a failure. My initial hunch was that something was awry in the i2j transitioning, but that wouldn't explain the failures during compilation.

Whoever picks this up, reach out to me. I have documented a lot of what I had done to triage and may have some more information that would help.

hzongaro commented 4 days ago

@nbhuiyan, assigning to you as you suggested. Thanks!