Open pshipton opened 9 months ago
@babsingh @ThanHenderson
This failure occurs during the following command:
C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\jdkbinary\j2sdk-image\jre\bin\java.exe
-classpath C:/apache-ant/apache-ant-1.10.5/lib/ant-launcher.jar
-Dant.home=C:/apache-ant/apache-ant-1.10.5
-Dant.library.dir=C:/apache-ant/apache-ant-1.10.5/lib
-Dcygwin.user.home=C:/cygwin64/home/jenkins org.apache.tools.ant.launch.Launcher
-cp "" -f ./scripts/build_tools.xml
-DTEST_JDK_HOME=C:/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/jdkbinary/j2sdk-image
-DTEST_ROOT=/cygdrive/c/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/aqa-tests/TKG/..
-DLIB_DIR=C:/Users/jenkins/workspace/Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal/../../testDependency/lib
Using jdmpview
we see the exception is from here:
> !snapformat -t 0x221c0f00
⋮
0x221C0F00 j9vm.682 Entry >WalkStackFrames - walkThread=0x221c0f00 flags=0x2c0000 sp=0x228f19f4 a0=0x228f1a08 pc=0x3 literals=0x0 els=0x13cfd4c j2i=0x0
0x221C0F00 j9vm.683 Exit <WalkStackFrames - walkThread=0x221c0f00, rc=0
0x221C0F00 j9shr.2334 Exception * CC changePartialPageProtection: Returning without changing page protection for address 0x25201540 to read-write
0x221C0F00 j9shr.2334 Exception * CC changePartialPageProtection: Returning without changing page protection for address 0x24829ee8 to read-write
0x221C0F00 omrport.256 Exception * omrfile_open returns failure, filename = C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\jdkbinary\j2sdk-image\jre\bin\default\java_en_US.properties, os errno = 2, portable errno = -108
0x221C0F00 omrport.256 Exception * omrfile_open returns failure, filename = C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\jdkbinary\j2sdk-image\jre\bin\default\java_en.properties, os errno = 2, portable errno = -108
0x221C0F00 j9dmp.9 Event Preparing for dump, filename=C:\Users\jenkins\workspace\Test_openjdk8_j9_sanity.functional_x86-32_windows_ojdk292_Personal\aqa-tests\TKG\core.20240116.071021.2576.0001.dmp
However, this does not occur when either -Xint
or -Xshareclasses:none -Xnoaot
are passed to the above command. So, this appears to be a caching and JIT issue.
@hangshao0 and @dsouzai could you take a look?
The trace point j9shr.2334
is there in many cores. In many cases, it more like an event rather than an exception. e.g. By default during the startup phase, we don't change memory protection, so this trace point will always be fired. Can you check the value of javaVM->phase
?
javaVM->phase
is J9VM_PHASE_STARTUP
javaVM->phase is J9VM_PHASE_STARTUP
We will always see j9shr.2334
during startup, which is expected behaviour.
I took a look at this today. It looks like the crash will happen even with -Xshareclasses:none "-Xaot:loadExclude={*}"
, so it isn't an AOT issue. However, I spent some time seeing if I could get any more details out of the issue for the next person.
When I also add -Xjit:exclude={*}
the crash doesn't happen anymore. So it seems like it's JIT related. On the other hand, the stack of the crashing thread is always just:
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/util/stream/Collectors$$Lambda$16/0x23ea5ee8.get$Lambda(Bytecode PC:10)
even when viewing it in jdmpview, and this method isn't compiled, so it doesn't seem like it's JIT related. To further add to the confusion, Running with -Xshareclasses:none "-Xaot:loadExclude={*}" -Xjit:verbose,vlog=limitFile,disableSuffixLogs
will result in a crash, but then running with -Xshareclasses:none "-Xaot:loadExclude={*}" "-Xjit:limitFile=(limitfile,1,nnn)"
where nnn
is the total number of lines in the limit file will not result in a crash, even though we're allowing the compilation of all methods from the crashing run.
fyi @hzongaro
Just to add a bit more information: running solely with -Xjit:optlevel=hot
or veryHot
or scorching
does not crash.
But noOpt
or cold
or warm
does.
Briefly looking at this again, it could be a problem with memory protection attributes of a JIT'd region leading to a Data Execution Prevention (DEP) issue. I'll check if I can get a machine with DEP disabled temporarily and see if it can be reproduced then.
... On the other hand, the stack of the crashing thread is always just:
3XMTHREADINFO3 Java callstack: 4XESTACKTRACE at java/util/stream/Collectors$$Lambda$16/0x23ea5ee8.get$Lambda(Bytecode PC:10)
This isn't always the case. I've observed others, but they are always a get$Lambda(Bytecode PC:10)
which is a return1
bytecode.
The previous description makes me think of the following. The generic return bytecode is overwritten with a more specific version. In the shared cache, which is read only, these bytecodes are updated in advance when the class is written.
If that were the case, I'd expect it to build with -Xshareclasses:none
, but it still fails with the same error.
I reproduced on a machine with DEP disabled and still see the crash:
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=028F802D ContextFlags=0001007f
Handler1=7142D870 Handler2=747354B0 InaccessibleWriteAddress=222E1000
EDI=2372128C ESI=226F1420 EAX=222E1000 EBX=028F8028
ECX=028ED920 EDX=00CAF880
EIP=028F802D ESP=239BF318 EBP=230F0D00 EFLAGS=00210206
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=027F0000 Offset_in_DLL=0010802d
Target=2_90_20240209_12 (Windows Server 2022 10.0 build 20348)
CPU=x86 (4 logical CPUs) (0x1fff78000 RAM)
----------- Stack Backtrace -----------
(0x028F802D)
(0x028ED920)
(0x226F1400)
(0x226F1458)
(0x23F78C84)
But note, with DEP disabled, we see an InaccessibleWriteAddress
rather than a DEPAddress
(https://github.com/eclipse/omr/blob/3f8a0e7134a9a7e2e9f607963f79674535ffe87a/port/win32/omrsignal.c#L620-L631) in the exception output.
Running with DEP disbabled and -Xshareclasses:none
still crashes but no longer has a InaccessibleWriteAddress
in the output:
Unhandled exception
Type=Illegal instruction vmState=0x00000000
J9Generic_Signal_Number=00000010 ExceptionCode=c000001d ExceptionAddress=0344C6A0 ContextFlags=0001007f
Handler1=7142D870 Handler2=747354B0
EDI=242FB78C ESI=24567600 EAX=23233500 EBX=0344C690
ECX=0344C670 EDX=017BF8D0
EIP=0344C6A0 ESP=246FFB54 EBP=23CD2000 EFLAGS=00210246
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=033D0000 Offset_in_DLL=0007c6a0
Target=2_90_20240209_12 (Windows Server 2022 10.0 build 20348)
CPU=x86 (4 logical CPUs) (0x1fff78000 RAM)
----------- Stack Backtrace -----------
(0x0344C6A0)
(0x246FFB58)
(0x0344C670)
(0x23229E28)
(0x06AC9171)
(0x27F3BAE4)
Update: The ExceptionCode=c0000005
is EXCEPTION_ACCESS_VIOLATION
and ExceptionCode=c000001d
is ERROR_WRITE_FAULT
.
From the following (also reported above):
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=028F802D ContextFlags=0001007f
Handler1=7142D870 Handler2=747354B0 InaccessibleWriteAddress=222E1000
EDI=2372128C ESI=226F1420 EAX=222E1000 EBX=028F8028
ECX=028ED920 EDX=00CAF880
EIP=028F802D ESP=239BF318 EBP=230F0D00 EFLAGS=00210206
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=027F0000 Offset_in_DLL=0010802d
Target=2_90_20240209_12 (Windows Server 2022 10.0 build 20348)
CPU=x86 (4 logical CPUs) (0x1fff78000 RAM)
----------- Stack Backtrace -----------
(0x028F802D) <--offset 5 within java/lang/invoke/MemberName instance @ 0x28f8028
(0x028ED920) <--java/lang/invoke/DirectMethodHandle
(0x226F1400) <--java/util/Vector$VectorSpliterator
(0x226F1458) <--java/util/stream/Collectors$CollectorImp
(0x23F78C84) <--(JIT'd) java/lang/invoke/LambdaForm$DMH/0x239cefe0::invokeStatic_L3_L(Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
> !gpinfo
Failing Thread: !j9vmthread 0x230f0d00
Failing Thread ID: 0x1b24 (6948)
gpInfo:
J9Generic_Signal_Number=00000004 ExceptionCode=c0000005 ExceptionAddress=028F802D ContextFlags=0001007f
Handler1=7142D870 Handler2=747354B0 InaccessibleWriteAddress=222E1000
EDI=2372128C ESI=226F1420 EAX=222E1000 EBX=028F8028
ECX=028ED920 EDX=00CAF880
EIP=028F802D ESP=239BF318 EBP=230F0D00 EFLAGS=00210206
GS=002B FS=0053 ES=002B DS=002B
Module=
Module_base_address=027F0000 Offset_in_DLL=0010802d
> !whatis 0x027F0000
Found 0x027F0000 as !void 0x27f0000: !j9javavm 0x88ea28->omrRuntime->_vmList->_vmThreadList->heapBaseForBarrierRange0
Match found
Searched 3720 fields to a depth of 6 in 958 ms
> info mmap 0x027F0000
Start Address End Address Size Size Read/Write/Execute
0x027f0000 0x02deffff 0x00600000 (6,291,456) RW
Name: Image section @ 27f0000 (6291456 bytes)
allocationBase=0x027F0000 allocationProtect=PAGE_NOACCESS
allocationProtect_flags=0x1 executable=false
protect=PAGE_READWRITE protect_flags=0x4
readable=true size=0x00600000
state=MEM_COMMIT state_flags=0x1000
type=MEM_PRIVATE type_flags=0x1
writable=true
> info heap *
Heap #1: Generational@230dd288
Section #1: Heap extent at 0x27f0000 (0x600000 bytes)
Size: 6291456 bytes
Shared: false
Executable: true
Read Only: false
Section #2: Heap extent at 0x22530000 (0x200000 bytes)
Size: 2097152 bytes
Shared: false
Executable: true
Read Only: false
Section #3: Heap extent at 0x22730000 (0xc0000 bytes)
Size: 786432 bytes
Shared: false
Executable: true
Read Only: false
> info mmap 0x222E1000
Start Address End Address Size Size Read/Write/Execute
0x02df0000 0x2252ffff 0x1f740000 (527,695,872)
Name: Image section @ 2df0000 (527695872 bytes)
allocationBase=0x027F0000 allocationProtect=PAGE_NOACCESS
allocationProtect_flags=0x1 protect=
protect_flags=0x0 size=0x1F740000
state=MEM_RESERVE state_flags=0x2000
type=MEM_PRIVATE type_flags=0x1
This is the last functional issue, which prevents us to enable OJDK MHs in Java 8 for Win 32 platform. @0xdaryl @jdmpapin @nbhuiyan Can the JIT team help with this issue?
-Xint
or -Xshareclasses:none -Xnoaot
are passed.-Xjit:optlevel=hot
or veryHot
or scorching
do not crash. But noOpt
or cold
or warm
crash.I had looked a bit more into this and reproduced the failure with simple test files that just call the MethodHandle::invoke
variants many times.
I was only seeing failures related to the JIT'd java/lang/invoke/LambdaForm$DMH/0x<addr>::invoke...
methods (failed reliably with -Xjit:limit
set to only compile those methods; reliably completed successfully with -Xjit:exclude
of those methods).
However, I was seeing a couple different failures. Most of the time, it failed similarly to the above report. But sometimes, it would fail during the compilation of those methods rather than during execution of the compiled method.
I also tested disabling many (as many as I could) optimizations via the -Xjit:disable...
options. I wasn't able to pinpoint blame to any optimization; any compilation of the method resulted in a failure. My initial hunch was that something was awry in the i2j transitioning, but that wouldn't explain the failures during compilation.
Whoever picks this up, reach out to me. I have documented a lot of what I had done to triage and may have some more information that would help.
@nbhuiyan, assigning to you as you suggested. Thanks!
Internal build - head stream with OpenJDK method handles enabled win22x86-svl-rt4-1
Diagnostics were created but not archived.