eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

JITServer special.system test failures: malloc/memory corruption errors #15751

Open cjjdespres opened 1 year ago

cjjdespres commented 1 year ago

From

failures in

All of them are memory corruption/malloc errors. A few console logs:

===============================================
Running test DaaLoadTest_all_special_5m_27 ...
===============================================
DaaLoadTest_all_special_5m_27 Start Time: Sun Aug 14 11:07:21 2022 Epoch Time (ms): 1660475241208
variation: Mode612-OSRG
JVM_OPTIONS: -XX:+UseJITServer -Xcompressedrefs -Xgcpolicy:gencon -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation

STF 11:07:24.084 - Java version
STF 11:07:24.084 - Running: /home/jenkins/workspace/Test_openjdk17_j9_special.system_ppc64le_linux_jit_Personal_testList_1/openjdkbinary/j2sdk-image/bin/java -version
openjdk version "17.0.5-internal" 2022-10-18
OpenJDK Runtime Environment (build 17.0.5-internal+0-adhoc.jenkins.BuildJDK17ppc64lelinuxjitPersonal)
Eclipse OpenJ9 VM (build master-cde425e5de0, JRE 17 Linux ppc64le-64-Bit Compressed References 20220814_365 (JIT enabled, AOT enabled)
OpenJ9   - cde425e5de0
OMR      - 33b7bc88331
JCL      - d5540d6c583 based on jdk-17.0.5+1)

STF 11:07:24.187 - Monitoring processes: DLT
DLT stderr malloc(): invalid size (unsorted)
===============================================
Running test MauveMultiThrdLoad_special_5m_13 ...
===============================================
MauveMultiThrdLoad_special_5m_13 Start Time: Sun Aug 14 06:21:56 2022 Epoch Time (ms): 1660483316730
variation: Mode553
JVM_OPTIONS: -XX:+UseJITServer -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:count=0

STF 06:22:01.616 - Java version
STF 06:22:01.616 - Running: /home/jenkins/workspace/Test_openjdk8_j9_special.system_ppc64le_linux_jit_Personal_testList_2/openjdkbinary/j2sdk-image/bin/java -version
openjdk version "1.8.0_352-internal"
OpenJDK Runtime Environment (build 1.8.0_352-internal-jenkins_2022_08_13_23_57-b00)
Eclipse OpenJ9 VM (build master-cde425e5de0, JRE 1.8.0 Linux ppc64le-64-Bit Compressed References 20220814_1092 (JIT enabled, AOT enabled)
OpenJ9   - cde425e5de0
OMR      - 33b7bc88331
JCL      - 8287678f5d4 based on jdk8u352-b01)

LT  06:22:58.638 - Starting thread. Suite=0 thread=0
LT  06:22:58.758 - Starting thread. Suite=0 thread=1
LT  06:22:58.766 - Starting thread. Suite=0 thread=2
LT  stderr java: malloc.c:2399: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t)) < __alignof__ (long double) ? __alignof__ (long double) : 2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t)) < __alignof__ (long double) ? __alignof__ (long double) : 2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
===============================================
Running test LambdaLoadTest_special_J9_5m_25 ...
===============================================
LambdaLoadTest_special_J9_5m_25 Start Time: Sat Aug 13 05:02:19 2022 Epoch Time (ms): 1660392139172
variation: Mode110-OSRG
JVM_OPTIONS: -XX:+UseJITServer -Xjit:enableOSR,enableOSROnGuardFailure,count=1,disableAsyncCompilation -Xgcpolicy:gencon

STF 05:02:22.764 - Java version
STF 05:02:22.764 - Running: /home/jenkins/workspace/Test_openjdk8_j9_special.system_ppc64le_linux_jit_Personal_testList_3/openjdkbinary/j2sdk-image/bin/java -version
openjdk version "1.8.0_352-internal"
OpenJDK Runtime Environment (build 1.8.0_352-internal-jenkins_2022_08_13_00_02-b00)
Eclipse OpenJ9 VM (build master-cde425e5de0, JRE 1.8.0 Linux ppc64le-64-Bit Compressed References 20220813_1091 (JIT enabled, AOT enabled)
OpenJ9   - cde425e5de0
OMR      - 33b7bc88331
JCL      - 8287678f5d4 based on jdk8u352-b01)

LT  05:03:07.486 - Starting thread. Suite=0 thread=0
LT  05:03:07.498 - Starting thread. Suite=0 thread=1
LT  stderr *** Error in `/home/jenkins/workspace/Test_openjdk8_j9_special.system_ppc64le_linux_jit_Personal_testList_3/openjdkbinary/j2sdk-image/bin/java': malloc(): memory corruption: 0x00003fff4c0516f0 ***
cjjdespres commented 1 year ago

Attn @mpirvu. Perhaps they're related to #15601, given the timing?

mpirvu commented 1 year ago

That's a fair number of failures. Do these runs have the latest changes related to localSyncCompiles? If we suspect that localSyncCompiles are at fault we can grind with "-XX:-JITServerLocalSyncCompiles" and see if the failures are still present.

cjjdespres commented 1 year ago

No, the latest changes haven't been picked up in the nightly builds yet. I'll run some grinders to see if the problem is sensitive to LocalSyncCompiles.

cjjdespres commented 1 year ago

I haven't seen any failures in the nightly tests since the latest LocalSyncCompiles changes were picked up, though it's only been a day. I've also run tests of DaaLoadTest_all_special_5m_27 on ppc64le_linux with JDK17 and found that the memory corruption failures happen only when JITServerLocalSyncCompiles is enabled, both before and after the latest changes (with a 2/50 failure rate), and not when it is disabled.
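
As a rough aid to reading the 2/50 figure, the sketch below computes how likely a batch of all-clean runs would be if the true failure rate were about 4%. It assumes independent runs with a constant per-run failure probability, which nothing in this thread establishes. The function names are illustrative only.

```python
import math


def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass at the given per-run failure rate."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return (1.0 - failure_rate) ** runs


def clean_runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of consecutive clean runs whose chance of occurring by luck
    (if the failure rate were unchanged) is at most 1 - confidence."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


observed_rate = 2 / 50  # failure rate reported with JITServerLocalSyncCompiles enabled

# Chance that 50 runs all pass even though the failure rate is still 4%: about 0.13.
print(f"{prob_all_pass(observed_rate, 50):.3f}")

# Clean runs needed before a lucky streak becomes less than 5% likely: 74.
print(clean_runs_needed(observed_rate, 0.95))
```

Under these assumptions, a clean batch of 50 runs with the option disabled is suggestive but not conclusive. A larger grinder would tighten the comparison.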

mpirvu commented 1 year ago

Could you please add verbose logs at the client and try to reproduce on pLinux? The option is -Xjit:verbose={compil*},vlog=vlog.txt
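
The test modes in the logs above already pass an -Xjit option. Assuming OpenJ9's default behavior of honoring only the last -Xjit option on the command line, the verbose settings would need to be appended to the existing option, not passed as a second one. The helper below is a hypothetical sketch of that string manipulation; its name is made up and it is not part of any test framework.

```python
def add_jit_suboptions(jvm_options: str, extra: str) -> str:
    """Append `extra` sub-options to the last -Xjit: option, or add a new -Xjit: option.

    Assumes the options are separated by whitespace and contain no quoted spaces.
    """
    tokens = jvm_options.split()
    # Walk backwards so that the last -Xjit: option is the one extended.
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i].startswith("-Xjit:"):
            tokens[i] = tokens[i] + "," + extra
            return " ".join(tokens)
    # No existing -Xjit: option, so add one.
    tokens.append("-Xjit:" + extra)
    return " ".join(tokens)


# JVM_OPTIONS of Mode553, copied from the console log above.
mode553 = "-XX:+UseJITServer -XX:+UseCompressedOops -Xgcpolicy:balanced -Xjit:count=0"
print(add_jit_suboptions(mode553, "verbose={compil*},vlog=vlog.txt"))
```

When the resulting string is pasted into a shell, quoting the -Xjit option is advisable, since `{compil*}` contains characters the shell may expand.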

cjjdespres commented 1 year ago

As with the segfaults, it doesn't seem to be reproducible with verbose logs on.