eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

MiniMix 5m test fails with Segmentation error vmState=0x00000000 on AIX JDK11 OpenJ9 build #12120

Open Mesbah-Alam opened 3 years ago

Mesbah-Alam commented 3 years ago

Test

MixedLoadTest -test-args="timeLimit=30m" Variation: Mode554 JVM_OPTIONS: --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED -Xcompressedrefs -Xgcpolicy:balanced -XXgc:tarokEnableExpensiveAssertions,fvtest_tarokPGCRotateCollectors -Xjit:count=0,optlevel=hot,gcOnResolve,rtResolve

Java Version

13:51:42  openjdk version "11.0.11" 2021-04-20
13:51:42  OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.11+4-202103021928)
13:51:42  Eclipse OpenJ9 VM AdoptOpenJDK (build master-9123f9b8a, JRE 11 AIX ppc64-64-Bit Compressed References 20210302_932 (JIT enabled, AOT enabled)
13:51:42  OpenJ9   - 9123f9b8a
13:51:42  OMR      - eb4f5a875
13:51:42  JCL      - 7222dc7018 based on jdk-11.0.11+4)

Error details copied from console output

14:03:39  LT  stderr Unhandled exception
14:03:39  LT  stderr Type=Segmentation error vmState=0x00000000
14:03:39  LT  stderr J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000033
14:03:39  LT  stderr Handler1=09001000A0E1E3E0 Handler2=09001000A0DDF2C0
14:03:39  LT  stderr R0=0000000000000020 R1=000001002EC71E70 R2=08001000A01A3348 R3=000001001EA159B0
14:03:39  LT  stderr R4=000001001B41A2B8 R5=0000000000000000 R6=0000000000000000 R7=000001001B41B07C
14:03:39  LT  stderr R8=0000000000000001 R9=0000000000000001 R10=000000003002D868 R11=000001001E8EBE24
14:03:39  LT  stderr R12=0000000000000000 R13=000001002EC7D800 R14=00000000305742E0 R15=000000003052D900
14:03:39  LT  stderr R16=000001002C85A998 R17=FFFFFFFFFFFFFFFF R18=0A000100088E0400 R19=00000003F7233F20
14:03:39  LT  stderr R20=0000000408F2AE50 R21=00000000002C0000 R22=0000000048000001 R23=0000000000000000
14:03:39  LT  stderr R24=FFFFFEFFE17141DC R25=0000000000000000 R26=0000010034C2A390 R27=0000000000000001
14:03:39  LT  stderr R28=000001001B41B07C R29=0000000000000000 R30=000001001EA15990 R31=000000003002D868
14:03:39  LT  stderr IAR=090000001287D8F8 LR=090000001287D8B4 MSR=A00000000200D032 CTR=090000001287CF80
14:03:39  LT  stderr CR=4200024B00000001 FPSCR=BF80010000000000 XER=00000001BF800100
14:03:39  LT  stderr FPR0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR1 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
14:03:39  LT  stderr FPR2 41cdcd6500000000 (f: 0.000000, d: 1.000000e+09)
14:03:39  LT  stderr FPR3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR5 c3e0000000000000 (f: 0.000000, d: -9.223372e+18)
14:03:39  LT  stderr FPR6 3ee56d022b4f4c83 (f: 726617216.000000, d: 1.021662e-05)
14:03:39  LT  stderr FPR7 412e848000000000 (f: 0.000000, d: 1.000000e+06)
14:03:39  LT  stderr FPR8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR9 4530000000000000 (f: 0.000000, d: 1.934281e+25)
14:03:39  LT  stderr FPR10 412e848000000000 (f: 0.000000, d: 1.000000e+06)
14:03:39  LT  stderr FPR11 43300000000f4240 (f: 1000000.000000, d: 4.503600e+15)
14:03:39  LT  stderr FPR12 4530000000000000 (f: 0.000000, d: 1.934281e+25)
14:03:39  LT  stderr FPR13 3ff167e75c644e2f (f: 1550077440.000000, d: 1.087867e+00)
14:03:39  LT  stderr FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
14:03:39  LT  stderr Module=/home/jenkins/workspace/Grinder_SM/openjdkbinary/j2sdk-image/lib/default/libj9jit29.so
14:03:39  LT  stderr Module_base_address=0900000012641000
14:03:39  LT  stderr Target=2_90_20210302_932 (AIX 7.2)
14:03:39  LT  stderr CPU=ppc64 (24 logical CPUs) (0x1000000000 RAM)
14:03:39  LT  stderr ----------- Stack Backtrace -----------
14:03:39  LT  stderr replaceTrampoline__Q2_3OMR16CodeCacheManagerFP20TR_OpaqueMethodBlockPvN32b+0x100 (0x090000001287D784 [libj9jit29.so+0x23c784])
14:03:39  LT  stderr mcc_replaceTrampoline+0x3c (0x090000001287D5C0 [libj9jit29.so+0x23c5c0])
14:03:39  LT  stderr ppcCodePatching__FPvN51+0x364 (0x090000001287D2E8 [libj9jit29.so+0x23c2e8])
14:03:39  LT  stderr patchCallPoint__Q2_3OMR9CodeCacheFP20TR_OpaqueMethodBlockPvN22+0x128 (0x090000001272622C [libj9jit29.so+0xe522c])
14:03:39  LT  stderr mcc_callPointPatching_unwrapper+0x40 (0x0900000012721EC4 [libj9jit29.so+0xe0ec4])
14:03:39  LT  stderr old_slow_jitCallCFunction+0x50 (0x0900000012702954 [libj9jit29.so+0xc1954])
14:03:39  LT  stderr (0x09000000126ED1EC [libj9jit29.so+0xac1ec])
14:03:39  LT  stderr sidecarInvokeReflectMethodImpl+0x274 (0x090000000FE41DD8 [libj9vm29.so+0x72dd8])
14:03:39  LT  stderr sidecarInvokeReflectMethod+0x30 (0x090000000FE44474 [libj9vm29.so+0x75474])
14:03:39  LT  stderr JVM_InvokeMethod_Impl+0xb0 (0x09000000068EE114 [libjclse29.so+0x90114])
14:03:39  LT  stderr JVM_InvokeMethod+0x64 (0x0900000005C21568 [libjvm.so+0x15568])
14:03:39  LT  stderr JVM_InvokeMethod+0x68 (0x0900000005BDC5CC [libjvm.so+0x65cc])
14:03:39  LT  stderr JVM_InvokeMethod+0x68 (0x090000000568E5CC [libjvm.so+0x65cc])
14:03:39  LT  stderr Java_jdk_internal_reflect_NativeMethodAccessorImpl_invoke0+0x18 (0x090000000582F21C [libjava.so+0x1021c])
14:03:39  LT  stderr (0x000001001B6A8E30)
14:03:39  LT  stderr runJavaThread+0x1b4 (0x090000000FE42CB8 [libj9vm29.so+0x73cb8])
14:03:39  LT  stderr javaProtectedThreadProc+0x128 (0x090000000FDD1F0C [libj9vm29.so+0x2f0c])
14:03:39  LT  stderr omrsig_protect+0x488 (0x0900000005CCAE4C [libj9prt29.so+0x59e4c])
14:03:39  LT  stderr javaThreadProc+0x68 (0x090000000FDD1CEC [libj9vm29.so+0x2cec])
14:03:39  LT  stderr thread_wrapper+0x33c (0x090000000574B760 [libj9thr29.so+0x4760])
14:03:39  LT  stderr _pthread_body+0xe8 (0x090000000059DFEC [libpthreads.a+0x3fec])
14:03:39  LT  stderr ---------------------------------------
14:03:39  LT  stderr JVMDUMP039I Processing dump event "gpf", detail "" at 2021/03/03 13:03:39 - please wait.
14:03:39  LT  stderr JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16147981351093/system_custom_0/20210303-130215-MixedLoadTest/results/core.20210303.130339.30147054.0001.dmp' in response to an event
14:03:39  LT  stderr JVMDUMP012E Error in System dump: cannot find core file: "A file or directory in the path name does not exist.". check "ulimit -Hc" is set high enough
14:03:39  LT  stderr JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16147981351093/system_custom_0/20210303-130215-MixedLoadTest/results/javacore.20210303.130339.30147054.0002.txt' in response to an event
14:03:39  STF 13:03:39.636 - Found dump at: /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16147981351093/system_custom_0/20210303-130215-MixedLoadTest/results/javacore.20210303.130339.30147054.0002.txt
14:03:40  LT  stderr javacore file generated - /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16147981351093/system_custom_0/20210303-130215-MixedLoadTest/results/javacore.20210303.130339.30147054.0002.txt
14:03:40  LT  stderr 
14:03:40  LT  stderr 
14:03:40  LT  stderr *** Invalid JIT return address 0000000030D69F40 in 000001002EC63AA0
14:03:40  LT  stderr 
14:04:02  LT  stderr JVMDUMP010I Java dump written to /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16147981351093/system_custom_0/20210303-130215-MixedLoadTest/results/javacore.20210303.130339.30147054.0002.txt

To reproduce

Run grinder with following settings:

PLATFORM=ppc64_aix
BUILD_LIST=system
TARGET=system_custom
CUSTOM_TARGET=-test=MixedLoadTest -test-args="timeLimit=5m"
SDK_RESOURC=nightly
JDK_VERSION=11
JDK_IMPL=openj9
EXTRA_OPTIONS= --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED -Xcompressedrefs -Xgcpolicy:balanced -XXgc:tarokEnableExpensiveAssertions,fvtest_tarokPGCRotateCollectors -Xjit:count=0,optlevel=hot,gcOnResolve,rtResolve

(Please accept default values for all the rest of the fields) 
Mesbah-Alam commented 3 years ago

Started a 5x grinder at internal Jenkins, Grinder/14023/

pshipton commented 3 years ago

@zl-wang fyi

zl-wang commented 3 years ago

interesting ... crash in replaceTrampoline

Mesbah-Alam commented 3 years ago

@zl-wang : Please note, the problem was reproduced 3/5 times in hyc-runtimes-jenkins/Grinder/14023 .

Mesbah-Alam commented 3 years ago

@zl-wang : Just an FYI..

Another JIT crash is observed while running the same test in Aarch64 as well :

https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder_SM/43/tapResults/

Core file: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder_SM/43/artifact/system_test_output.tar.gz

Please advise if this requires a different issue.

Error details :

LT  stderr Unhandled exception
LT  stderr Type=Segmentation error vmState=0x00000000
LT  stderr J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
LT  stderr Handler1=0000FFFF9AE9C8C0 Handler2=0000FFFF9AE143F0 InaccessibleAddress=0000000000000020
LT  stderr R0=0000FFFF76FC64D0 R1=0000FFFF76FC64E0 R2=0000FFFF700A1860 R3=0000FFFF70170BC0
LT  stderr R4=0000FFFF81AA6D7C R5=0000000000000000 R6=0000000000000000 R7=0000FFFF9AB077E0
LT  stderr R8=0000FFFF75859F30 R9=FFFFFFFFFFFFFFB0 R10=0000FFFF81AA6D7C R11=0000FFFF76ED5E5C
LT  stderr R12=7420303D65746975 R13=0000000000000038 R14=000000000000003F R15=0000FFFF4F5F5520
LT  stderr R16=0000FFFF9ADEA118 R17=0000FFFF9BCEB428 R18=0000000000000001 R19=0000FFFF76FC64D0
LT  stderr R20=0000FFFF700A1860 R21=0000000000000000 R22=0000FFFF81AA6D7C R23=0000000000000000
LT  stderr R24=0000FFFF81AA6D80 R25=0000000000000000 R26=000000068065C8E8 R27=0000000000000000
LT  stderr R28=0000000000000009 R29=0000000000000000 R30=0000FFFF9A86E254 R31=0000FFFF4F5F5230
LT  stderr PC=0000FFFF9A86E258 SP=0000FFFF4F5F5230 PSTATE=0000000020000000
LT  stderr V0 0000000000000005 (f: 5.000000, d: 2.470328e-323)
LT  stderr V1 000000003f800000 (f: 1065353216.000000, d: 5.263544e-315)
LT  stderr V2 401ffd8ad3e009d4 (f: 3554675200.000000, d: 7.997600e+00)
LT  stderr V3 3fe62e42fee00000 (f: 4276092928.000000, d: 6.931472e-01)
LT  stderr V4 3d60d81cd340e21c (f: 3544244736.000000, d: 4.787407e-13)
LT  stderr V5 3ea737781f0259fa (f: 520247808.000000, d: 6.919109e-07)
LT  stderr V6 40162e42fee00000 (f: 4276092928.000000, d: 5.545177e+00)
LT  stderr V7 3fe5555555555593 (f: 1431655808.000000, d: 6.666667e-01)
LT  stderr V8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V9 3f50624dd2f1a9fc (f: 3539053056.000000, d: 1.000000e-03)
LT  stderr V10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V16 3fd999999997fa04 (f: 2576873984.000000, d: 4.000000e-01)
LT  stderr V17 3fd2492494229359 (f: 2485293824.000000, d: 2.857143e-01)
LT  stderr V18 3fcc71c51d8e78af (f: 495876256.000000, d: 2.222220e-01)
LT  stderr V19 3fc7466496cb03de (f: 2529887232.000000, d: 1.818357e-01)
LT  stderr V20 0000ffff4f5f53d0 (f: 1331647488.000000, d: 1.390657e-309)
LT  stderr V21 0000ffff4f5f53c0 (f: 1331647488.000000, d: 1.390657e-309)
LT  stderr V22 0000ffff4f5f53b0 (f: 1331647360.000000, d: 1.390657e-309)
LT  stderr V23 00000000000000f0 (f: 240.000000, d: 1.185758e-321)
LT  stderr V24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V28 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
LT  stderr V29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
LT  stderr V30 31f54fdd7f73bf3b (f: 2138291968.000000, d: 4.940656e-68)
LT  stderr V31 000000007f800000 (f: 2139095040.000000, d: 1.056853e-314)
LT  stderr Module=/home/jenkins/workspace/Grinder_SM/openjdkbinary/j2sdk-image/lib/default/libj9jit29.so
LT  stderr Module_base_address=0000FFFF9A161000
LT  stderr Target=2_90_20210302_677 (Linux 4.15.0-34-generic)
LT  stderr CPU=aarch64 (64 logical CPUs) (0x1f699a1000 RAM)
LT  stderr ----------- Stack Backtrace -----------
LT  stderr (0x0000FFFF9AE11048 [libj9prt29.so+0x26048])
LT  stderr ---------------------------------------
LT  stderr JVMDUMP039I Processing dump event "gpf", detail "" at 2021/03/04 02:10:45 - please wait.
LT  stderr JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/core.20210304.021045.8957.0001.dmp' in response to an event
LT  02:10:47.353 - Thread completed. Suite=0 thread=1
LT  02:10:51.124 - Completed 100.0%. Number of tests started=1028103 (+1546)
LT  02:10:52.629 - Thread completed. Suite=0 thread=9
LT  stderr JVMDUMP010I System dump written to /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/core.20210304.021045.8957.0001.dmp
LT  stderr JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/javacore.20210304.021045.8957.0002.txt' in response to an event
STF 02:10:55.692 - Found dump at: /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/javacore.20210304.021045.8957.0002.txt
STF 02:10:55.694 - Found dump at: /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/core.20210304.021045.8957.0001.dmp
LT  stderr javacore file generated - /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/javacore.20210304.021045.8957.0002.txt
LT  stderr core file generated - /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/core.20210304.021045.8957.0001.dmp
LT  02:11:03.601 - Thread completed. Suite=0 thread=4
LT  02:11:08.099 - Thread completed. Suite=0 thread=3
LT  stderr JVMDUMP010I Java dump written to /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/javacore.20210304.021045.8957.0002.txt
LT  stderr JVMDUMP032I JVM requested Snap dump using '/home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/Snap.20210304.021045.8957.0003.trc' in response to an event
LT  stderr JVMDUMP010I Snap dump written to /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/Snap.20210304.021045.8957.0003.trc
LT  stderr JVMDUMP032I JVM requested JIT dump using '/home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/jitdump.20210304.021045.8957.0004.dmp' in response to an event
LT  stderr JVMDUMP013I Processed dump event "gpf", detail "".
STF 02:11:09.232 - Found dump at: /home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16148166009214/system_custom_0/20210304-001000-MixedLoadTest/results/Snap.20210304.021045.8957.0003.trc
zl-wang commented 3 years ago

Separate issue please ... I didn't see this one's back trace, and thus not sure if it crashed at similar place.

IBMJimmyk commented 3 years ago

Core: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder_SM/38/artifact/system_test_output.tar.gz

There's no corefile in here. From the test output I see: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder_SM/38/consoleText

[2021-03-03T19:01:59.906Z] LT  stderr JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Grinder_SM/openjdk-tests/TKG/output_16147980324343/system_custom_0/20210303-130033-MixedLoadTest/results/core.20210303.130158.30147044.0001.dmp' in response to an event
[2021-03-03T19:01:59.906Z] LT  stderr JVMDUMP012E Error in System dump: cannot find core file: "A file or directory in the path name does not exist.". check "ulimit -Hc" is set high enough

So it seems like there's a problem with the machine config that is preventing the generation of a corefile.

Mesbah-Alam commented 3 years ago

@IBMJimmyk :

You are right, the core file was missing in that zip.

Could you please visit the internal Grinder/14023/console ?

If you scroll down to the bottom, you will see a link that ends with Grinder/14023/system_test_output.tar.gz , that will contain the core files of this defect too.

IBMJimmyk commented 3 years ago

The problem is occuring due to a crash inside of OMR::CodeCache::replaceTrampoline. Near the start of replaceTrampoline is a lookup into _resolvedMethodHT: https://github.com/eclipse/omr/blob/165a4a81709504806f14bd8e540d97962cafe5f5/compiler/runtime/OMRCodeCache.cpp#L614

The method can't be found in _resolvedMethodHT so NULL is returned and stored to entry. A bit later on there are load/stores after dereferencing entry which causes the crash.

The method is supposed to get added to _resolvedMethodHT from inside J9::CodeCache::resolveHashEntry: https://github.com/eclipse/openj9/blob/e749042001fcb7e9b20f74d20e7f85a12538d476/runtime/compiler/runtime/J9CodeCache.cpp#L299

But, this never happens. This reason is that J9::CodeCache::adjustTrampolineReservation doesn't call resolveHashEntry: https://github.com/eclipse/openj9/blob/e749042001fcb7e9b20f74d20e7f85a12538d476/runtime/compiler/runtime/J9CodeCache.cpp#L569-L587

If you look at those lines resolveHashEntry is called if the method is found in _unresolvedMethodHT but is not inside _resolvedMethodHT. As it turns out, the lookup into _unresolvedMethodHT also can't find the entry. But this turns out to be a mistake due to a slight difference in how the method was added to the hashtable and how the lookup was performed.

The lookup inside adjustTrampolineReservation uses the the Constant Pool pointer cp and the Constant Pool Index cpIndex to do the look up. cpIndex has the type int32_t but it was orginally passed in as an unsigned 18-bit value. That's well within the positive range of an int32_t. So here, cpIndex is positive.

Example values:

cp = 0x1814EC0
cpIndex = 0x20000    // decimal value of 131072
key = 0x3029d800000  // generated by running a hash function on the cp and cpIndex

The problem is that when the entry was added to _unresolvedMethodHT, a negative cpIndex was used.

The cpIndex to create the entry is read from the symref here: https://github.com/eclipse/openj9/blob/e749042001fcb7e9b20f74d20e7f85a12538d476/runtime/compiler/env/VMJ9.cpp#L5405

The cpIndex is stored as an signed 18-bit bit field. 0x20000 in binary is 10 0000 0000 0000 0000. This means that 0x20000 is interpreted as -131072 and sign extended when cast to a I_32.

The cp and cpIndex are passed to J9::CodeCache::reserveUnresolvedTrampoline to create the entry in _unresolvedMethodHT: https://github.com/eclipse/openj9/blob/e749042001fcb7e9b20f74d20e7f85a12538d476/runtime/compiler/runtime/J9CodeCache.cpp#L670

Example values:

cp = 0x1814EC0
cpIndex = 0xFFFE0000      // decimal value of -131072
key = 0xfffffcfd62800000  // generated by running a hash function on the cp and cpIndex

The key is different. Due the original 18-bit cpIndex being sign extended when used to add the entry and being zero extended when used to look up the entry, the entry can not be found.

I am currently looking at making a fix where the sign on cpIndex is consistent for all access (add, delete, lookup) to the hashtables.

This is a common issue so it should be a problem on X and Z as well not just Power and Arm.

IBMJimmyk commented 3 years ago

After going through the codebase, I suspect the fix can be limited to just one location: https://github.com/eclipse/openj9/blob/e749042001fcb7e9b20f74d20e7f85a12538d476/runtime/compiler/env/VMJ9.cpp#L5405

This location was also mentioned in the previous comment. The line can be changed to:

I_32 cpIndex = symRef->getCPIndexForVM();

This will mask out the sign extension. From what I can tell, all other accesses to the hashtable assume an unsigned cpIndex. This change will bring this location in line with the others.

I have a test build with the proposed fix and I am currently grinding the test to verify.

zl-wang commented 3 years ago

nice finding ...

0xdaryl commented 3 years ago

Good work diagnosing this problem @IBMJimmyk !

IBMJimmyk commented 3 years ago

I've opened a PR that hopefully fixes the issue here: https://github.com/eclipse/openj9/pull/12415

The fix is pretty much the one I listed earlier. At first I thought there were some issues with how relocations are handled that could lead to a similar problem. But, it turns out that the problems with the relocations are a separate problem so they shouldn't hold up fixing this one.