Closed pshipton closed 3 years ago
@zl-wang : please assign for triage.
@mnalam-p please take a look
I'm setting this as a blocker. It seems like a regression, as we didn't see it before and now it's occurring often.
The crash is byproduct of trying to load an invalid address in the jitted code. The problematic jitted method is -
jdk/internal/util/random/RandomSupport$RandomLongsSpliterator.forEachRemaining
From the jitdump the problematic IL is the following -
lwz gr24, [gr3, 40] # SymRef jdk/internal/util/random/RandomSupport$RandomLongsSpliterator.generatingGenerator Ljava/util/random/RandomGenerator;[#385 final Shadow +40]
...
lwz gr10, [gr24, 8] # SymRef java/util/SplittableRandom$AbstractSplittableGeneratorProxy.this$0 Ljava/util/SplittableRandom;[#438 final Shadow +8]
ld gr2, [gr10, 8] # SymRef java/util/SplittableRandom.seed J[#449 Shadow +8] <-- CRASH
Looking at the RandomLongsSpliterator
class, it matches with the load offset (40) -
Fields for Class: {jdk/internal/util/random/RandomSupport$RandomLongsSpliterator}
Hidden Fields:
> lockWord ( +4) - <hidden>
Super class fields: {java/lang/Object} Number of Fields: 0
Super class fields: {jdk/internal/util/random/RandomSupport$RandomSpliterator} Number of Fields: 2
> index ( +8) (J) - 3c0001 Public
> fence ( +16) (J) - 3c0011 Public Final
Class fields: Number of Fields: 3
> generatingGenerator ( +40) (Ljava/util/random/RandomGenerator;) - 20010 Final
> origin ( +24) (J) - 3c0010 Final
> bound ( +32) (J) - 3c0010 Final
But field generatingGenerator
should have a type of RandomGenerator
. However looking at the type of object in register 24, we see -
$r24 = 0x00000000fff70120 Obj - {java/util/concurrent/ThreadLocalRandom$ThreadLocalRandomProxy}
Looking at the ThreadLocalRandomProxy implementation for jdk17, it shows the following hierarchy -
ThreadLocalRandomProxy
-- Random
--- RandomGenerator
But looking at the ThreadLocalRandomProxy
object on register24 -
Object is of type:
Class Path/Name: {java/util/concurrent/ThreadLocalRandom$ThreadLocalRandomProxy} J9Class 0x000000000084db00
Hidden Fields:
> lockWord ( +4) = 0x000000000025fe08 - <hidden>
Super class fields: {java/lang/Object} Number of Fields: 0
Super class fields: {java/util/Random} Number of Fields: 12
serialVersionUID = 0x363296344bf00a53 (J) - 7c0018 Static Final
> seed (+16) = 0x00000000fff71320 (Ljava/util/concurrent/atomic/AtomicLong;) - 20012 Private Final : Obj - {java/util/concurrent/atomic/AtomicLong}
multiplier = 0x00000005deece66d (J) - 7c001a Private Static Final
addend = 0x000000000000000b (J) - 7c001a Private Static Final
mask = 0x0000ffffffffffff (J) - 7c001a Private Static Final
DOUBLE_UNIT = 0x3ca0000000000000 (D) - 5c001a Private Static Final
seedUniquifier = 0x000000009fd65958 (Ljava/util/concurrent/atomic/AtomicLong;) - 2001a Private Static Final : Obj - {java/util/concurrent/atomic/AtomicLong}
> nextNextGaussian ( +8) = 0x0000000000000000 (D) - 1c0002 Private
> haveNextNextGaussian (+20) = 0x00000000 (Z) - 80002 Private
serialPersistentFields = 0x000000009fd65968 ([Ljava/io/ObjectStreamField;) - 2001a Private Static Final : Obj - {[Ljava/io/ObjectStreamField}
unsafe = 0x000000009fc4f950 (Ljdk/internal/misc/Unsafe;) - 2001a Private Static Final : Obj - {jdk/internal/misc/Unsafe}
seedOffset = 0x363296344bf00a53 (J) - 3c001a Private Static Final
Class fields: Number of Fields: 1
PROXY = 0x00000000fff70120 (Ljava/util/Random;) - 20018 Static Final : Obj - {java/util/concurrent/ThreadLocalRandom$ThreadLocalRandomProxy}
Which leads to wrong loading of nextNextGaussian
value (0x0) instead of loading the this
.
Looking at the SplittableRandom$AbstractSplittableGeneratorProxy
, we see -
Fields for Class: {java/util/SplittableRandom$AbstractSplittableGeneratorProxy}
Hidden Fields:
> lockWord ( +4) - <hidden>
Super class fields: {java/lang/Object} Number of Fields: 0
Super class fields: {jdk/internal/util/random/RandomSupport$AbstractSpliteratorGenerator} Number of Fields: 0
Super class fields: {jdk/internal/util/random/RandomSupport$AbstractSplittableGenerator} Number of Fields: 0
Class fields: Number of Fields: 1
> this$0 ( +8) (Ljava/util/SplittableRandom;) - 21010 Final
This matches with the offset that jitted code is trying to load. Looking at the AbstractSplittableGenerator it has more direct hierarchy -
AbstractSpliteratorGenerator
-- RandomGenerator
So, tl;dr - basically the crash is happening due to having wrong object passed to the jitted code. This is what I have figured out so far. I will continue to narrow down on this.
@mnalam-p the obvious question needing answer is whether type java/util/concurrent/ThreadLocalRandom$ThreadLocalRandomProxy
is a sub-type of java/util/SplittableRandom$AbstractSplittableGeneratorProxy
.
my guess to the answer is no (since i didn't see this$0
field in the class verbose list), then the problem turned into ILgen or inliner (field this$0
getter inlined) error: you cannot get this$0
field at offset 8 of former-type object. fundamentally, the former-type is accessed as if it were the latter-type.
If the answer is yes, why is this$0
field not shown above?
@pshipton @tajila
@0xdaryl looks more likely a common issue. ^^^
@hzongaro : please investigate from an optimizer perspective
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux_Nightly/31/
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux_Nightly/34
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux_Nightly/36
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux_Nightly/39
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux_Release/7
@mnalam-p and I investigated further. It looks like this might be an issue with Field Privatization. This is a snippet of IL just before Field Privatization. It looks that optimization moves the load at n486n before the inlineProfiledGuard
at n1170n.
n1163n BBStart <block_102> (freq 10000) [ 0x3fff44845b40] bci=[1,0,411] rc=0 vc=951 vn=- li=- udi=- nc=0
n1166n astore <temp slot 13>[#463 Auto] [flags 0x7 0x0 ] (X!=0 sharedMemory ) [ 0x3fff44845c30] bci=[-1,47,1060] rc=0 vc=951 vn=- li=- udi=6 nc=1 flg=0x4
n1165n aload consumer<parm 1 Ljava/util/function/LongConsumer;>[#377 Parm] [flags 0x40000107 0x0 ] (X!=0 sharedMemory ) [ 0x3fff44845be0] bci=[-1,47,1060] rc=1 vc=951 vn=- li=- udi=74 nc=0 flg=0x4
n1170n ifacmpne --> block_18 BBStart at n668n (inlineProfiledGuard ) [ 0x3fff44845d70] bci=[2,0,232] rc=0 vc=951 vn=- li=- udi=- nc=2 flg=0x1020
n1168n aloadi <vft-symbol>[#301 Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory ) [ 0x3fff44845cd0] bci=[1,0,411] rc=1 vc=951 vn=- li=- udi=- nc=1 flg=0x4
n1169n aload r<auto slot 6>[#386 Auto] [flags 0x7 0x0 ] (X>=0 sharedMemory ) [ 0x3fff44845d20] bci=[1,0,411] rc=1 vc=951 vn=- li=- udi=75 nc=0 flg=0x100
n1171n aconst 0x7f5900 (java/util/SplittableRandom$AbstractSplittableGeneratorProxy.class) (classPointerConstant X!=0 sharedMemory ) [ 0x3fff44845dc0] bci=[2,0,232] rc=1 vc=951 vn=- li=- udi=- nc=0 flg=0x10004
n1172n BBEnd </block_102> ===== [ 0x3fff44845e10] bci=[2,0,232] rc=0 vc=951 vn=- li=- udi=- nc=0
n673n BBStart <block_17> (freq 10000) [ 0x3fff4472c210] bci=[1,3,411] rc=0 vc=951 vn=- li=- udi=- nc=0
n488n compressedRefs [ 0x3fff44728840] bci=[2,1,232] rc=0 vc=951 vn=- li=- udi=- nc=2
n486n aloadi java/util/SplittableRandom$AbstractSplittableGeneratorProxy.this$0 Ljava/util/SplittableRandom;[#438 final Shadow +8] [flags 0x20607 0x0 ] (X>=0 sharedMemory ) [ 0x3fff447287a0] bci=[2,1,232] rc=4 vc=951 vn=- li=- udi=- nc=1 flg=0x100
n485n aload r<auto slot 6>[#386 Auto] [flags 0x7 0x0 ] (X!=0 X>=0 sharedMemory ) [ 0x3fff44728750] bci=[2,0,232] rc=1 vc=951 vn=- li=- udi=76 nc=0 flg=0x104
Observed at JDK17 head build job/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux/64/consoleFull
(pll025
)
01:06:22 openjdk version "17-beta" 2021-09-14
01:06:22 IBM Semeru Runtime Open Edition 17+35-202110070406 (build 17-beta+35-202110070406)
01:06:22 Eclipse OpenJ9 VM 17+35-202110070406 (build master-b0065c9ee, JRE 17 Linux ppc64le-64-Bit Compressed References 20211007_9 (JIT enabled, AOT enabled)
01:06:22 OpenJ9 - b0065c9ee
01:06:22 OMR - f6ab3c6fc
01:06:22 JCL - 480d595b54 based on jdk-17+35)
02:36:26 ===============================================
02:36:26 Running test jdk_util_1 ...
02:36:26 ===============================================
02:36:26 jdk_util_1 Start Time: Thu Oct 7 02:36:25 2021 Epoch Time (ms): 1633588585712
02:36:26 "/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
02:36:26 JVMSHRC005I No shared class caches available
02:36:26 JVMSHRC005I No shared class caches available
02:36:26 cache cleanup done
02:36:26 variation: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache Mode650
02:36:26 JVM_OPTIONS: -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-UseCompressedOops
02:39:23 --------------------------------------------------
02:39:23 TEST: java/util/concurrent/tck/JSR166TestCase.java
02:39:23 Unhandled exception
02:39:23 Type=Segmentation error vmState=0x00000000
02:39:23 Unhandled exception
02:39:23 Type=Segmentation error vmState=0x00000000
02:39:23 J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
02:39:23 Handler1=00007BB4DFA251D0 Handler2=00007BB4DF95C300
02:39:23 R0=00000000009E2800 R1=00007BB4C25EB710 R2=00007BB4DF25F300 R3=0000000000951828
02:39:23 R4=00000000FFD903A0 R5=00000000FFD903A0 R6=00000000009E2800 R7=00000000009E2800
02:39:23 R8=00000000003E1A00 R9=00000000FFD90380 R10=0000000000000000 R11=0000000000951828
02:39:23 R12=00007BB4DEF6A580 R13=00007BB4C25F68F0 R14=0000000000955A50 R15=0000000000708600
02:39:23 R16=00007BB4C0CC0038 R17=000000009FD4E8E0 R18=00000000FFD90348 R19=0000000000000000
02:39:23 R20=00000000FFD903A0 R21=0000000000000041 R22=0000000000000002 R23=0000000000000001
02:39:23 R24=00000000A0375A60 R25=00000399BF091468 R26=0000000000A1EB00 R27=00000399BF09146A
02:39:23 R28=0000000000000040 R29=00000000FFD90E60 R30=00000000009DFB00 R31=00000000FFD90E60
02:39:23 NIP=00007BB4C3AF48DC MSR=800000000280F033 ORIG_GPR3=00007BB4DE96FC38 CTR=00007BB4C3AF4800
02:39:23 LINK=00007BB4C3AF48CC XER=0000000000000000 CCR=0000000042004244 SOFTE=0000000000000001
02:39:23 TRAP=0000000000000300 DAR=0000000000000008 dsisr=0000000040000000 RESULT=0000000000000000
02:39:23 J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
02:39:23 Handler1=00007BB4DFA251D0 Handler2=00007BB4DF95C300
02:39:23 R0=00000000009E2800 R1=00007BB3CB42A070 R2=0000000000000001 R3=000000000034D5E8
02:39:23 R4=00000000FFD903A0 R5=00000000FFD903A0 R6=00000000009E2800 R7=00000000009E2800
02:39:23 R8=800000000280F033 R9=00000000FFD90380 R10=0000000000000000 R11=000000000034D5E8
02:39:23 R12=00007BB4DEF6A580 R13=00007BB3CB4368F0 R14=00000000003508B0 R15=00000000002FAA00
02:39:23 R16=00007BB4C0CC0038 R17=FFFFFFFFFFFFFFFF R18=00000000FFD90348 R19=0000000000000000
02:39:23 R20=00000000FFD903A0 R21=0000000000000042 R22=0000000000000002 R23=0000000000000001
02:39:23 R24=00000000A0375A60 R25=00000399BF091468 R26=0000000000A1EB00 R27=00000399BF09146A
02:39:23 R28=0000000000000041 R29=00000000FFD90E00 R30=00000000009DFB00 R31=00000000FFD90E00
02:39:23 NIP=00007BB4C3AF48DC MSR=800000000280F033 ORIG_GPR3=00007BB4C3AE0D08 CTR=00007BB4C3AE0C80
02:39:23 LINK=00007BB4C3AF48CC XER=0000000020040000 CCR=0000000042004484 SOFTE=0000000000000001
02:39:23 TRAP=0000000000000300 DAR=0000000000000008 dsisr=0000000040000000 RESULT=0000000000000000
02:39:23 FPR0 0000000000000005 (f: 5.000000, d: 2.470328e-323)
02:39:23 FPR1 4053cced60000000 (f: 1610612736.000000, d: 7.920199e+01)
02:39:23 FPR2 44112478468dcbe2 (f: 1183697920.000000, d: 7.905564e+19)
02:39:23 FPR3 3fee666660000000 (f: 1610612736.000000, d: 9.500000e-01)
02:39:23 FPR4 4017225fa0000000 (f: 2684354560.000000, d: 5.783568e+00)
02:39:23 FPR5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR6 3fe62e42fefa39ef (f: 4277811712.000000, d: 6.931472e-01)
02:39:23 FPR7 bfba4e76ce8c0e5e (f: 3465285120.000000, d: -1.027598e-01)
02:39:23 FPR8 bff0000000000000 (f: 0.000000, d: -1.000000e+00)
02:39:23 FPR9 bfdffffef20a4123 (f: 4060758272.000000, d: -4.999997e-01)
02:39:23 FPR10 bfe0238b21457068 (f: 558198912.000000, d: -5.043388e-01)
02:39:23 FPR11 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
02:39:23 FPR12 40faaf2000000000 (f: 0.000000, d: 1.092980e+05)
02:39:23 FPR13 3f9e5e4bb4666720 (f: 3026609920.000000, d: 2.965658e-02)
02:39:23 FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR17 4411221781182a0a (f: 2165844480.000000, d: 7.901281e+19)
02:39:23 FPR18 43638d4fa9e1585e (f: 2850117632.000000, d: 4.402718e+16)
02:39:23 FPR19 437349bdab780c00 (f: 2876771328.000000, d: 8.686566e+16)
02:39:23 FPR20 43638d4fa9e1585e (f: 2850117632.000000, d: 4.402718e+16)
02:39:23 FPR21 437349bdab780c00 (f: 2876771328.000000, d: 8.686566e+16)
02:39:23 FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR0 0000000000000005 (f: 5.000000, d: 2.470328e-323)
02:39:23 FPR1 4053dbfee0000000 (f: 3758096384.000000, d: 7.943743e+01)
02:39:23 FPR2 4020000000000000 (f: 0.000000, d: 8.000000e+00)
02:39:23 FPR3 4049000000000000 (f: 0.000000, d: 5.000000e+01)
02:39:23 FPR4 3fe8000000000000 (f: 0.000000, d: 7.500000e-01)
02:39:23 FPR5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR6 3fe62e42fefa39ef (f: 4277811712.000000, d: 6.931472e-01)
02:39:23 FPR7 bfb1973c5a611ccc (f: 1516313856.000000, d: -6.871392e-02)
02:39:23 FPR8 bff0000000000000 (f: 0.000000, d: -1.000000e+00)
02:39:23 FPR9 bfdffffef20a4123 (f: 4060758272.000000, d: -4.999997e-01)
02:39:23 FPR10 bfe00b7e938184f4 (f: 2474738944.000000, d: -5.014031e-01)
02:39:23 FPR11 3ff0000000000000 (f: 0.000000, d: 1.000000e+00)
02:39:23 FPR12 40faaf2000000000 (f: 0.000000, d: 1.092980e+05)
02:39:23 FPR13 3fcc72342604f188 (f: 637858176.000000, d: 2.222352e-01)
02:39:23 FPR14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR16 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR17 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR18 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR19 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR20 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR21 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR22 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR23 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR24 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR25 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR26 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR27 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR28 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR29 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR30 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23 FPR31 0000000000000000 (f: 0.000000, d: 0.000000e+00)
02:39:23
02:39:23 Compiled_method=jdk/internal/util/random/RandomSupport$RandomLongsSpliterator.forEachRemaining(Ljava/util/function/LongConsumer;)V
02:39:23 Target=2_90_20211007_9 (Linux 5.4.0-74-generic)
02:39:23 CPU=ppc64le (64 logical CPUs) (0xff01e0000 RAM)
02:39:23 ----------- Stack Backtrace -----------
02:39:23 (0x00007BB4C3AF48DC [<unknown>+0x0])
02:39:23 (0x00007BB4DF9F94B0 [libj9vm29.so+0x194b0])
02:39:23 (0x00007BB4DFA94D70 [libj9vm29.so+0xb4d70])
02:39:23 (0x00007BB4DF95D708 [libj9prt29.so+0x3d708])
02:39:23 (0x00007BB4DFA8FF24 [libj9vm29.so+0xaff24])
02:39:23 (0x00007BB4E40413E8 [libj9thr29.so+0x113e8])
02:39:23 (0x00007BB4E4C09818 [libpthread.so.0+0x9818])
02:39:23 clone+0x74 (0x00007BB4E4ADEB24 [libc.so.6+0x14eb24])
02:39:23 ---------------------------------------
02:39:23 JVMDUMP039I Processing dump event "gpf", detail "" at 2021/10/07 02:39:01 - please wait.
02:39:23 TEST RESULT: Failed. Unexpected exit from test [exit code: 255]
02:39:23 --------------------------------------------------
02:52:26 Test results: passed: 885; failed: 1
02:52:26 Report written to /home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux/jvmtest/openjdk/report/html/report.html
02:52:26 Results written to /home/jenkins/workspace/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux/aqa-tests/TKG/output_16335833081670/jdk_util_1/work
02:52:26 Error: Some tests failed or other problems occurred.
02:52:26
02:52:26 jdk_util_1_FAILED
@hzongaro : could you create a 0.29.0 PR in the openj9-omr repo for the fix please?
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux_Nightly/25 jdk_util_0 -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:+UseCompressedOops java/util/concurrent/tck/JSR166TestCase.java
https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.openjdk_ppc64le_linux_Nightly/25/openjdk_test_output.tar.gz