Open r30shah opened 3 years ago
FYI @zl-wang
Following on from Rahil's comment on eclipse/omr#6144:
May be you can confirm this, do we rely on cpIndex to be -1 to know it is fabricated loadaddr?
Yes, a CP index of -1 is used to determine whether to unset useUnresSnippetToAvoidRelo
, and also whether to replace TR_ClassAddress
with TR_ArbitraryClassAddress
Yes, a CP index of -1 is used to determine whether to unset useUnresSnippetToAvoidRelo, and also whether to replace TR_ClassAddress with TR_ArbitraryClassAddress
Hmm, even-though a fabricated loadaddr
was created with symRef with cpIndex -1
in the transformation, cpIndex is set to different when we reach code-gen. I have not narrowed down where that happens exactly yet as it was handled well on Z and X.
With the help of @vijaysun-omr and @jdmpapin , I did more investigations on what is happening.
When VP transformation added in this PR transforms the VFT load to loadaddr
node, it creates a ClassSymbol with cpIndex -1
[1]. Later in Loop versioner optimization, it generates the instanceOf test for the versioned loop which re-uses the same class symbol, but gets the cpIndex [2] and updates it, so when it reaches the Power codegen, it reaches with the fabricated loadaddr
which does not have cpIndex -1 and encounters [3] which choses to use unresolved data snippet on Power for loadaddr
.
Digging a bit deeper, pasting some of the findings from the core-dump where originally we failed with segmentation while accessing a field from J9Class pointer.
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007B6EF678E090 Handler2=00007B6EF64F843C
R0=00007B6EDDA59FE8 R1=00007B6EF73FD680 R2=00007B6EF6B26E00 R3=0000000000017800
R4=00000000FFF69328 R5=0000000000000001 R6=0000000000000015 R7=0000000000000014
R8=0000000000000020 R9=000000000000000E R10=00007B6EDDA59FE8 R11=00007B6EF52CB120
R12=00007B6EF4F1E40C R13=00007B6EF74068E0 R14=0000000000153150 R15=0000000000095700
R16=00007B6EBFE10038 R17=FFFFFFFFFFFFFFFF R18=00007B6EF67D18C8 R19=00007B6EF6B26E00
R20=00007B6EA5461BAE R21=00007B6EF6AEA0AA R22=0008000300080003 R23=00007B6EF73FD940
R24=00007B6EF73FE3E0 R25=00000000FFF69408 R26=0000000000000001 R27=00000000FFF69328
R28=0000000000000014 R29=0000000000000001 R30=0000000000000015 R31=00000000FFF69408
NIP=00007B6EDDA5EF78 MSR=800000010280F033 ORIG_GPR3=00000000000087D0 CTR=00007B6EDDA5F2D4
LINK=00007B6EF52CAF78 XER=0000000000000000 CCR=0000000028004888 SOFTE=0000000000000001
TRAP=0000000000000300 DAR=0000000000017818 dsisr=0000000040000000 RESULT=0000000000000000
...
Looking at the instructions around NIP
0x7b6edda5ef60 {java/lang/String.<init>} +34 0:0 51030048 bl 0x7b6edda5f2b0 U>> +246 Snippet-> Trampoline {libj9jit29.so}{endproc._interpreterUnresolvedInstanceDataStoreGlue} +0 // genericReturn
0x7b6edda5ef64 {java/lang/String.<init>} +35 00006360 ori r3, r3, 0 <<< ^+256
0x7b6edda5ef68 {java/lang/String.<init>} +36 c6076378 sldi r3, r3, 0x20
0x7b6edda5ef6c {java/lang/String.<init>} +37 01006364 oris r3, r3, 1
0x7b6edda5ef70 {java/lang/String.<init>} +38 00786360 ori r3, r3, 0x7800
0x7b6edda5ef74 {java/lang/String.<init>} +39 00006338 addi r3, r3, 0
0x7b6edda5ef78 {java/lang/String.<init>} +40 180063e8 ld r3, 0x18(r3) // Dereferencing J9Class pointer
Looking into the content of r3
which is 0x000000000017800
. Here is the snippet from the instruction selection where this instruction was generated.
------------------------------
n173n ( 0) ificmpne --> block_18 BBStart at n177n () [ 0x7b6edc9325d0] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=- nc=3 flg=0x20
n207n ( 0) iand (in GPR_0048) (X>=0 cannotOverflow ) [ 0x7b6edc933070] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=12176 nc=2 flg=0x1100
n206n ( 0) l2i (in GPR_0049) [ 0x7b6edc933020] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=12288 nc=1
n169n ( 0) lloadi <isClassAndDepthFlags> methodSymbolIndex 0[#298 Shadow +24] [flags 0x603 0x0 ] (in GPR_0049) (cannotOverflow ) [ 0x7b6edc932490] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=12288 nc=1 flg=0x1000
n168n ( 0) loadaddr java/lang/String cpIndex = 86[#393 Static] [flags 0x18307 0x0 ] (in GPR_0052) (X>=0 sharedMemory ) [ 0x7b6edc932440] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=13968 nc=0 flg=0x100
n204n ( 0) iconst 0x40200000 (X!=0 X>=0 ) [ 0x7b6edc932f80] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=- nc=0 flg=0x104
n193n ( 0) ==>iconst 0 (X==0 X>=0 X<=0 )
n659n ( 0) GlRegDeps () [ 0x7b6edc93bdb0] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=- nc=5 flg=0x20
n652n ( 1) ==>iRegLoad (in GPR_0032)
n653n ( 1) ==>iRegLoad (in GPR_0033)
n654n ( 1) ==>aRegLoad (in &GPR_0034)
n655n ( 1) ==>iRegLoad (in GPR_0035)
n656n ( 1) ==>aRegLoad (in &GPR_0036)
------------------------------
[ 0x7b6edcb33590] 0 ld GPR_0051, [gr16, 0] ; Backpatched branch to Unresolved Data Snippet Label L0065 # SymRef java/lang/String cpIndex = -1[#396 Static] [flags 0x18307 0x0 ]
[ 0x7b6edcb33700] 0 addi GPR_0052, [GPR_0051, 0] # SymRef java/lang/String cpIndex = 86[#393 Static] [flags 0x18307 0x0 ]
[ 0x7b6edcb337a0] 0 ld GPR_0049, [GPR_0052, 24] # SymRef <isClassAndDepthFlags> methodSymbolIndex 0[#298 Shadow +24] [flags 0x603 0x0 ]
[ 0x7b6edcb338b0] 0 andis. GPR_0048, GPR_0049, 16416
PRE:
POST: [CCR_0053 : cr0]
[ 0x7b6edcb33a80] 0 cmpwi CCR_0054, GPR_0048, 0
[ 0x7b6edcb33c90] 0 bne CCR_0054, Label L0018
...
Snippet :
00007B6EDDA84FF0 000003d8 Snippet Label L0065: ; Unresolved Data Snippet
0x7b6edda84ff0 000003d8 4812a6f1 bl 00007B6EDDBAF6E0 ; Through Trampoline
0x7b6edda84ff4 000003dc dda84ca0 00007b6e .long 00007B6EDDA84CA0 ; Code cache return address
0x7b6edda84ffc 000003e4 80000056 .long 0x80000056 ; pTOC|VD|SY|IX|cpIndex of the data reference
0x7b6edda85000 000003e8 a5471528 00007b6e .long 00007B6EA5471528 ; Constant pool address
0x7b6edda85008 000003f0 00000000 .long 0x00000000 ; Offset to be merged
0x7b6edda8500c 000003f4 3c600000 .long 0x3c600000 ; Instruction template
0x7b6edda85010 000003f8 00000000 .long 0x00000000 ; Lock word initialized to 0
0x7b6edda85014 000003fc deadbeef .long 0xdeadbeef ; <clinit> case - 1st instruction (0xdeadbeef)
0x7b6edda85018 00000400 4bfffc8c b 00007B6EDDA84CA4 ; <clinit> case - Branch back to main line code
Looking into the instructions emitted for the loadaddr
for instanceOf node generated by loop versioner,
------------------------------
n796n ( 0) treetop [ 0x7b6edc93e880] bci=[-1,165,-] rc=0 vc=988 vn=- li=- udi=- nc=1
n423n ( 1) loadaddr java/lang/String cpIndex = 86[#393 Static] [flags 0x18307 0x0 ] (in GPR_0178) [ 0x7b6edc9373f0] bci=[-1,165,-] rc=1 vc=988 vn=- li=40 udi=15744 nc=0
------------------------------
[ 0x7b6edcb43c80] 165 ld GPR_0177, [gr16, 0] ; Backpatched branch to Unresolved Data Snippet Label L0177 # SymRef java/lang/String cpIndex = -1[#413 Static] [flags 0x18307 0x0 ]
[ 0x7b6edcb43df0] 165 addi GPR_0178, [GPR_0177, 0] # SymRef java/lang/String cpIndex = 86[#393 Static] [flags 0x18307 0x0 ]
============================================================
...
Snippet
00007B6EDDA8501C 00000404 Snippet Label L0177: ; Unresolved Data Snippet
0x7b6edda8501c 00000404 4812a6c5 bl 00007B6EDDBAF6E0 ; Through Trampoline
0x7b6edda85020 00000408 dda84dc4 00007b6e .long 00007B6EDDA84DC4 ; Code cache return address
0x7b6edda85028 00000410 80000056 .long 0x80000056 ; pTOC|VD|SY|IX|cpIndex of the data reference
0x7b6edda8502c 00000414 a5471528 00007b6e .long 00007B6EA5471528 ; Constant pool address
0x7b6edda85034 0000041c 00000000 .long 0x00000000 ; Offset to be merged
0x7b6edda85038 00000420 3c400000 .long 0x3c400000 ; Instruction template
0x7b6edda8503c 00000424 00000000 .long 0x00000000 ; Lock word initialized to 0
0x7b6edda85040 00000428 deadbeef .long 0xdeadbeef ; <clinit> case - 1st instruction (0xdeadbeef)
0x7b6edda85044 0000042c 4bfffd84 b 00007B6EDDA84DC8 ; <clinit> case - Branch back to main line code
Apart from code cache return address and the instruction template, snippet seemed to be identical.
As it fails earlier while de-referencing the J9Class, it does not reach this instanceOf so from the same failing run it was impossible to tell what value was patched at second loadaddr
site, so to know if we patch second site with correct address or not, I disabled my change and added a NULLCHK
of constant NULL after we pass the instanceOf
to see the value being patched for loadaddr java/lang/String
node with cpIndex that is not -1
under the instanceOf test.
Snippet from instruction selection that emits instruction for loadaddr
0x7630b075b220 +98 || 00004260 ori r2, r2, 0 <<< ^+246
0x7630b075b224 +99 || c6074278 sldi r2, r2, 0x20
0x7630b075b228 +100 || 0b004264 oris r2, r2, 0xb
0x7630b075b22c +101 || 009b4260 ori r2, r2, 0x9b00 // Instruction sequence that loads J9Class pointer for java/lang/String into R2
0x7630b075b230 +102 || 00008238 addi r4, r2, 0
0x7630b075b234 +103 || 00003928 cmpldi r25, 0
0x7630b075b238 +104 ||* c4008241 beq 0x7630b075b2fc C>> +153
0x7630b075b23c +105 ||| 00005980 lwz r2, 0(r25)
0x7630b075b240 +106 ||| 2e004254 rlwinm r2, r2, 0, 0, 0x17
0x7630b075b244 +107 ||| 4020227c cmpld r2, r4 // Class Equality test comparing the J9Class pointer of the class of `this` object and J9Class pointer of java/lang/String class.
0x7630b075b248 +108 |||* b4008240 bne 0x7630b075b2fc C>> +153
0x7630b075b24c +109 |||| 00000038 li r0, 0
0x7630b075b250 +110 -1:0 |||| 00008008 tdeqi r0, 0 // aload0 // Throws NPE, so if class test succeeds previously, it will throw NPE
Looking at the core-dump generated for the thrown NPE, value in R2 that corresponds to the loadaddr
is 0x00000000000B9B00
which is a valid J9Class pointer for java/lang/String class.
@zl-wang I am unable to understand how, it can not relocate the fabricated loadaddr
in the case of VP transformed vft load, but can do so for the similar loadaddr
generated by Loop Versioner under instanceof test using Unresolved data snippet and need expert from Power codegen team to debug this further and provide the correct fix. I have core-dump and method files from both experiments which I can share over chat.
This failure still blocks OMR acceptance, so a correct understanding of why it is failing and if the fix is simple or complicated would help us deciding do we need to merge https://github.com/eclipse/omr/pull/6144 to allow OMR acceptance builds to pass or revert back original VP transformation PR.
FYI @fjeremic , @rmnattas
[1]. https://github.com/eclipse/omr/pull/6137 [2]. https://github.com/eclipse/omr/blob/9ebd8f0834d21b40a6ddc0df88f80fbe8fe57436/compiler/optimizer/LoopVersioner.cpp#L7855-L7862 [3]. https://github.com/eclipse/omr/blob/9ebd8f0834d21b40a6ddc0df88f80fbe8fe57436/compiler/compile/OMRSymbolReferenceTable.cpp#L1376
@zl-wang we need your opinion here because the failing job is an OpenJ9 acceptance test.
could you verify what the class value was (in the crashing sequence) from original SCC AOT code (i.e. not resolved and patched yet)? could you show me what was written over the bl instruction (it should be an addis/lis i figured)? at this point, i have no reason to believe the crashing sequence was not patched, but wrong value being patched in. i.e. the resolve itself returned a wrong value. maybe out of context is the reason (since that class is not expected there at all).
if you can set a break-point at the resolve-return, we can prove it further. hopefully, you can limit it down to a small set of AOT-JIT methods to debug reasonably.
we found the reason it failed. the bad one has incorrect CP in snippet (whether it was not relocated at all, or relocated incorrectly ... to be determined).
@vijaysun-omr for the time being, disabling Rahil's new work on p for non SVM AOT to allow us move forward. we will figure out a fix, then enable it.
looks like the said CP was relocated incorrectly. allow @r30shah to add the details here.
Thanks a lot @zl-wang for the quick look at the reason why it failed. For the reference I am printing snippet from both log file and AOT relocated code from the failing run.
00007B6EDDA84FF0 000003d8 Snippet Label L0065: ; Unresolved Data Snippet
0x7b6edda84ff0 000003d8 4812a6f1 bl 00007B6EDDBAF6E0 ; Through Trampoline
0x7b6edda84ff4 000003dc dda84ca0 00007b6e .long 00007B6EDDA84CA0 ; Code cache return address
0x7b6edda84ffc 000003e4 80000056 .long 0x80000056 ; pTOC|VD|SY|IX|cpIndex of the data reference
0x7b6edda85000 000003e8 a5471528 00007b6e .long 00007B6EA5471528 ; Constant pool address
0x7b6edda85008 000003f0 00000000 .long 0x00000000 ; Offset to be merged
0x7b6edda8500c 000003f4 3c600000 .long 0x3c600000 ; Instruction template
0x7b6edda85010 000003f8 00000000 .long 0x00000000 ; Lock word initialized to 0
0x7b6edda85014 000003fc deadbeef .long 0xdeadbeef ; <clinit> case - 1st instruction (0xdeadbeef)
0x7b6edda85018 00000400 4bfffc8c b 00007B6EDDA84CA4 ; <clinit> case - Branch back to main line code
00007B6EDDA8501C 00000404 Snippet Label L0177: ; Unresolved Data Snippet
0x7b6edda8501c 00000404 4812a6c5 bl 00007B6EDDBAF6E0 ; Through Trampoline
0x7b6edda85020 00000408 dda84dc4 00007b6e .long 00007B6EDDA84DC4 ; Code cache return address
0x7b6edda85028 00000410 80000056 .long 0x80000056 ; pTOC|VD|SY|IX|cpIndex of the data reference
0x7b6edda8502c 00000414 a5471528 00007b6e .long 00007B6EA5471528 ; Constant pool address
0x7b6edda85034 0000041c 00000000 .long 0x00000000 ; Offset to be merged
0x7b6edda85038 00000420 3c400000 .long 0x3c400000 ; Instruction template
0x7b6edda8503c 00000424 00000000 .long 0x00000000 ; Lock word initialized to 0
0x7b6edda85040 00000428 deadbeef .long 0xdeadbeef ; <clinit> case - 1st instruction (0xdeadbeef)
0x7b6edda85044 0000042c 4bfffd84 b 00007B6EDDA84DC8 ; <clinit> case - Branch back to main line code
In above two snippets, L0065
is generated for the loadaddr
generated by VP transformation while second one L0177
was generated for loadaddr
under instanceOf node.
Here is how relocated snippets looks from the core-dump.
Relocated Snippet for the loadaddr where patched class is incorrect
0x7b6edda5f2b0 {java/lang/String.<init>} +246 31041548 bl 0x7b6eddbaf6e0 Trampoline {libj9jit29.so}{endproc._interpreterUnresolvedInstanceDataStoreGlue} +0 <<< +34
0x7b6edda5f2b4 {java/lang/String.<init>} +247 60efa5dd stfdu f13, -0x10a0(r5) -> {java/lang/String.<init>} +34
0x7b6edda5f2b8 {java/lang/String.<init>} +248 6e7b0000 invalid instruction
0x7b6edda5f2bc {java/lang/String.<init>} +249 56000080 lwz r0, 0x56(0)
0x7b6edda5f2c0 {java/lang/String.<init>} +250 60d80a00 invalid instruction -> CP - {java/lang/Object}
0x7b6edda5f2c4 {java/lang/String.<init>} +251 00000000 invalid instruction
0x7b6edda5f2c8 {java/lang/String.<init>} +252 00000000 invalid instruction
0x7b6edda5f2cc {java/lang/String.<init>} +253 0000603c lis r3, 0
0x7b6edda5f2d0 {java/lang/String.<init>} +254 00000000 invalid instruction
0x7b6edda5f2d4 {java/lang/String.<init>} +255 0000603c lis r3, 0
0x7b6edda5f2d8 {java/lang/String.<init>} +256 8cfcff4b b 0x7b6edda5ef64 U>> ^+35
In above case, we relocated CPAddress of java/lang/Object
instead of java/lang/String
Relocated Snippet for the loadaddr under instanceof node
0x7b6edda5f2dc {java/lang/String.<init>} +257 05041548 bl 0x7b6eddbaf6e0 Trampoline {libj9jit29.so}{endproc._interpreterUnresolvedInstanceDataStoreGlue} +0 <<< +107
0x7b6edda5f2e0 {java/lang/String.<init>} +258 84f0a5dd stfdu f13, -0xf7c(r5) -> {java/lang/String.<init>} +107
0x7b6edda5f2e4 {java/lang/String.<init>} +259 6e7b0000 invalid instruction
0x7b6edda5f2e8 {java/lang/String.<init>} +260 56000080 lwz r0, 0x56(0)
0x7b6edda5f2ec {java/lang/String.<init>} +261 f0750b00 invalid instruction -> CP - {java/lang/String}
0x7b6edda5f2f0 {java/lang/String.<init>} +262 00000000 invalid instruction
0x7b6edda5f2f4 {java/lang/String.<init>} +263 00000000 invalid instruction
0x7b6edda5f2f8 {java/lang/String.<init>} +264 0000403c lis r2, 0
0x7b6edda5f2fc {java/lang/String.<init>} +265 00000000 invalid instruction
0x7b6edda5f300 {java/lang/String.<init>} +266 efbeadde stfdu f21, -0x4111(r13)
0x7b6edda5f304 {java/lang/String.<init>} +267 84fdff4b b 0x7b6edda5f088 U>> ^+108
So further debugging this with @rmnattas , We found out why the relocated constant pool address was incorrect for the fabricated loadaddr
generated by VP transformation.
As pasted in comment: https://github.com/eclipse-openj9/openj9/issues/13313#issuecomment-900500507
Constant Pool Address is incorrect in the bad case (First snippet). In the relocated AOT code, this is the Updated Constant Pool address for the VPTransformed resolved loadaddr
0x7b6edda5f2c0 {java/lang/String.<init>} +250 60d80a00 invalid instruction -> CP - {java/lang/Object}
The loadaddr
node generated by VP transformation, corresponds to calleeIndex 0
, and for the unresolvedDataStore snippet constructed for the same, when it creates the relocation record [1], it sets the inlined site index to 0
as what it gets it from the bci of the node. Probably when relocations are applied at that location [2], it gets the constant pool using the ramMethod at that inlinedSite Index which in the failing case is of java/lang/Object.<init>
and gets the incorrect ConstantPool Address.
[1]. https://github.com/eclipse-openj9/openj9/blob/df51c203a8b47e8f81df4d0f29c5e1e272771f43/runtime/compiler/p/codegen/J9UnresolvedDataSnippet.cpp#L186-L189 [2]. https://github.com/eclipse-openj9/openj9/blob/df51c203a8b47e8f81df4d0f29c5e1e272771f43/runtime/compiler/runtime/RelocationRecord.cpp#L1498-L1529
I haven't been following this issue too closely to say for sure, but AArch64 uses a similar relocation for the cpIndex in the unresolved data snippet. So if there is a problem with this code on Power, there may be an issue to fix on AArch64 as well. @knn-k @Akira1Saitoh
If this applies to AArch64, an AArch64 workaround as in eclipse/omr#6144 is probably in order too. Perhaps someone who is following this problem more closely can confirm.
@0xdaryl I am looking into the AArch64
codegen[1], and it seems like they will use unresolvedDataSnippet
only in the case symRef is actually unresolved. So for loadaddr
of resolved J9Class pointer, it seems to be adding external relocation record [2]. In case of P, we use the unresolved data snippet for resolved case.
[1]. https://github.com/eclipse/omr/blob/74204a7dfc876643c6fabffc6ccd51b7d3903d5d/compiler/aarch64/codegen/OMRMemoryReference.cpp#L88-L96
[2]. https://github.com/eclipse/omr/blob/74204a7dfc876643c6fabffc6ccd51b7d3903d5d/compiler/aarch64/codegen/OMRTreeEvaluator.cpp#L2446-L2477
Thanks for the investigation @r30shah . I didn't realize the point about the Power unresolved data snippet being used for resolved references, leading to the problem you're investigating.
A few thoughts that came up when discussing this with @r30shah:
We could force the node's bci to indicate an inlined site index that matches the owning method of the symref, but this node isn't an exception point, so I don't think its bci should be relevant. There is also no obvious corresponding bytecode index - there might not even be any bytecode instruction in the owning method that refers (directly) to the constant pool entry (since the constant pool is shared with other methods in that class). So IMO this should be fixed if possible by getting the inlined site index for the relocation from the symref's owning method, rather than from the node
I also got worried about the possibility of these unresolved snippets failing to resolve, which can't be allowed because the occurrence of the resolved class reference is not an exception point, but after some discussion with @dsouzai, I think it's ok. If compilation got a class from a CP entry (of the root method or an inlined method), there will be an AOT validation that checks to make sure that the CP entry is resolved to a class that matches the class chain we saw during compilation, and otherwise the load will fail. So if the unresolved snippet is running, the load has succeeded, which means this validation has succeeded, so the resolution is guaranteed to succeed
I tried UnitTest.java described in https://github.com/eclipse-openj9/openj9/issues/13313#issue-967080261 with aarch64 build which contains https://github.com/eclipse/omr/pull/6137, but I could not recreate the failure.
Change in https://github.com/eclipse/omr/pull/6137 transforms the VFTLoad of known object with fixed class to loadaddr. As discussed in https://github.com/eclipse/omr/pull/6137#pullrequestreview-725459487 in AOT code without Symbol Validation Record, this change was kept limited to the classes loaded by bootstrap class loader. On PPC it seems like, we are missing external relocation record for such code in old AOT and we failed with segmentation fault [1][2]. Uploading a UnitTest.java.txt to reproduce the failures. To reproduce the failure, get the JDK with above change (Here is the build used in [1] and [2].
[1]. https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_sanity.functional_ppc64le_linux_OMR_testList_0/58/console [2]. https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_sanity.functional_ppc64le_linux_OMR_testList_1/58/console