eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 720 forks source link

Failed in relocating class address for fabricated loadaddr on Power in non SVM AOT #13313

Open r30shah opened 3 years ago

r30shah commented 3 years ago

Change in https://github.com/eclipse/omr/pull/6137 transforms the VFTLoad of known object with fixed class to loadaddr. As discussed in https://github.com/eclipse/omr/pull/6137#pullrequestreview-725459487 in AOT code without Symbol Validation Record, this change was kept limited to the classes loaded by bootstrap class loader. On PPC it seems like, we are missing external relocation record for such code in old AOT and we failed with segmentation fault [1][2]. Uploading a UnitTest.java.txt to reproduce the failures. To reproduce the failure, get the JDK with above change (Here is the build used in [1] and [2].

$ <failingJDK>/bin/javac UnitTest.java
$ <failingJDK>/bin/java -Xshareclasses:destroyAll  -version
$ <failingJDK>/bin/java -Xjit:forceAOT,count=1,disableAsyncCompilation,verbose,vlog=jitVerb -Xaot:"{java/lang/String.<init>([BIII)V}(traceFull,traceCG,traceFull,log=failing.methodLog)" UnitTest

[1]. https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_sanity.functional_ppc64le_linux_OMR_testList_0/58/console [2]. https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_sanity.functional_ppc64le_linux_OMR_testList_1/58/console

r30shah commented 3 years ago

FYI @zl-wang

jdmpapin commented 3 years ago

Following on from Rahil's comment on eclipse/omr#6144:

May be you can confirm this, do we rely on cpIndex to be -1 to know it is fabricated loadaddr?

Yes, a CP index of -1 is used to determine whether to unset useUnresSnippetToAvoidRelo, and also whether to replace TR_ClassAddress with TR_ArbitraryClassAddress

r30shah commented 3 years ago

Yes, a CP index of -1 is used to determine whether to unset useUnresSnippetToAvoidRelo, and also whether to replace TR_ClassAddress with TR_ArbitraryClassAddress

Hmm, even-though a fabricated loadaddr was created with symRef with cpIndex -1 in the transformation, cpIndex is set to different when we reach code-gen. I have not narrowed down where that happens exactly yet as it was handled well on Z and X.

r30shah commented 3 years ago

With the help of @vijaysun-omr and @jdmpapin , I did more investigations on what is happening. When VP transformation added in this PR transforms the VFT load to loadaddr node, it creates a ClassSymbol with cpIndex -1 [1]. Later in Loop versioner optimization, it generates the instanceOf test for the versioned loop which re-uses the same class symbol, but gets the cpIndex [2] and updates it, so when it reaches the Power codegen, it reaches with the fabricated loadaddr which does not have cpIndex -1 and encounters [3] which choses to use unresolved data snippet on Power for loadaddr.

Digging a bit deeper, pasting some of the findings from the core-dump where originally we failed with segmentation while accessing a field from J9Class pointer.

Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007B6EF678E090 Handler2=00007B6EF64F843C
R0=00007B6EDDA59FE8 R1=00007B6EF73FD680 R2=00007B6EF6B26E00 R3=0000000000017800
R4=00000000FFF69328 R5=0000000000000001 R6=0000000000000015 R7=0000000000000014
R8=0000000000000020 R9=000000000000000E R10=00007B6EDDA59FE8 R11=00007B6EF52CB120
R12=00007B6EF4F1E40C R13=00007B6EF74068E0 R14=0000000000153150 R15=0000000000095700
R16=00007B6EBFE10038 R17=FFFFFFFFFFFFFFFF R18=00007B6EF67D18C8 R19=00007B6EF6B26E00
R20=00007B6EA5461BAE R21=00007B6EF6AEA0AA R22=0008000300080003 R23=00007B6EF73FD940
R24=00007B6EF73FE3E0 R25=00000000FFF69408 R26=0000000000000001 R27=00000000FFF69328
R28=0000000000000014 R29=0000000000000001 R30=0000000000000015 R31=00000000FFF69408
NIP=00007B6EDDA5EF78 MSR=800000010280F033 ORIG_GPR3=00000000000087D0 CTR=00007B6EDDA5F2D4
LINK=00007B6EF52CAF78 XER=0000000000000000 CCR=0000000028004888 SOFTE=0000000000000001
TRAP=0000000000000300 DAR=0000000000017818 dsisr=0000000040000000 RESULT=0000000000000000
...

Looking at the instructions around NIP

0x7b6edda5ef60 {java/lang/String.<init>} +34    0:0         51030048 bl        0x7b6edda5f2b0 U>> +246 Snippet-> Trampoline {libj9jit29.so}{endproc._interpreterUnresolvedInstanceDataStoreGlue} +0    // genericReturn
0x7b6edda5ef64 {java/lang/String.<init>} +35                00006360 ori       r3, r3, 0 <<< ^+256
0x7b6edda5ef68 {java/lang/String.<init>} +36                c6076378 sldi      r3, r3, 0x20
0x7b6edda5ef6c {java/lang/String.<init>} +37                01006364 oris      r3, r3, 1
0x7b6edda5ef70 {java/lang/String.<init>} +38                00786360 ori       r3, r3, 0x7800
0x7b6edda5ef74 {java/lang/String.<init>} +39                00006338 addi      r3, r3, 0
0x7b6edda5ef78 {java/lang/String.<init>} +40                180063e8 ld        r3, 0x18(r3) // Dereferencing J9Class pointer

Looking into the content of r3 which is 0x000000000017800 . Here is the snippet from the instruction selection where this instruction was generated.

------------------------------
 n173n    (  0)  ificmpne --> block_18 BBStart at n177n ()                                            [    0x7b6edc9325d0] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=- nc=3 flg=0x20
 n207n    (  0)    iand (in GPR_0048) (X>=0 cannotOverflow )                                          [    0x7b6edc933070] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=12176 nc=2 flg=0x1100
 n206n    (  0)      l2i (in GPR_0049)                                                                [    0x7b6edc933020] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=12288 nc=1
 n169n    (  0)        lloadi  <isClassAndDepthFlags> methodSymbolIndex 0[#298  Shadow +24] [flags 0x603 0x0 ] (in GPR_0049) (cannotOverflow )  [    0x7b6edc932490] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=12288 nc=1 flg=0x1000
 n168n    (  0)          loadaddr  java/lang/String cpIndex = 86[#393  Static] [flags 0x18307 0x0 ] (in GPR_0052) (X>=0 sharedMemory )  [    0x7b6edc932440] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=13968 nc=0 flg=0x100
 n204n    (  0)      iconst 0x40200000 (X!=0 X>=0 )                                                   [    0x7b6edc932f80] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=- nc=0 flg=0x104
 n193n    (  0)    ==>iconst 0 (X==0 X>=0 X<=0 )
 n659n    (  0)    GlRegDeps ()                                                                       [    0x7b6edc93bdb0] bci=[0,0,-] rc=0 vc=988 vn=- li=19 udi=- nc=5 flg=0x20
 n652n    (  1)      ==>iRegLoad (in GPR_0032)
 n653n    (  1)      ==>iRegLoad (in GPR_0033)
 n654n    (  1)      ==>aRegLoad (in &GPR_0034)
 n655n    (  1)      ==>iRegLoad (in GPR_0035)
 n656n    (  1)      ==>aRegLoad (in &GPR_0036)
------------------------------

 [    0x7b6edcb33590]   0       ld      GPR_0051, [gr16, 0]             ; Backpatched branch to Unresolved Data Snippet Label L0065             # SymRef  java/lang/String cpIndex = -1[#396  Static] [flags 0x18307 0x0 ]
 [    0x7b6edcb33700]   0       addi    GPR_0052, [GPR_0051, 0]         # SymRef  java/lang/String cpIndex = 86[#393  Static] [flags 0x18307 0x0 ]
 [    0x7b6edcb337a0]   0       ld      GPR_0049, [GPR_0052, 24]                # SymRef  <isClassAndDepthFlags> methodSymbolIndex 0[#298  Shadow +24] [flags 0x603 0x0 ]
 [    0x7b6edcb338b0]   0       andis.  GPR_0048, GPR_0049, 16416
 PRE:
POST: [CCR_0053 : cr0]
 [    0x7b6edcb33a80]   0       cmpwi   CCR_0054, GPR_0048, 0
 [    0x7b6edcb33c90]   0       bne     CCR_0054, Label L0018
...
Snippet :
00007B6EDDA84FF0 000003d8                                        Snippet Label L0065:           ; Unresolved Data Snippet
    0x7b6edda84ff0 000003d8                      4812a6f1          bl   00007B6EDDBAF6E0                ; Through Trampoline
    0x7b6edda84ff4 000003dc                      dda84ca0 00007b6e .long        00007B6EDDA84CA0                ; Code cache return address
    0x7b6edda84ffc 000003e4                      80000056          .long        0x80000056              ; pTOC|VD|SY|IX|cpIndex of the data reference
    0x7b6edda85000 000003e8                      a5471528 00007b6e .long        00007B6EA5471528                ; Constant pool address
    0x7b6edda85008 000003f0                      00000000          .long        0x00000000              ; Offset to be merged
    0x7b6edda8500c 000003f4                      3c600000          .long        0x3c600000              ; Instruction template
    0x7b6edda85010 000003f8                      00000000          .long        0x00000000              ; Lock word initialized to 0
    0x7b6edda85014 000003fc                      deadbeef          .long        0xdeadbeef              ; <clinit> case - 1st instruction (0xdeadbeef)
    0x7b6edda85018 00000400                      4bfffc8c          b    00007B6EDDA84CA4                ; <clinit> case - Branch back to main line code

Looking into the instructions emitted for the loadaddr for instanceOf node generated by loop versioner,

------------------------------
 n796n    (  0)  treetop                                                                              [    0x7b6edc93e880] bci=[-1,165,-] rc=0 vc=988 vn=- li=- udi=- nc=1
 n423n    (  1)    loadaddr  java/lang/String cpIndex = 86[#393  Static] [flags 0x18307 0x0 ] (in GPR_0178)  [    0x7b6edc9373f0] bci=[-1,165,-] rc=1 vc=988 vn=- li=40 udi=15744 nc=0
------------------------------

 [    0x7b6edcb43c80]   165     ld      GPR_0177, [gr16, 0]             ; Backpatched branch to Unresolved Data Snippet Label L0177             # SymRef  java/lang/String cpIndex = -1[#413  Static] [flags 0x18307 0x0 ]
 [    0x7b6edcb43df0]   165     addi    GPR_0178, [GPR_0177, 0]         # SymRef  java/lang/String cpIndex = 86[#393  Static] [flags 0x18307 0x0 ]

============================================================
...
Snippet
00007B6EDDA8501C 00000404                                        Snippet Label L0177:           ; Unresolved Data Snippet
    0x7b6edda8501c 00000404                      4812a6c5          bl   00007B6EDDBAF6E0                ; Through Trampoline
    0x7b6edda85020 00000408                      dda84dc4 00007b6e .long        00007B6EDDA84DC4                ; Code cache return address
    0x7b6edda85028 00000410                      80000056          .long        0x80000056              ; pTOC|VD|SY|IX|cpIndex of the data reference
    0x7b6edda8502c 00000414                      a5471528 00007b6e .long        00007B6EA5471528                ; Constant pool address
    0x7b6edda85034 0000041c                      00000000          .long        0x00000000              ; Offset to be merged
    0x7b6edda85038 00000420                      3c400000          .long        0x3c400000              ; Instruction template
    0x7b6edda8503c 00000424                      00000000          .long        0x00000000              ; Lock word initialized to 0
    0x7b6edda85040 00000428                      deadbeef          .long        0xdeadbeef              ; <clinit> case - 1st instruction (0xdeadbeef)
    0x7b6edda85044 0000042c                      4bfffd84          b    00007B6EDDA84DC8                ; <clinit> case - Branch back to main line code

Apart from code cache return address and the instruction template, snippet seemed to be identical. As it fails earlier while de-referencing the J9Class, it does not reach this instanceOf so from the same failing run it was impossible to tell what value was patched at second loadaddr site, so to know if we patch second site with correct address or not, I disabled my change and added a NULLCHK of constant NULL after we pass the instanceOf to see the value being patched for loadaddr java/lang/String node with cpIndex that is not -1 under the instanceOf test.

Snippet from instruction selection that emits instruction for loadaddr

0x7630b075b220 +98           ||   00004260 ori       r2, r2, 0 <<< ^+246 
0x7630b075b224 +99           ||   c6074278 sldi      r2, r2, 0x20
0x7630b075b228 +100          ||   0b004264 oris      r2, r2, 0xb
0x7630b075b22c +101          ||   009b4260 ori       r2, r2, 0x9b00 // Instruction sequence that loads J9Class pointer for java/lang/String into R2
0x7630b075b230 +102          ||   00008238 addi      r4, r2, 0
0x7630b075b234 +103          ||   00003928 cmpldi    r25, 0
0x7630b075b238 +104          ||*  c4008241 beq       0x7630b075b2fc C>> +153
0x7630b075b23c +105          |||  00005980 lwz       r2, 0(r25)
0x7630b075b240 +106          |||  2e004254 rlwinm    r2, r2, 0, 0, 0x17
0x7630b075b244 +107          |||  4020227c cmpld     r2, r4 // Class Equality test comparing the J9Class pointer of the class of `this` object and J9Class pointer of java/lang/String class.
0x7630b075b248 +108          |||* b4008240 bne       0x7630b075b2fc C>> +153
0x7630b075b24c +109          |||| 00000038 li        r0, 0
0x7630b075b250 +110  -1:0    |||| 00008008 tdeqi     r0, 0 // aload0 // Throws NPE, so if class test succeeds previously, it will throw NPE

Looking at the core-dump generated for the thrown NPE, value in R2 that corresponds to the loadaddr is 0x00000000000B9B00 which is a valid J9Class pointer for java/lang/String class.

@zl-wang I am unable to understand how, it can not relocate the fabricated loadaddr in the case of VP transformed vft load, but can do so for the similar loadaddr generated by Loop Versioner under instanceof test using Unresolved data snippet and need expert from Power codegen team to debug this further and provide the correct fix. I have core-dump and method files from both experiments which I can share over chat.

This failure still blocks OMR acceptance, so a correct understanding of why it is failing and if the fix is simple or complicated would help us deciding do we need to merge https://github.com/eclipse/omr/pull/6144 to allow OMR acceptance builds to pass or revert back original VP transformation PR.

FYI @fjeremic , @rmnattas

[1]. https://github.com/eclipse/omr/pull/6137 [2]. https://github.com/eclipse/omr/blob/9ebd8f0834d21b40a6ddc0df88f80fbe8fe57436/compiler/optimizer/LoopVersioner.cpp#L7855-L7862 [3]. https://github.com/eclipse/omr/blob/9ebd8f0834d21b40a6ddc0df88f80fbe8fe57436/compiler/compile/OMRSymbolReferenceTable.cpp#L1376

vijaysun-omr commented 3 years ago

@zl-wang we need your opinion here because the failing job is an OpenJ9 acceptance test.

zl-wang commented 3 years ago

could you verify what the class value was (in the crashing sequence) from original SCC AOT code (i.e. not resolved and patched yet)? could you show me what was written over the bl instruction (it should be an addis/lis i figured)? at this point, i have no reason to believe the crashing sequence was not patched, but wrong value being patched in. i.e. the resolve itself returned a wrong value. maybe out of context is the reason (since that class is not expected there at all).

if you can set a break-point at the resolve-return, we can prove it further. hopefully, you can limit it down to a small set of AOT-JIT methods to debug reasonably.

zl-wang commented 3 years ago

we found the reason it failed. the bad one has incorrect CP in snippet (whether it was not relocated at all, or relocated incorrectly ... to be determined).

@vijaysun-omr for the time being, disabling Rahil's new work on p for non SVM AOT to allow us move forward. we will figure out a fix, then enable it.

zl-wang commented 3 years ago

looks like the said CP was relocated incorrectly. allow @r30shah to add the details here.

r30shah commented 3 years ago

Thanks a lot @zl-wang for the quick look at the reason why it failed. For the reference I am printing snippet from both log file and AOT relocated code from the failing run.

00007B6EDDA84FF0 000003d8                                        Snippet Label L0065:           ; Unresolved Data Snippet
    0x7b6edda84ff0 000003d8                      4812a6f1          bl   00007B6EDDBAF6E0                ; Through Trampoline
    0x7b6edda84ff4 000003dc                      dda84ca0 00007b6e .long        00007B6EDDA84CA0                ; Code cache return address
    0x7b6edda84ffc 000003e4                      80000056          .long        0x80000056              ; pTOC|VD|SY|IX|cpIndex of the data reference
    0x7b6edda85000 000003e8                      a5471528 00007b6e .long        00007B6EA5471528                ; Constant pool address
    0x7b6edda85008 000003f0                      00000000          .long        0x00000000              ; Offset to be merged
    0x7b6edda8500c 000003f4                      3c600000          .long        0x3c600000              ; Instruction template
    0x7b6edda85010 000003f8                      00000000          .long        0x00000000              ; Lock word initialized to 0
    0x7b6edda85014 000003fc                      deadbeef          .long        0xdeadbeef              ; <clinit> case - 1st instruction (0xdeadbeef)
    0x7b6edda85018 00000400                      4bfffc8c          b    00007B6EDDA84CA4                ; <clinit> case - Branch back to main line code

00007B6EDDA8501C 00000404                                        Snippet Label L0177:           ; Unresolved Data Snippet
    0x7b6edda8501c 00000404                      4812a6c5          bl   00007B6EDDBAF6E0                ; Through Trampoline
    0x7b6edda85020 00000408                      dda84dc4 00007b6e .long        00007B6EDDA84DC4                ; Code cache return address
    0x7b6edda85028 00000410                      80000056          .long        0x80000056              ; pTOC|VD|SY|IX|cpIndex of the data reference
    0x7b6edda8502c 00000414                      a5471528 00007b6e .long        00007B6EA5471528                ; Constant pool address
    0x7b6edda85034 0000041c                      00000000          .long        0x00000000              ; Offset to be merged
    0x7b6edda85038 00000420                      3c400000          .long        0x3c400000              ; Instruction template
    0x7b6edda8503c 00000424                      00000000          .long        0x00000000              ; Lock word initialized to 0
    0x7b6edda85040 00000428                      deadbeef          .long        0xdeadbeef              ; <clinit> case - 1st instruction (0xdeadbeef)
    0x7b6edda85044 0000042c                      4bfffd84          b    00007B6EDDA84DC8                ; <clinit> case - Branch back to main line code

In above two snippets, L0065 is generated for the loadaddr generated by VP transformation while second one L0177 was generated for loadaddr under instanceOf node.

Here is how relocated snippets looks from the core-dump.

Relocated Snippet for the loadaddr where patched class is incorrect

0x7b6edda5f2b0 {java/lang/String.<init>} +246               31041548 bl        0x7b6eddbaf6e0 Trampoline {libj9jit29.so}{endproc._interpreterUnresolvedInstanceDataStoreGlue} +0    <<< +34
0x7b6edda5f2b4 {java/lang/String.<init>} +247               60efa5dd stfdu     f13, -0x10a0(r5) -> {java/lang/String.<init>} +34
0x7b6edda5f2b8 {java/lang/String.<init>} +248               6e7b0000 invalid instruction
0x7b6edda5f2bc {java/lang/String.<init>} +249               56000080 lwz       r0, 0x56(0)
0x7b6edda5f2c0 {java/lang/String.<init>} +250               60d80a00 invalid instruction -> CP - {java/lang/Object}
0x7b6edda5f2c4 {java/lang/String.<init>} +251               00000000 invalid instruction
0x7b6edda5f2c8 {java/lang/String.<init>} +252               00000000 invalid instruction
0x7b6edda5f2cc {java/lang/String.<init>} +253               0000603c lis       r3, 0
0x7b6edda5f2d0 {java/lang/String.<init>} +254               00000000 invalid instruction
0x7b6edda5f2d4 {java/lang/String.<init>} +255               0000603c lis       r3, 0
0x7b6edda5f2d8 {java/lang/String.<init>} +256               8cfcff4b b         0x7b6edda5ef64 U>> ^+35

In above case, we relocated CPAddress of java/lang/Object instead of java/lang/String Relocated Snippet for the loadaddr under instanceof node

0x7b6edda5f2dc {java/lang/String.<init>} +257               05041548 bl        0x7b6eddbaf6e0 Trampoline {libj9jit29.so}{endproc._interpreterUnresolvedInstanceDataStoreGlue} +0    <<< +107
0x7b6edda5f2e0 {java/lang/String.<init>} +258               84f0a5dd stfdu     f13, -0xf7c(r5) -> {java/lang/String.<init>} +107
0x7b6edda5f2e4 {java/lang/String.<init>} +259               6e7b0000 invalid instruction
0x7b6edda5f2e8 {java/lang/String.<init>} +260               56000080 lwz       r0, 0x56(0)
0x7b6edda5f2ec {java/lang/String.<init>} +261               f0750b00 invalid instruction -> CP - {java/lang/String}
0x7b6edda5f2f0 {java/lang/String.<init>} +262               00000000 invalid instruction
0x7b6edda5f2f4 {java/lang/String.<init>} +263               00000000 invalid instruction
0x7b6edda5f2f8 {java/lang/String.<init>} +264               0000403c lis       r2, 0
0x7b6edda5f2fc {java/lang/String.<init>} +265               00000000 invalid instruction
0x7b6edda5f300 {java/lang/String.<init>} +266               efbeadde stfdu     f21, -0x4111(r13)
0x7b6edda5f304 {java/lang/String.<init>} +267               84fdff4b b         0x7b6edda5f088 U>> ^+108
r30shah commented 3 years ago

So further debugging this with @rmnattas , We found out why the relocated constant pool address was incorrect for the fabricated loadaddr generated by VP transformation.

As pasted in comment: https://github.com/eclipse-openj9/openj9/issues/13313#issuecomment-900500507

Constant Pool Address is incorrect in the bad case (First snippet). In the relocated AOT code, this is the Updated Constant Pool address for the VPTransformed resolved loadaddr

0x7b6edda5f2c0 {java/lang/String.<init>} +250 60d80a00 invalid instruction -> CP - {java/lang/Object}

The loadaddr node generated by VP transformation, corresponds to calleeIndex 0, and for the unresolvedDataStore snippet constructed for the same, when it creates the relocation record [1], it sets the inlined site index to 0 as what it gets it from the bci of the node. Probably when relocations are applied at that location [2], it gets the constant pool using the ramMethod at that inlinedSite Index which in the failing case is of java/lang/Object.<init> and gets the incorrect ConstantPool Address.

[1]. https://github.com/eclipse-openj9/openj9/blob/df51c203a8b47e8f81df4d0f29c5e1e272771f43/runtime/compiler/p/codegen/J9UnresolvedDataSnippet.cpp#L186-L189 [2]. https://github.com/eclipse-openj9/openj9/blob/df51c203a8b47e8f81df4d0f29c5e1e272771f43/runtime/compiler/runtime/RelocationRecord.cpp#L1498-L1529

0xdaryl commented 3 years ago

I haven't been following this issue too closely to say for sure, but AArch64 uses a similar relocation for the cpIndex in the unresolved data snippet. So if there is a problem with this code on Power, there may be an issue to fix on AArch64 as well. @knn-k @Akira1Saitoh

https://github.com/eclipse-openj9/openj9/blob/df51c203a8b47e8f81df4d0f29c5e1e272771f43/runtime/compiler/aarch64/codegen/J9UnresolvedDataSnippet.cpp#L123-L127

0xdaryl commented 3 years ago

If this applies to AArch64, an AArch64 workaround as in eclipse/omr#6144 is probably in order too. Perhaps someone who is following this problem more closely can confirm.

r30shah commented 3 years ago

@0xdaryl I am looking into the AArch64 codegen[1], and it seems like they will use unresolvedDataSnippet only in the case symRef is actually unresolved. So for loadaddr of resolved J9Class pointer, it seems to be adding external relocation record [2]. In case of P, we use the unresolved data snippet for resolved case. [1]. https://github.com/eclipse/omr/blob/74204a7dfc876643c6fabffc6ccd51b7d3903d5d/compiler/aarch64/codegen/OMRMemoryReference.cpp#L88-L96 [2]. https://github.com/eclipse/omr/blob/74204a7dfc876643c6fabffc6ccd51b7d3903d5d/compiler/aarch64/codegen/OMRTreeEvaluator.cpp#L2446-L2477

0xdaryl commented 3 years ago

Thanks for the investigation @r30shah . I didn't realize the point about the Power unresolved data snippet being used for resolved references, leading to the problem you're investigating.

jdmpapin commented 3 years ago

A few thoughts that came up when discussing this with @r30shah:

We could force the node's bci to indicate an inlined site index that matches the owning method of the symref, but this node isn't an exception point, so I don't think its bci should be relevant. There is also no obvious corresponding bytecode index - there might not even be any bytecode instruction in the owning method that refers (directly) to the constant pool entry (since the constant pool is shared with other methods in that class). So IMO this should be fixed if possible by getting the inlined site index for the relocation from the symref's owning method, rather than from the node

I also got worried about the possibility of these unresolved snippets failing to resolve, which can't be allowed because the occurrence of the resolved class reference is not an exception point, but after some discussion with @dsouzai, I think it's ok. If compilation got a class from a CP entry (of the root method or an inlined method), there will be an AOT validation that checks to make sure that the CP entry is resolved to a class that matches the class chain we saw during compilation, and otherwise the load will fail. So if the unresolved snippet is running, the load has succeeded, which means this validation has succeeded, so the resolution is guaranteed to succeed

Akira1Saitoh commented 3 years ago

I tried UnitTest.java described in https://github.com/eclipse-openj9/openj9/issues/13313#issue-967080261 with aarch64 build which contains https://github.com/eclipse/omr/pull/6137, but I could not recreate the failure.