eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.3k stars 723 forks source link

Z: Use inline immediate instructions for interface dispatch #17972

Open Spencer-Comin opened 1 year ago

Spencer-Comin commented 1 year ago

Currently interface dispatch compares the class pointer with class pointers cached in the interface call cached data snippet to determine if there is a cached method pointer to jump to. The cached class and method pointers could be cached in the instruction stream instead of the snippet by using immediate instructions.

Spencer-Comin commented 1 year ago

The current instruction stream generated for interface dispatch looks like

 n6n      (  0)  ResolveAndNULLCHK on n40n [#32]                                                      [     0x3ff2c404800] bci=[-1,3,19] rc=0 vc=169 vn=- li=2 udi=- nc=1
 n5n      (  1)    icalli  InterfaceTest$X.foo()I[#398  unresolved interface Method] (Interface class) [flags 0x400 0x0 ] (in GPR_0035)  [     0x3ff2c4047b0] bci=[-1,3,19] rc=1 vc=169 vn=- li=2 udi=- nc=2
 n4n      (  0)      aloadi  <vft-symbol>[#323  Shadow] [flags 0x18607 0x0 ] (in GPR_0033)            [     0x3ff2c404760] bci=[-1,3,19] rc=0 vc=169 vn=- li=2 udi=- nc=1
 n40n     (  0)        ==>aRegLoad
 n40n     (  0)      ==>aRegLoad
------------------------------

     LLZRGF  GPR_0033, Shadow[<vft-symbol>] 0(&GPR_0032) ; Throws Implicit Null Pointer Exception
     LARL    GPR_0039, 0x000003FF2C543A40     <--- load interface call cache data snippet address
     LARL    GPR_0040,0x0000000000000000
     assocreg
     Label L0036:
     CL      GPR_0033,#400 80(GPR_0039)       <--- compare with class pointer from snippet
     LG      GPR_0073,#401 88(GPR_0039)       <--- load method pointer from snippet
     BCR     BHR(mask=0x8), GPR_0073
     CL      GPR_0033,#402 96(GPR_0039)       <--- compare with class pointer from snippet
     LG      GPR_0073,#403 104(GPR_0039)      <--- load method pointer from snippet
     BCR     BHR(mask=0x8), GPR_0073
     CL      GPR_0033,#404 112(GPR_0039)      <--- compare with class pointer from snippet
     LG      GPR_0073,#405 120(GPR_0039)      <--- load method pointer from snippet
     BCR     BHR(mask=0x8), GPR_0073
     CL      GPR_0033,#406 128(GPR_0039)      <--- compare with class pointer from snippet
     LG      GPR_0073,#407 136(GPR_0039)      <--- load method pointer from snippet
     BCR     BHR(mask=0x8), GPR_0073
     CL      GPR_0033,#408 144(GPR_0039)      <--- compare with class pointer from snippet
     LG      GPR_0073,#409 152(GPR_0039)      <--- load method pointer from snippet
     BCR     BHR(mask=0x8), GPR_0073
     LARL    GPR_0039, 0x000003FF2C5439C0
     BCR     BER(mask=0xf), GPR_0039
     .dd     Snippet Label L0035
     NOP
     assocreg
     Label L0037:

The repeated checks against cached class pointers that look like

CL      <class pointer reg>, <cached classpointer offset>(GPR_0039)
LG      GPR_X, <cached method pointer offset>(GPR_0039)
BCR     <equal mask>, GPR_X

could use immediate instructions such as

CGFI    <class pointer reg>, <cached class pointer constant>
BRCL    <equal mask>, <cached method pointer constant>

This would remove the need for loading from the interface call cache data snippet. This would also remove the need for the class pointer/method pointer slots in the snippet.

Spencer-Comin commented 1 year ago

It appears this functionality is already in the codebase, however it is guarded by a useCLFIandBRCL flag [1] that is always set to false [2][3].

[1] https://github.com/eclipse-openj9/openj9/blob/e8d07cf4374c9eb4cc369bfc45740f5bacdf54ac/runtime/compiler/z/codegen/S390PrivateLinkage.cpp#L2157 [2] https://github.com/eclipse-openj9/openj9/blob/e8d07cf4374c9eb4cc369bfc45740f5bacdf54ac/runtime/compiler/z/codegen/S390PrivateLinkage.cpp#L2077 [3] https://github.com/eclipse-openj9/openj9/blob/e8d07cf4374c9eb4cc369bfc45740f5bacdf54ac/runtime/compiler/z/codegen/S390PrivateLinkage.cpp#L2125

Spencer-Comin commented 1 year ago

After doing some git archaeology, it looks like useCLFIandBRCL has been always false since before open sourcing. @r30shah @joransiu would you know any of the history here as to why this would always be disabled?

joransiu commented 1 year ago

Hmm, I'll have to review the history myself to recall. If I had to guess, it's probably some recomp / code patching path wasn't working properly -- so maybe some PicBuilder/Recomp logic is incorrect.

r30shah commented 1 year ago

@Spencer-Comin One thing to keep in mind when you use BRCL, you get 4G relative jump target which may be smaller to reach to the interface target.

joransiu commented 1 year ago

@Spencer-Comin One thing to keep in mind when you use BRCL, you get 4G relative jump target which may be smaller to reach to the interface target.

Don't we reserve a contiguous virtual memory region for JIT code cache? In that case, JIT to JIT branches are guaranteed to be within +/- 2GB. I guess the scenario we need to care about is if the callee target is not compiled yet?

r30shah commented 1 year ago

Don't we reserve a contiguous virtual memory region for JIT code cache? In that case, JIT to JIT branches are guaranteed to be within +/- 2GB. I guess the scenario we need to care about is if the callee target is not compiled yet?

Yes that is true (Looking at the code in [1], it does look that we do reserve contiguous memory reserve for JIT Code cache ). Looking at the interfaceMultislot code, I do see that we would only update the slots in the JIT code when method is compiled [2], Though @Spencer-Comin would appreciate if you can verify this.

We may need to check how the code-patching to update slot is done (If it is implemented and working). With the addresses stored in the snippet, we keep track of lastUpdatedSlot and use that to atomically update the slot in the snippet.

[1]. https://github.com/eclipse-openj9/openj9/blob/master/runtime/compiler/runtime/J9CodeCacheManager.cpp [2]. https://github.com/eclipse-openj9/openj9/blob/067e50cc5f184ce33d372fdee176ff0803c7e1a9/runtime/compiler/z/runtime/PicBuilder.m4#L1690-L1695

Spencer-Comin commented 1 year ago

I've verified the slots are only updated when the method is compiled.

Here's my test case:

class InterfaceTest {
    static interface X {
        public int foo();
    }

    static class A implements X {
        public int foo() {
            return 1;
        }
    }

    static class B implements X {
        public int foo() {
            return -1;
        }
    }

    static class C implements X {
        public int foo() {
            return 2;
        }
    }

    static class D implements X {
        public int foo() {
            return -2;
        }
    }

    static class E implements X {
        public int foo() {
            return 3;
        }
    }

    static class F implements X {
        public int foo() {
            return -3;
        }
    }

    static int unitTest(X x) {
        return x.foo();
    }

    public static void main(String[] args) {
        X obj;
        A a = new A();
        B b = new B();
        C c = new C();
        D d = new D();
        E e = new E();
        F f = new F();
        int dummy = 0;
        for (int i = 0; i < 10; i++) {
            if (i % 2 == 0) {
                obj = a;
            } else {
                obj = b;
            }

            dummy += unitTest(obj);
        }
        System.out.println(dummy);
    }
}

Heres the cache snippet layout from the trace log:

000003FFCAA95210 00000116                                                 Snippet Label L0035:          ; Interface call cache data snippet
     0x3ffcaa95210 00000116                      00 00 03 ff ca a9 51 88    DC          0x00000000caa95188  # Call site RA
     0x3ffcaa95218 0000011e                      00 00 00 00 00 1e d0 c0    DC          0x00000000001ed0c0  # Address of constant pool 
     0x3ffcaa95220 00000126                      00 00 00 00 00 00 00 03    DC          0x0000000000000003  # CP index
     0x3ffcaa95228 0000012e                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # Interface class
     0x3ffcaa95230 00000136                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # Method index
     0x3ffcaa95238 0000013e                      00 00 03 ff ca c7 e9 08    DC          0x00000000cac7e908  # J2I thunk address 
     0x3ffcaa95240 00000146                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # flags
     0x3ffcaa95248 0000014e                      00 00 03 ff ca a9 52 50    DC          0x00000000caa95250  # last cached slot 
     0x3ffcaa95250 00000156                      00 00 03 ff ca a9 52 60    DC          0x00000000caa95260  # first slot 
     0x3ffcaa95258 0000015e                      00 00 03 ff ca a9 52 90    DC          0x00000000caa95290  # last slot 
     0x3ffcaa95260 00000166                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # class pointer 0
     0x3ffcaa95268 0000016e                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # method pointer 0
     0x3ffcaa95270 00000176                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # class pointer 1
     0x3ffcaa95278 0000017e                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # method pointer 1
     0x3ffcaa95280 00000186                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # class pointer 2
     0x3ffcaa95288 0000018e                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # method pointer 2
     0x3ffcaa95290 00000196                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # class pointer 3
     0x3ffcaa95298 0000019e                      00 00 00 00 00 00 00 00    DC          0x0000000000000000  # method pointer 3

Here's an extract from gdb showing that the cache is only updated after the method is compiled:

$ gdb --args java -Xjit:"disableAsyncCompilation,dontInline={*unitTest*|*foo*},{*unitTest*}(count=0,breakOnEntry,traceFull,log=trace.log),{*foo*}(count=1,breakAfterCompile)" InterfaceTest
...
Thread 2 "main" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x3fffdff7910 (LWP 15924)]
0x000003ffcaa95102 in ?? ()
(gdb) x/18g 000003FFCAA95210
Invalid number "000003FFCAA95210".
(gdb) x/18xg 0x000003FFCAA95210
0x3ffcaa95210:  0x000003ffcaa95188  0x00000000001ed0c0
0x3ffcaa95220:  0x0000000000000003  0x0000000000000000
0x3ffcaa95230:  0x0000000000000000  0x000003ffcac7e908
0x3ffcaa95240:  0x0000000000000000  0x000003ffcaa95250
0x3ffcaa95250:  0x000003ffcaa95260  0x000003ffcaa95290
0x3ffcaa95260:  0x0000000000000000  0x0000000000000000
0x3ffcaa95270:  0x0000000000000000  0x0000000000000000
0x3ffcaa95280:  0x0000000000000000  0x0000000000000000
0x3ffcaa95290:  0x0000000000000000  0x0000000000000000
(gdb) n
Cannot find bounds of current function
(gdb) c
Continuing.

Thread 2 "main" received signal SIGTRAP, Trace/breakpoint trap.
0x000003ffcaa95102 in ?? ()
(gdb) x/18xg 0x000003FFCAA95210
0x3ffcaa95210:  0x000003ffcaa95188  0x00000000001ed0c0
0x3ffcaa95220:  0x0000000000000003  0x00000000001ed300
0x3ffcaa95230:  0x0000000000000018  0x000003ffcac7e908
0x3ffcaa95240:  0x0100000000000000  0x000003ffcaa95250
0x3ffcaa95250:  0x000003ffcaa95260  0x000003ffcaa95290
0x3ffcaa95260:  0x0000000000000000  0x0000000000000000
0x3ffcaa95270:  0x0000000000000000  0x0000000000000000
0x3ffcaa95280:  0x0000000000000000  0x0000000000000000
0x3ffcaa95290:  0x0000000000000000  0x0000000000000000
(gdb) c
Continuing.

Thread 2 "main" received signal SIGTRAP, Trace/breakpoint trap.
0x000003ffcaa95102 in ?? ()
(gdb) x/18xg 0x000003FFCAA95210
0x3ffcaa95210:  0x000003ffcaa95188  0x00000000001ed0c0
0x3ffcaa95220:  0x0000000000000003  0x00000000001ed300
0x3ffcaa95230:  0x0000000000000018  0x000003ffcac7e908
0x3ffcaa95240:  0x0100000000000000  0x000003ffcaa95250
0x3ffcaa95250:  0x000003ffcaa95260  0x000003ffcaa95290
0x3ffcaa95260:  0x0000000000000000  0x0000000000000000
0x3ffcaa95270:  0x0000000000000000  0x0000000000000000
0x3ffcaa95280:  0x0000000000000000  0x0000000000000000
0x3ffcaa95290:  0x0000000000000000  0x0000000000000000
(gdb) c
Continuing.

=== Finished compiling InterfaceTest$A.foo()I at 0x3ffcaa955fa ===

Thread 4 "JIT Compilation" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x3ffcaa7f910 (LWP 15926)]
raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
51  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) c
Continuing.

Thread 2 "main" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x3fffdff7910 (LWP 15924)]
0x000003ffcaa95102 in ?? ()
(gdb) x/18xg 0x000003FFCAA95210
0x3ffcaa95210:  0x000003ffcaa95188  0x00000000001ed0c0
0x3ffcaa95220:  0x0000000000000003  0x00000000001ed300
0x3ffcaa95230:  0x0000000000000018  0x000003ffcac7e908
0x3ffcaa95240:  0x0100000000000000  0x000003ffcaa95250
0x3ffcaa95250:  0x000003ffcaa95260  0x000003ffcaa95290
0x3ffcaa95260:  0x0000000000000000  0x0000000000000000
0x3ffcaa95270:  0x0000000000000000  0x0000000000000000
0x3ffcaa95280:  0x0000000000000000  0x0000000000000000
0x3ffcaa95290:  0x0000000000000000  0x0000000000000000
(gdb) c
Continuing.

=== Finished compiling InterfaceTest$B.foo()I at 0x3ffcaa959fa ===

Thread 4 "JIT Compilation" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x3ffcaa7f910 (LWP 15926)]
raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
51  in ../sysdeps/unix/sysv/linux/raise.c
(gdb) c
Continuing.

Thread 2 "main" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x3fffdff7910 (LWP 15924)]
0x000003ffcaa95102 in ?? ()
(gdb) x/18xg 0x000003FFCAA95210
0x3ffcaa95210:  0x000003ffcaa95188  0x00000000001ed0c0
0x3ffcaa95220:  0x0000000000000003  0x00000000001ed300
0x3ffcaa95230:  0x0000000000000018  0x000003ffcac7e908
0x3ffcaa95240:  0x0100000000000000  0x000003ffcaa95250
0x3ffcaa95250:  0x000003ffcaa95260  0x000003ffcaa95290
0x3ffcaa95260:  0x0000000000000000  0x0000000000000000
0x3ffcaa95270:  0x0000000000000000  0x0000000000000000
0x3ffcaa95280:  0x0000000000000000  0x0000000000000000
0x3ffcaa95290:  0x0000000000000000  0x0000000000000000
(gdb) c
Continuing.

Thread 2 "main" received signal SIGTRAP, Trace/breakpoint trap.
0x000003ffcaa95102 in ?? ()
(gdb) x/18xg 0x000003FFCAA95210
0x3ffcaa95210:  0x000003ffcaa95188  0x00000000001ed0c0
0x3ffcaa95220:  0x0000000000000003  0x00000000001ed300
0x3ffcaa95230:  0x0000000000000018  0x000003ffcac7e908
0x3ffcaa95240:  0x0100000000000000  0x000003ffcaa95260
0x3ffcaa95250:  0x000003ffcaa95260  0x000003ffcaa95290
0x3ffcaa95260:  0x001ed60000000000  0x000003ffcaa95600
0x3ffcaa95270:  0x0000000000000000  0x0000000000000000
0x3ffcaa95280:  0x0000000000000000  0x0000000000000000
0x3ffcaa95290:  0x0000000000000000  0x0000000000000000
(gdb) c
Continuing.

Thread 2 "main" received signal SIGTRAP, Trace/breakpoint trap.
0x000003ffcaa95102 in ?? ()
(gdb) x/18xg 0x000003FFCAA95210
0x3ffcaa95210:  0x000003ffcaa95188  0x00000000001ed0c0
0x3ffcaa95220:  0x0000000000000003  0x00000000001ed300
0x3ffcaa95230:  0x0000000000000018  0x000003ffcac7e908
0x3ffcaa95240:  0x0100000000000000  0x000003ffcaa95270
0x3ffcaa95250:  0x000003ffcaa95260  0x000003ffcaa95290
0x3ffcaa95260:  0x001ed60000000000  0x000003ffcaa95600
0x3ffcaa95270:  0x001ed80000000000  0x000003ffcaa95a00
0x3ffcaa95280:  0x0000000000000000  0x0000000000000000
0x3ffcaa95290:  0x0000000000000000  0x0000000000000000
(gdb) c
Continuing.
r30shah commented 1 year ago

@Spencer-Comin Thanks a lot for verifying this. I believe next thing for this one is to see if existing path works or not. Also, we need to make sure that the CGFI gets patched on Class Unloading/ redefinition. So exercise you did last time, need to be repeated.