dotnet / runtime

[RyuJit] do not reload constants/fold them. #9605

Open sandreenko opened 6 years ago

sandreenko commented 6 years ago

For code such as the following:

using System;

namespace CQ
{
    public class Program
    {

        static void foo(int a)
        {
            // The try/catch is here only to disable inlining of foo.
            try
            {
                throw new System.IndexOutOfRangeException();
            }
            catch (System.IndexOutOfRangeException)
            {
                Console.WriteLine("exception was caught.");
            }
        }

        static int Main(string[] args)
        {
            int l = args.Length + 2; 
            int a = 0x654321;
            a += l;
            foo(0x654321);
            foo(0x654321);
            foo(0x654321);
            return a;
        }
    }
}

The JIT generates (checked build, arm):

IN0001: 000008      ldr     r0, [r0+4]
IN0002: 00000A      adds    r0, r0, 2
IN0003: 00000C      movw    r3, 0x4321
IN0004: 000010      movt    r3, 0x65
IN0005: 000014      adds    r4, r0, r3
IN0006: 000016      movw    r0, 0x4321
IN0007: 00001A      movt    r0, 0x65
IN0008: 00001E      movw    r3, 0x5038
IN0009: 000022      movt    r3, 0x525
IN000a: 000026      ldr     r3, [r3]
IN000b: 000028      blx     r3          // CQ.Program:foo(int)
IN000c: 00002A      movw    r0, 0x4321
IN000d: 00002E      movt    r0, 0x65
IN000e: 000032      movw    r3, 0x5038
IN000f: 000036      movt    r3, 0x525
IN0010: 00003A      ldr     r3, [r3]
IN0011: 00003C      blx     r3          // CQ.Program:foo(int)
IN0012: 00003E      movw    r0, 0x4321
IN0013: 000042      movt    r0, 0x65
IN0014: 000046      movw    r3, 0x5038
IN0015: 00004A      movt    r3, 0x525
IN0016: 00004E      ldr     r3, [r3]
IN0017: 000050      blx     r3          // CQ.Program:foo(int)
IN0018: 000052      mov     r0, r4

The second constant load (IN0006, IN0007) is unwanted. We could instead generate:

IN0001: 000008      ldr     r0, [r0+4]
IN0002: 00000A      adds    r3, r0, 2
IN0003: 00000C      movw    r0, 0x4321 <- use r0, because then it can be reused as argument.
IN0004: 000010      movt    r0, 0x65
IN0005: 000014      adds    r4, r3, r0
IN0008: 00001E      movw    r3, 0x5038
IN0009: 000022      movt    r3, 0x525
IN000a: 000026      ldr     r3, [r3]
IN000b: 000028      blx     r3          // CQ.Program:foo(int)

A similar example can be constructed for x86, but it is harder: it requires instructions that cannot encode an immediate and that do not trash registers.

This kind of weak code generation shows up in several cases:

  1. when constants are passed as call arguments (for example, for VSD, dotnet/coreclr#15910);
  2. when the architecture has a fixed instruction size and does not support encoding long immediate values (RISC);
  3. when an instruction needs all of its operands to be on the stack or in registers (this also covers 1.);
  4. ...
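
To illustrate case 2, here is a minimal sketch of why a 32-bit constant costs two instructions on arm32: it is split into the 16-bit immediates of a movw/movt pair. The names `MovPair`, `splitConstant`, and `joinConstant` are hypothetical and are not taken from the JIT sources.

```cpp
#include <cstdint>

// The two 16-bit immediates carried by a movw/movt pair.
struct MovPair {
    uint16_t low;   // immediate for movw (bits 0..15)
    uint16_t high;  // immediate for movt (bits 16..31)
};

// Split a 32-bit constant into its movw and movt immediates.
inline MovPair splitConstant(uint32_t value) {
    return MovPair{static_cast<uint16_t>(value & 0xFFFFu),
                   static_cast<uint16_t>(value >> 16)};
}

// Rebuild the 32-bit constant that the pair leaves in the register.
inline uint32_t joinConstant(MovPair pair) {
    return (static_cast<uint32_t>(pair.high) << 16) | pair.low;
}
```

With this helper, `splitConstant(0x654321)` yields the halves `0x4321` and `0x65`, matching the `movw r0, 0x4321` / `movt r0, 0x65` pairs in the listing above. Every reload of the constant repeats both instructions.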

Without switching to SSA form, a simple solution could be:

  1. Disable constant propagation from local variables, which currently replaces reads of a constant local var with its value;
  2. Always create a LCL_VAR for constants;
  3. Fold uses of the same constant value into one LCL_VAR (create a map from value to lclVar);
  4. Teach lower to decide when to use the constant as an immediate and when to take it from a register/stack.
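
Steps 3 and 4 could be sketched roughly as follows. This is a toy model, not JIT code: `ConstLocalMap`, `ConstUse`, and `chooseConstUse` are hypothetical names, and the heuristic in `chooseConstUse` is only an assumption about what such a decision might look like.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Step 3 (hypothetical): map each constant value to a single local number.
class ConstLocalMap {
public:
    // Return the local for `value`, allocating a new one on first use.
    unsigned getOrCreate(int64_t value) {
        auto it = m_map.find(value);
        if (it != m_map.end()) {
            return it->second;
        }
        unsigned lclNum = m_nextLcl++;
        m_map.emplace(value, lclNum);
        return lclNum;
    }

    // Number of distinct constants seen so far.
    std::size_t count() const { return m_map.size(); }

private:
    std::unordered_map<int64_t, unsigned> m_map;
    unsigned m_nextLcl = 0;
};

// Step 4 (hypothetical): how a use of the constant is materialized.
enum class ConstUse { Inline, SharedLocal };

// Assumed heuristic: keep constants that fit in an immediate inline;
// share a local only when the constant is expensive and used repeatedly.
inline ConstUse chooseConstUse(bool fitsInImmediate, unsigned useCount) {
    if (fitsInImmediate || useCount <= 1) {
        return ConstUse::Inline;
    }
    return ConstUse::SharedLocal;
}
```

In the example above, the three `foo(0x654321)` calls and the `a += l` addition would all map to the same local, while the `2` in `args.Length + 2` would stay inline because it fits in the `adds` immediate.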

category:cq theme:basic-cq skill-level:expert cost:medium

sandreenko commented 6 years ago

cc @dotnet/jit-contrib

sandreenko commented 6 years ago

@briansull, this is the issue that I told you about.

sandreenko commented 3 years ago

@briansull's CSE work fixed this issue for arm64. We now get the code that we wanted:

IN0001: 00000C                    ldr     w0, [x0,#8]
IN0002: 000010                    add     w19, w0, #2
IN0003: 000014                    movz    w20, #0x4321
IN0004: 000018                    movk    w20, #101 LSL #16 <- w20 is CSE const.
IN0005: 00001C                    mov     w0, w20
IN0006: 000020                    bl      CQ.Program:foo(int)
IN0007: 000024                    mov     w0, w20
IN0008: 000028                    bl      CQ.Program:foo(int)
IN0009: 00002C                    mov     w0, w20
IN000a: 000030                    bl      CQ.Program:foo(int)
IN000b: 000034                    add     w0, w19, w20

However, arm32 does not have it enabled by default, so we still get many extra instructions:

IN0001: 000008  2B      ldr     r0, [r0+4]
IN0002: 00000A  2B      adds    r4, r0, 2
IN0003: 00000C  4B      movw    r0, 0x4321
IN0004: 000010  4B      movt    r0, 0x65
IN0005: 000014  4B      movw    r3, 0x4560
IN0006: 000018  4B      movt    r3, 0x829
IN0007: 00001C  2B      blx     r3      // CQ.Program:foo(int)
IN0008: 00001E  4B      movw    r0, 0x4321
IN0009: 000022  4B      movt    r0, 0x65
IN000a: 000026  4B      movw    r3, 0x4560
IN000b: 00002A  4B      movt    r3, 0x829
IN000c: 00002E  2B      blx     r3      // CQ.Program:foo(int)
IN000d: 000030  4B      movw    r0, 0x4321
IN000e: 000034  4B      movt    r0, 0x65
IN000f: 000038  4B      movw    r3, 0x4560
IN0010: 00003C  4B      movt    r3, 0x829
IN0011: 000040  2B      blx     r3      // CQ.Program:foo(int)
IN0012: 000042  4B      movw    r0, 0x4321
IN0013: 000046  4B      movt    r0, 0x65
IN0014: 00004A  2B      adds    r0, r4, r0

Setting complus_JitConstCSE=4 for arm32 improves the code, but not fully:

IN0002: 00000A  1C84           adds    r4, r0, 2
IN0003: 00000C  F244 3521      movw    r5, 0x4321
IN0004: 000010  F2C0 0565      movt    r5, 0x65
IN0005: 000014  4628           mov     r0, r5
IN0006: 000016  F244 5360      movw    r3, 0x4560
IN0007: 00001A  F6C0 0377      movt    r3, 0x877
; Call at 001E [stk=0], GCvars=none, gcrefRegs=0000 {}, byrefRegs=0000 {}
IN0008: 00001E  4798           blx     r3       // CQ.Program:foo(int)
IN0009: 000020  4628           mov     r0, r5
IN000a: 000022  F244 5360      movw    r3, 0x4560
IN000b: 000026  F6C0 0377      movt    r3, 0x877
; Call at 002A [stk=0], GCvars=none, gcrefRegs=0000 {}, byrefRegs=0000 {}
IN000c: 00002A  4798           blx     r3       // CQ.Program:foo(int)
IN000d: 00002C  4628           mov     r0, r5
IN000e: 00002E  F244 5360      movw    r3, 0x4560
IN000f: 000032  F6C0 0377      movt    r3, 0x877
; Call at 0036 [stk=0], GCvars=none, gcrefRegs=0000 {}, byrefRegs=0000 {}
IN0010: 000036  4798           blx     r3       // CQ.Program:foo(int)
IN0011: 000038  1960           adds    r0, r4, r5
                        ;; bbWeight=1    PerfScore 17.00
G_M39155_IG03:        ; func=00, offs=00003AH, size=0004H, epilog, nogc, extend
IN0014: 00003A  E8BD 8830      pop     {r4,r5,r11,pc}

The constant was CSE-d, but the function address was not.
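
The effect on code size can be estimated with a short calculation. The sketch below uses the instruction sizes from the arm32 listings (4 bytes for movw/movt, 2 bytes for `mov` and `blx`). The third configuration, where the function address is also kept in a register, is hypothetical and does not correspond to any listing above. All names are made up for illustration.

```cpp
#include <cstddef>

constexpr std::size_t kWide = 4;    // movw, movt
constexpr std::size_t kNarrow = 2;  // mov rX, rY and blx rX

// One-time setup bytes plus bytes repeated at every call site.
struct CallSiteCost {
    std::size_t setup;
    std::size_t perCall;
};

constexpr std::size_t totalBytes(CallSiteCost cost, std::size_t calls) {
    return cost.setup + cost.perCall * calls;
}

// No CSE: every call reloads the constant and the address, then blx.
constexpr CallSiteCost kNoCse{0, 4 * kWide + kNarrow};

// Constant CSE: the constant is loaded once; each call does
// mov r0, r5, reloads the address with movw/movt, then blx.
constexpr CallSiteCost kConstCse{2 * kWide, kNarrow + 2 * kWide + kNarrow};

// Hypothetical: the address is also loaded once; each call does
// mov r0, r5 and blx through the register holding the address.
constexpr CallSiteCost kConstAndAddrCse{4 * kWide, 2 * kNarrow};
```

For the three calls in the example, `totalBytes` gives 54 bytes without CSE and 44 bytes with the constant CSE-d, which agrees with the offsets in the two arm32 listings (0x0C to 0x42 and 0x0C to 0x38). The hypothetical third configuration would need 28 bytes, assuming a free callee-saved register is available for the address.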

@briansull, could you please triage this issue? Do you plan to enable complus_JitConstCSE for other platforms in 6.0?