dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.91k stars 4.63k forks source link

JIT unnecessary stack load/stores for tail jump with stack args involved #105031

Open huoyaoyuan opened 1 month ago

huoyaoyuan commented 1 month ago

https://github.com/dotnet/runtime/pull/104731#discussion_r1681120332

Good case:

    unsafe class MyDelegate3
    {
        object _target;
        delegate*<object, nint, nint, nint, void> _methodPtr;

        public void Invoke(nint arg1, nint arg2, nint arg3) => _methodPtr(_target, arg1, arg2, arg3);
    }
; Method CSPlayground.MyDelegate3:Invoke(long,long,long):this (FullOpts)
G_M6976_IG01:  ;; offset=0x0000
                        ;; size=0 bbWeight=1 PerfScore 0.00

G_M6976_IG02:  ;; offset=0x0000
       mov      rax, qword ptr [rcx+0x10]
       mov      rcx, gword ptr [rcx+0x08]
                        ;; size=8 bbWeight=1 PerfScore 4.00

G_M6976_IG03:  ;; offset=0x0008
       tail.jmp rax
                        ;; size=3 bbWeight=1 PerfScore 2.00
; Total bytes of code: 11

Not good case:

    unsafe class MyDelegate4
    {
        object _target;
        delegate*<object, nint, nint, nint, nint, void> _methodPtr;

        public void Invoke(nint arg1, nint arg2, nint arg3, nint arg4) => _methodPtr(_target, arg1, arg2, arg3, arg4);
    }
; Method CSPlayground.MyDelegate4:Invoke(long,long,long,long):this (FullOpts)
G_M9153_IG01:  ;; offset=0x0000
                        ;; size=0 bbWeight=1 PerfScore 0.00

G_M9153_IG02:  ;; offset=0x0000
       mov      rax, qword ptr [rcx+0x10]
       nop      
                        ;; size=5 bbWeight=1 PerfScore 2.25

G_M9153_IG03:  ;; offset=0x0005
       mov      r10, qword ptr [rsp+0x28]
       mov      qword ptr [rsp+0x28], r10
       mov      rcx, gword ptr [rcx+0x08]
                        ;; size=14 bbWeight=1 PerfScore 4.00

G_M9153_IG04:  ;; offset=0x0013
       tail.jmp rax
                        ;; size=3 bbWeight=1 PerfScore 2.00
; Total bytes of code: 22
    unsafe class MyDelegateComplex
    {
        object _target;
        delegate*<object, int, double, object, Guid, double> _methodPtr;

        public double Invoke(int a, double b, object c, Guid d) => _methodPtr(_target, a, b, c, d);
    }
; Method CSPlayground.MyDelegateComplex:Invoke(int,double,System.Object,System.Guid):double:this (FullOpts)
G_M18646_IG01:  ;; offset=0x0000
       mov      rax, bword ptr [rsp+0x28]
                        ;; size=5 bbWeight=1 PerfScore 1.00

G_M18646_IG02:  ;; offset=0x0005
       mov      r8, qword ptr [rcx+0x10]
       nop      
                        ;; size=5 bbWeight=1 PerfScore 2.25

G_M18646_IG03:  ;; offset=0x000A
       mov      bword ptr [rsp+0x28], rax
       mov      rcx, gword ptr [rcx+0x08]
                        ;; size=9 bbWeight=1 PerfScore 3.00

G_M18646_IG04:  ;; offset=0x0013
       tail.jmp r8
                        ;; size=3 bbWeight=1 PerfScore 2.00
; Total bytes of code: 22

The codegen is fine when all arguments are passed in register. Struct args can make things worse.

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

huoyaoyuan commented 1 month ago

Also: if the method pointer is not stored in a local variable (C# compiler uses local variable), but directly loaded after the arguments, there will be more stack spills.

It may be hard to solve since the original this register is considered overwritten by JIT.

jakobbotsch commented 1 month ago

It's a bit of a hard problem -- we don't know at the point where we create the IR to write the outgoing arguments that we are going to be loading it from the same stack location right before that.