dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.47k stars 4.76k forks source link

[RyuJIT] switch statement can be further optimized when using bitmasks #10830

Open GrabYourPitchforks opened 6 years ago

GrabYourPitchforks commented 6 years ago

RyuJIT has room for optimization in switch statements like the case below.

private static void WriteInteger(uint i)
{
    string s;
    switch (i & 7)
    {
        case 0: s = "0"; break;
        case 1: s = "1"; break;
        case 2: s = "2"; break;
        case 3: s = "3"; break;
        case 4: s = "4"; break;
        case 5: s = "5"; break;
        case 6: s = "6"; break;
        case 7: s = "7"; break;
        default: s = "xxx"; break;
    }
    Console.WriteLine(s);
}

RyuJIT can optimize this by detecting the & (2^n - 1) in the switch statement, concluding that the resulting value must fall into the range [ 0, 2^n - 1 ], and omitting code gen for all other cases. Below is the current codegen, along with my annotations of which statements can be removed with an enlightened JIT.

00007FF9AB5D2152  sub         esp,20h  
00007FF9AB5D2155  mov         esi,ecx  
00007FF9AB5D2157  and         esi,7

; the two lines immediately following can be removed  
00007FF9AB5D215A  cmp         esi,7  
00007FF9AB5D215D  ja          00007FF9AB5D21F3  

00007FF9AB5D2163  mov         eax,esi  
00007FF9AB5D2165  lea         rdx,[7FF9AB5D2210h]  
00007FF9AB5D216C  mov         edx,dword ptr [rdx+rax*4]  
00007FF9AB5D216F  lea         rcx,[7FF9AB5D2155h]  
00007FF9AB5D2176  add         rdx,rcx  
00007FF9AB5D2179  jmp         rdx  
00007FF9AB5D217B  mov         rax,1DF900030C0h  ; case 0
00007FF9AB5D2185  mov         rcx,qword ptr [rax]  
00007FF9AB5D2188  jmp         00007FF9AB5D2200  
00007FF9AB5D218A  mov         rcx,1DF900030C8h  ; case 1
00007FF9AB5D2194  mov         rcx,qword ptr [rcx]  
00007FF9AB5D2197  jmp         00007FF9AB5D2200  
00007FF9AB5D2199  mov         rcx,1DF900030D0h  ; case 2
00007FF9AB5D21A3  mov         rcx,qword ptr [rcx]  
00007FF9AB5D21A6  jmp         00007FF9AB5D2200  
00007FF9AB5D21A8  mov         rcx,1DF900030D8h  ; case 3
00007FF9AB5D21B2  mov         rcx,qword ptr [rcx]  
00007FF9AB5D21B5  jmp         00007FF9AB5D2200  
00007FF9AB5D21B7  mov         rcx,1DF900030E0h  ; case 4
00007FF9AB5D21C1  mov         rcx,qword ptr [rcx]  
00007FF9AB5D21C4  jmp         00007FF9AB5D2200  
00007FF9AB5D21C6  mov         rcx,1DF900030E8h  ; case 5
00007FF9AB5D21D0  mov         rcx,qword ptr [rcx]  
00007FF9AB5D21D3  jmp         00007FF9AB5D2200  
00007FF9AB5D21D5  mov         rcx,1DF900030F0h  ; case 6
00007FF9AB5D21DF  mov         rcx,qword ptr [rcx]  
00007FF9AB5D21E2  jmp         00007FF9AB5D2200  
00007FF9AB5D21E4  mov         rcx,1DF900030F8h  ; case 7
00007FF9AB5D21EE  mov         rcx,qword ptr [rcx]  

; the three lines immediately following can be removed
00007FF9AB5D21F1  jmp         00007FF9AB5D2200  
00007FF9AB5D21F3  mov         rcx,1DF90003100h  ; case default
00007FF9AB5D21FD  mov         rcx,qword ptr [rcx]  

00007FF9AB5D2200  call        00007FF9AB5D1740  
00007FF9AB5D2205  nop  
00007FF9AB5D2206  add         rsp,20h  
00007FF9AB5D220A  pop         rsi  
00007FF9AB5D220B  ret

Additionally, the registers used can be optimized by keeping the input value in ecx/rcx and not bouncing it through esi and eax/rax.

FWIW, the reason I'm looking into this is that I'm investigating optimizing the Marvin.GetHashCode(...) logic, and I'm seeing patterns very similar to this in the optimized code.

category:cq theme:switches skill-level:intermediate cost:large

AndyAyersMS commented 5 years ago

Removing unreachable switch cases seems valuable, perhaps range prop could inspire this.