dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.47k stars 4.76k forks source link

Unnecessary shr emitted for repeated multiplication #74020

Open danmoseley opened 2 years ago

danmoseley commented 2 years ago

Noted in discussion here about the recent DateTime perf optimization, that we could not rely on the JIT to safely collapse multiplication. In this example below, there's an extra shr. Same on x86.

sharplab

class Program
{
    uint f(uint u2) {
        return u2 / 2939745 / 4;
    }

    uint g(uint u2) {
        return u2 / 11758980;
    }
}
Program.f(UInt32)
    L0000: mov eax, edx
    L0002: imul rax, 0x5b4fffcb
    L0009: shr rax, 0x34
    L000d: shr eax, 2
    L0010: ret

Program.g(UInt32)
    L0000: mov eax, edx
    L0002: imul rax, 0x5b4fffcb
    L0009: shr rax, 0x36
    L000d: ret

cc @cassioneri

category:cq theme:basic-cq skill-level:beginner cost:small impact:small

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
Noted in discussion [here](https://github.com/dotnet/runtime/pull/72712#issuecomment-1203005578) about the recent DateTime perf optimization, that we could not rely on the JIT to safely collapse multiplication. In this example below, there's an extra shr. Same on x86. [sharplab](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgAJiBGAdgFgAoAb0fLfIFcBLAOw3IBmACm59OpAJTkWDdnMo1x5APTlSATgDM6migCsK8igDcrdgF9GZtqP4BzEb34dJ06/OKKXhqrT0AOdX8ABlNZC0ZLBiA=) ```c# class Program { uint f(uint u2) { return u2 / 2939745 / 4; } uint g(uint u2) { return u2 / 11758980; } } ``` ```asm Program.f(UInt32) L0000: mov eax, edx L0002: imul rax, 0x5b4fffcb L0009: shr rax, 0x34 L000d: shr eax, 2 L0010: ret Program.g(UInt32) L0000: mov eax, edx L0002: imul rax, 0x5b4fffcb L0009: shr rax, 0x36 L000d: ret ``` cc @cassioneri
Author: danmoseley
Assignees: -
Labels: `area-CodeGen-coreclr`
Milestone: -
danmoseley commented 2 years ago

Can repeated shr (or shl) be safely collapsed as a peephole optimization? Or are there subtleties/sideeffects that cannot be done in that way.

danmoseley commented 2 years ago

Adding the Godbolt link from @cassioneri for gcc/clang: https://godbolt.org/z/T7M8Pqq1e .. MSVC seems to miss the same thing.

EgorBo commented 2 years ago

Looks like a good first issue for jit first-time contributors 🙂

SkiFoD commented 2 years ago

Hey, I would like to try to work with this issue.