JIT doesn't fold constant multiplication to single instruction

dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

https://docs.microsoft.com/dotnet/core/

MIT License

15.47k stars 4.76k forks source link

JIT doesn't fold constant multiplication to single instruction #91824

Open ayende opened 1 year ago

ayende commented 1 year ago

Description

Consider the following code:

int A(int i)
{
    return i * 4 * 8;
}

int B(int i)
{
    return i * (4 * 8);
}

These two functions should result in identical results.

But the JIT output slightly different code for each:

Program.<<Main>$>g__A|0_0(Int32)
    L0000: mov eax, ecx
    L0002: shl eax, 2
    L0005: shl eax, 3
    L0008: ret

Program.<<Main>$>g__B|0_1(Int32)
    L0000: mov eax, ecx
    L0002: shl eax, 5
    L0005: ret

This is on: ; Core CLR 7.0.823.31807 on x64

Analysis

Given that multiplication is communicative, and that this is a power of two operation, I would expect the JIT to fold them to the 2nd version, without the parenthesis.

The first version takes two instructions instead of 1.

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

### Description Consider the following code: ``` int A(int i) { return i * 4 * 8; } int B(int i) { return i * (4 * 8); } ``` These two functions should result in identical results. But the JIT output slightly different code for each: ``` Program.<

$>g__A|0_0(Int32) L0000: mov eax, ecx L0002: shl eax, 2 L0005: shl eax, 3 L0008: ret Program.<

$>g__B|0_1(Int32) L0000: mov eax, ecx L0002: shl eax, 5 L0005: ret ``` This is on: `; Core CLR 7.0.823.31807 on x64` ### Analysis Given that multiplication is communicative, and that this is a power of two operation, I would expect the JIT to fold them to the 2nd version, without the parenthesis. The first version takes two instructions instead of 1.

Author:	ayende
Assignees:	-
Labels:	`tenet-performance`, `area-CodeGen-coreclr`
Milestone:	-

EgorBo commented 1 year ago

We do optimize this in JIT but it doesn't work well when constants are small powers of two - those are transformed into shift operation and then confuse the opt.

MichalPetryka commented 1 year ago

Note that the 2nd version is folded by Roslyn:

    .method private hidebysig static 
        int32 A (
            int32 i
        ) cil managed 
    {
        // Method begins at RVA 0x2067
        // Code size 6 (0x6)
        .maxstack 8

        IL_0000: ldarg.0
        IL_0001: ldc.i4.4
        IL_0002: mul
        IL_0003: ldc.i4.8
        IL_0004: mul
        IL_0005: ret
    } // end of method Test::A

    .method private hidebysig static 
        int32 B (
            int32 i
        ) cil managed 
    {
        // Method begins at RVA 0x206e
        // Code size 5 (0x5)
        .maxstack 8

        IL_0000: ldarg.0
        IL_0001: ldc.i4.s 32
        IL_0003: mul
        IL_0004: ret
    } // end of method Test::B

huoyaoyuan commented 1 year ago

those are transformed into shift operation and then confuse the opt.

Aren't shift operations mergable too?

The complex case would be that one number is transformed into shift.

TIHan commented 1 year ago

Currently the codegen for A is:

; Method Program.Program:A(int):int:this (FullOpts)
G_M29096_IG01:  ;; offset=0x0000
                        ;; size=0 bbWeight=1 PerfScore 0.00

G_M29096_IG02:  ;; offset=0x0000
       lea      eax, [4*rdx]
       shl      eax, 3
                        ;; size=10 bbWeight=1 PerfScore 1.00

G_M29096_IG03:  ;; offset=0x000A
       ret      
                        ;; size=1 bbWeight=1 PerfScore 1.00
; Total bytes of code: 11

Avoiding the LEA would be preferable if we can.