dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.08k stars 4.69k forks source link

Unnecessary overflow check for checked BigMul on x64 #105333

Open huoyaoyuan opened 2 months ago

huoyaoyuan commented 2 months ago

For the following code:

        public static long Mul(int a, int b)
        {
            return checked((long)a * (long)b);
        }

Currently on x64 it emits:

; Method CSPlayground.Program:Mul(int,int):long (FullOpts)
G_M63938_IG01:  ;; offset=0x0000
       sub      rsp, 40
                        ;; size=4 bbWeight=1 PerfScore 0.25

G_M63938_IG02:  ;; offset=0x0004
       movsxd   rax, ecx
       movsxd   rcx, edx
       imul     rax, rcx
       jo       SHORT G_M63938_IG04
                        ;; size=12 bbWeight=1 PerfScore 3.50

G_M63938_IG03:  ;; offset=0x0010
       add      rsp, 40
       ret      
                        ;; size=5 bbWeight=1 PerfScore 1.25

G_M63938_IG04:  ;; offset=0x0015
       call     CORINFO_HELP_OVERFLOW
       int3     
                        ;; size=6 bbWeight=0 PerfScore 0.00
; Total bytes of code: 27

There's an overflow check for the IMUL. However on other architectures, including x86/arm/arm64, there's no overflow check.

ARM64:

; Method CSPlayground.Program:Mul(int,int):long (FullOpts)
G_M63938_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
                        ;; size=8 bbWeight=1 PerfScore 1.50

G_M63938_IG02:  ;; offset=0x0008
            smull   x0, w0, w1
                        ;; size=4 bbWeight=1 PerfScore 2.00

G_M63938_IG03:  ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
                        ;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code: 20

ARM32:

; Method CSPlayground.Program:Mul(int,int):long (FullOpts)
G_M63938_IG01:  ;; offset=0x0000
000000      push    {lr}
000002      sub     sp, 12
                        ;; size=4 bbWeight=1 PerfScore 2.00

G_M63938_IG02:  ;; offset=0x0004
000004      smull   r0, r1, r0, r1
                        ;; size=4 bbWeight=1 PerfScore 1.00

G_M63938_IG03:  ;; offset=0x0008
000008      add     sp, 12
00000A      pop     {pc}
                        ;; size=4 bbWeight=1 PerfScore 2.00
; Total bytes of code: 12

x86:

; Method CSPlayground.Program:Mul(int,int):long (FullOpts)
G_M63938_IG01:  ;; offset=0x0000
       sub      esp, 8
                        ;; size=3 bbWeight=1 PerfScore 0.25

G_M63938_IG02:  ;; offset=0x0003
       mov      eax, ecx
       imul     edx:eax, edx
                        ;; size=4 bbWeight=1 PerfScore 3.25

G_M63938_IG03:  ;; offset=0x0007
       add      esp, 8
       ret      
                        ;; size=4 bbWeight=1 PerfScore 1.25
; Total bytes of code: 11

My guess is that x64 is the only platform that supports 64bit*64bit=128bit operations, thus long mul isn't decomposed into MulLow and MulHigh, and fail to discover that the overflow check is unnecessary.

dotnet-policy-service[bot] commented 2 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

huoyaoyuan commented 2 months ago

BTW the following method hits assert in JIT, when viewing cross compile result for 32bit platforms in Disamo:

        public static long Mul(int a, long b)
        {
            return checked((long)a * (long)b);
        }
huoyaoyuan commented 2 months ago

It also reproduces for checked add.

Generally, we should be able to check the operands of checked addition/multiplication, and turn the node to unchecked if we can prove no overflow can happen.

quantumhu commented 3 weeks ago

Hi, I'm interested in taking a look at this issue, what techniques can be used to generally prove no overflow can happen?

huoyaoyuan commented 3 weeks ago

It can be much more complex than you may think. I have already solved this particular case, but haven't decided the best way to solve all the similar cases.

Recognizing upcast in operand of multiplication is straight, but too limited. The JIT has a mechanism to track the possible value range of numbers.