dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.25k stars 4.73k forks source link

Utilizing flags set by various instructions, not just `cmp` and `test`. #76502

Open MineCake147E opened 2 years ago

MineCake147E commented 2 years ago

Description

RyuJIT doesn't seem to be interested in flags set by instructions other than cmp and test.

add instruction and flags

For unsigned integers, if the result of the addition overflows and CF is set, the result of the addition is less than one of the input operands.
In this code,

using System;
public static class C
{
    public static uint AddCarry(uint left, uint right, out bool carry){
        var res = left + right;
        carry = res < left;
        return res;
    }
}

Results in

; Core CLR 6.0.822.36306 on amd64

C.AddCarry(UInt32, UInt32, Boolean ByRef)
    L0000: lea eax, [rcx+rdx]   # We can use `add ecx, edx` instead, for utilizing flags set by it.
    L0003: cmp eax, ecx         # This `cmp` instruction is redundunt if we use `add`, but is necessary for `lea`.
    L0005: setb dl
    L0008: mov [r8], dl
    L000b: ret

In this case, LLVM utilizes this fact, for example:

#include <algorithm>
#include <cstddef>
#include <cstdint>

uint32_t Add(uint32_t left, uint32_t right, bool* carry) {
    auto res = left + right;
    *carry = res < left;
    return res;
}
Add(unsigned int, unsigned int, bool*):                             # @Add(unsigned int, unsigned int, bool*)
        mov     eax, edi
        add     eax, esi        # CF is set here if there is a carry.
        setb    byte ptr [rdx]  # `setb` extracts CF here.
        ret

sub instruction with redundant cmp instruction

It might be common to subtract a value from a register only when the value of the register is greater than or equal to that value.

using System;
public static class C
{
    public static uint SubtractIfLargerOrEqual(uint left, uint right, out bool carry){
        bool cf = false;
        var res = left - right;    //CF is set if left < right
        cf = res > left;           //CF can be verified by `res > left` as the res doesn't become greater than the left unless the left is less than the right for unsigned integers.
        if (cf) res = left;
        carry = cf;
        return res;
    }
}
; Core CLR 6.0.822.36306 on amd64

C.SubtractIfLargerOrEqual(UInt32, UInt32, Boolean ByRef)
    L0000: mov eax, ecx
    L0002: sub eax, edx
    L0004: cmp eax, ecx
    L0006: seta dl
    L0009: movzx edx, dl    # Is this `movzx edx, dl` really necessary?
    L000c: test edx, edx    # `test dl, dl` is also valid here.
    L000e: je short L0012
    L0010: mov eax, ecx
    L0012: mov [r8], dl
    L0015: ret

The cmp instruction does the same comparison as the sub instruction, so the sub and cmp instructions with the same register operands change flags in the same way.
In this specific case, LLVM mysteriously transforms this calculation into something like res = left - (left < right ? 0 : right).

#include <algorithm>
#include <cstddef>
#include <cstdint>

uint32_t SubtractIfLargerOrEqualAsm(uint32_t left, uint32_t right,
                                    uint8_t* carry) {
    auto res = left;
    asm("sub %1, %0" : "+r"(res) : "r"(right));
    asm("cmovb %1, %0" : "+r"(res) : "r"(left));
    asm("setb %0" : "=m"(*carry));
    return res;
}

uint32_t SubtractIfLargerOrEqualCpp(uint32_t left, uint32_t right,
                                    uint8_t* carry) {
    uint8_t cf = 0;
    auto res = left - right;
    cf = res > left;
    if (cf) res = left;
    *carry = cf;
    return res;
}
SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*):     # @SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*)
        mov     eax, edi
        sub     eax, esi
        cmovb   eax, edi
        setb    byte ptr [rdx]
        ret
SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*):     # @SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*)
        mov     eax, edi
        xor     ecx, ecx
        cmp     edi, esi
        cmovae  ecx, esi
        setb    byte ptr [rdx]
        sub     eax, ecx
        ret

Configuration

SharpLab
Compiler Explorer

Regression?

No

Data

Analysis

category:cq theme:basic-cq

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
## Description RyuJIT doesn't seem to be interested in flags set by instructions other than `cmp` and `test`. ### `add` instruction and flags For unsigned integers, if the result of the addition overflows and CF is set, the result of the addition is less than one of the input operands. In this code, ```csharp using System; public static class C { public static uint AddCarry(uint left, uint right, out bool carry){ var res = left + right; carry = res < left; return res; } } ``` Results in ```asm ; Core CLR 6.0.822.36306 on amd64 C.AddCarry(UInt32, UInt32, Boolean ByRef) L0000: lea eax, [rcx+rdx] # We can use `add ecx, edx` instead, for utilizing flags set by it. L0003: cmp eax, ecx # This `cmp` instruction is redundunt if we use `add`, but is necessary for `lea`. L0005: setb dl L0008: mov [r8], dl L000b: ret ``` In this case, LLVM utilizes this fact, for example: ```cpp #include #include #include uint32_t Add(uint32_t left, uint32_t right, bool* carry) { auto res = left + right; *carry = res < left; return res; } ``` ```asm Add(unsigned int, unsigned int, bool*): # @Add(unsigned int, unsigned int, bool*) mov eax, edi add eax, esi # CF is set here if there is a carry. setb byte ptr [rdx] # `setb` extracts CF here. ret ``` ### `sub` instruction with redundant `cmp` instruction It might be common to subtract a value from a register only when the value of the register is greater than or equal to that value. ```csharp using System; public static class C { public static uint SubtractIfLargerOrEqual(uint left, uint right, out bool carry){ bool cf = false; var res = left - right; //CF is set if left < right cf = res > left; //CF can be verified by `res > left` as the res doesn't become greater than the left unless the left is less than the right for unsigned integers. if (cf) res = left; carry = cf; return res; } } ``` ```asm ; Core CLR 6.0.822.36306 on amd64 C.SubtractIfLargerOrEqual(UInt32, UInt32, Boolean ByRef) L0000: mov eax, ecx L0002: sub eax, edx L0004: cmp eax, ecx L0006: seta dl L0009: movzx edx, dl # Is this `movzx edx, dl` really necessary? L000c: test edx, edx # `test dl, dl` is also valid here. L000e: je short L0012 L0010: mov eax, ecx L0012: mov [r8], dl L0015: ret ``` The `cmp` instruction does the same comparison as the `sub` instruction, so the `sub` and `cmp` instructions with the same register operands change flags in the same way. In this specific case, LLVM mysteriously transforms this calculation into something like `res = left - (left < right ? 0 : right)`. ```cpp #include #include #include uint32_t SubtractIfLargerOrEqualAsm(uint32_t left, uint32_t right, uint8_t* carry) { auto res = left; asm("sub %1, %0" : "+r"(res) : "r"(right)); asm("cmovb %1, %0" : "+r"(res) : "r"(left)); asm("setb %0" : "=m"(*carry)); return res; } uint32_t SubtractIfLargerOrEqualCpp(uint32_t left, uint32_t right, uint8_t* carry) { uint8_t cf = 0; auto res = left - right; cf = res > left; if (cf) res = left; *carry = cf; return res; } ``` ```asm SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*): # @SubtractIfLargerOrEqualAsm(unsigned int, unsigned int, unsigned char*) mov eax, edi sub eax, esi cmovb eax, edi setb byte ptr [rdx] ret SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*): # @SubtractIfLargerOrEqualCpp(unsigned int, unsigned int, unsigned char*) mov eax, edi xor ecx, ecx cmp edi, esi cmovae ecx, esi setb byte ptr [rdx] sub eax, ecx ret ``` ## Configuration [SharpLab](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACY8pJ0hgYRoG8aH+mjZqwCuASwB2GBgEEAJnI7YoUAJ4AKcVIYAbGADMMaBlulQxAcwAWRhhBHTgECDoZhlagJS9qAvwwA3ZQZYXAYAXl0DaQBqEMsbAG4+f353FVUIkJgwgB4ow2TfVKYAdmzcIr8AXxT+eiYWE0lpAGURYAwobDAMAEl9ABllCxgoAHkoAFEARxFsHU0WgttTeOtbe0dnV3SvHxKGJxc3fSz9BdwYKpKgqAqsvUMGOHWkvwB6D44AMQYxMJXaRiM5PaT5cwbOr+MBnSKhBgAPhWiUOfmhfhBDHUsM8D0iYJuqT2mUisKJ/mI5VCFNq1GqQA=) [Compiler Explorer](https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1DIApACYAQuYukl9ZATwDKjdAGFUtAK4sGIM6SuADJ4DJgAcj4ARpjEIACcpAAOqAqETgwe3r7%2ByanpAiFhkSwxcYl2mA4ZQgRMxARZPn4BldUCtfUERRHRsQm2dQ1NOa1D3aG9pf3xAJS2qF7EyOwc5gDMocjeWADUJutuYsAkhAgsB9gmGgCCG1s7mPuHyAoE6FhUl9d3ZpsM2y8ewOble71CBG%2Btx%2BXgh6zMAH0CLsbh8ILDBPCkbt6FQCKRdhiCFjkcQ8MAEPjdlFUJ4AFS7UTEYgAT1m%2BwA7FZbrtebsmF4iLtiJgFM8ACI4zB4/aWYXkykHbk3Pm7OlM1kS4Wi55uKV4pU/VUighLBjahSG6Ec8U/GFwxHIoReKIEYhMBwASSoQXqwFiAHliNgAI5eMQ3BQsdEO7G4qlEknyin4o2q9MZzNZ9NEgAcSIZGrZnOVqoFQpFYoOkvjVpVfKYUYg5jMChdsoArFwCeYOxoW7sQLKzNZiC2IJX2UOW2OzGYJwqCLNZnWy02W8gWKgAG5RTvdzv9ueD4ej8eTk8z8fx5erhvrudKAh73tHsyXufVi5ziDq%2Bqs291lLPkTTNC06xMG07VuRNHV2Z1XXdL0fT9QNgzDMQ3CSJIY0xOD4wJWDsTJFNSDTbMKMo1U8wLRl/2LSDgN5GjkWQKgtX7IDyP5QVUAtLV412ABaZNFS4nk%2BTYrVK2ebB9UhcT615PB2IgNj2Rk6t5LvXk/2ZFktTYnTtVNYhzUrCCbQ4eZaE4DteD8DgtFIVBODcaxrF2BRFmWJ4Nh4UgCE0az5gAaxAPsADoNHiDsOQANnhDlcw5ZKNGkWyOEkBzgpczheAUEANEC4L5jgWAkDQFgkjoWJyEoKqavoOJtkMYAkWILwGFCvg6AIWJCogKJcqiUJ6hZTgAqqthBADBhaAmpzeCwFg2vEJbSHwEVqm3UVcswVQqkFVYAohTBMuc2g8Cid1WQ8LBcrdPAWEm6y%2BAMYAFAANTwTAAHcAySRhXpkQQRDEdgpFB%2BQlDUXLdACAwjBQDzLH0a7CsgeZUCSRwBEKjghLedBq1MSxrDMDRhIDKhG2RITVuWBBqwQRs/vO2gCvOqo8b8CBXBGPxu2CSYSjKPQUjSXnBYl/JeZ6MX%2Bm7Npec6YZPGaPQVZqcYFb6OJlfGGXDa6PXpgN%2BZvKWFY9DdTATrezL7NIRznNcjhVFzeKhPiyRGSR4BdggN0utC9kIHc8m0d2XBCBIWV1gPDxqtq4gE7MWZeCCpbl1IcLJHiSL4nSv54o5MwuDnLh4lzfROGyl3cvdgqipKnObM4Mwco25u260XPduINJnEkIA) ## Regression? No ## Data ## Analysis
Author: MineCake147E
Assignees: -
Labels: `tenet-performance`, `area-CodeGen-coreclr`
Milestone: -
JulieLeeMSFT commented 2 years ago

cc @dotnet/jit-contrib.

TIHan commented 1 year ago

Related work: https://github.com/dotnet/runtime/pull/81267

xtqqczze commented 1 year ago

Confirmed on latest nightly with Compiler Explorer:

// crossgen2 8.0.0-preview.3.23168.99+281c462b029c5056cdd446da86692e08a9377b01

C:AddCarry(uint,uint,byref):uint:
       lea      eax, [rcx+rdx]
       cmp      eax, ecx
       setb     dl
       mov      byte  ptr [r8], dl
       ret      

C:SubtractIfLargerOrEqual(uint,uint,byref):uint:
       mov      eax, ecx
       sub      eax, edx
       xor      edx, edx
       cmp      eax, ecx
       seta     dl
       test     edx, edx
       cmovne   eax, ecx
       mov      byte  ptr [r8], dl
       ret      
TIHan commented 1 year ago

Also related: https://github.com/dotnet/runtime/pull/82750

TIHan commented 1 year ago

For AddCarry, I was able to recognize the pattern, but it appears this isn't common and doesn't even appear in our SuperPMI diff tool.

tannergooding commented 1 year ago

AddCarry (and SubtractBorrow) is one that we’d like to explicitly take advantage of in BigInteger, Int128, and UInt128.

For the latter two it’s just a simple scenario like the original post gives. For BigInteger it needs to be many carry’s that are done in a loop and is quite a bit more complex

tannergooding commented 1 year ago

For these types we’re currently using an slightly different bit of code that accounts for the lack of support for AddCarry being recognized today