dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

On x86_64, JIT could reorder numeric operations to use the flag for subsequent conditional branch but does not do so #109042

Open neon-sunset opened 5 hours ago

neon-sunset commented 5 hours ago

Description

Given this simple program:

static unsafe void Iterate(int* nums, nuint cnt) {
    var sum = 0;
    var iter = new PtrIter<int>(nums, cnt);

    while (iter.Next(out var n)) {
        sum += n;
    }

    Console.WriteLine(sum);
}

unsafe struct PtrIter<T>(T* ptr, nuint count)
where T: unmanaged {
    public bool Next(out T item) {
        if (count != 0) {
            item = *ptr;
            ptr++;
            count--;
            return true;
        }
        item = default;
        return false;
    }
}

Iterate compiles to the following on x86_64:

G_M000_IG01:                ;; offset=0x0000
       sub      rsp, 40
G_M000_IG02:                ;; offset=0x0004
       xor      eax, eax
       test     rdx, rdx
       je       SHORT G_M000_IG04
       align    [0 bytes for IG03]
G_M000_IG03:                ;; offset=0x000B
       mov      r8d, dword ptr [rcx]
       add      rcx, 4
       dec      rdx
       add      eax, r8d
       test     rdx, rdx ;; <-- if we reorder dec and add, this test becomes redundant as j.cc can simply consume the flag
       jne      SHORT G_M000_IG03
G_M000_IG04:                ;; offset=0x001D
       mov      ecx, eax
       call     [System.Console:WriteLine(int)]
       nop      
G_M000_IG05:                ;; offset=0x0026
       add      rsp, 40
       ret

This is noticeably worse than the code generated for a similar loop written as a plain array foreach:

G_M000_IG02:                ;; offset=0x0000
       xor      eax, eax
       mov      edx, dword ptr [rcx+0x08]
       test     edx, edx
       jle      SHORT G_M000_IG05
G_M000_IG03:                ;; offset=0x0009
       add      rcx, 16
       align    [0 bytes for IG04]
G_M000_IG04:                ;; offset=0x000D
       add      eax, dword ptr [rcx]
       add      rcx, 4
       dec      edx
       jne      SHORT G_M000_IG04
G_M000_IG05:                ;; offset=0x0017
       mov      ecx, eax
G_M000_IG06:                ;; offset=0x0019
       tail.jmp [System.Console:WriteLine(int)]

Analysis

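The test could be elided if the JIT gained a peephole optimization that reorders arithmetic instructions so that the one whose flags a later branch needs comes last. In the loop above, add eax, r8d sits between dec rdx and jne and overwrites the flags. Moving dec rdx after the add would let jne consume the zero flag set by dec directly.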

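Another, minor, missed opportunity is that the mov r8d, dword ptr [rcx] and add eax, r8d pair could be merged into a single add eax, dword ptr [rcx], as happens in the array version.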

I have also noticed that merging the pointer dereference and post-increment into *ptr++ leads to worse codegen overall (it breaks otherwise perfect output on ARM64), even though it shouldn't.
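For reference, the merged form mentioned above would presumably look like the sketch below. The names PtrIterMerged and MergedDemo are hypothetical, and the driver is only there to show that the behavior is unchanged; the codegen difference itself is the reporter's observation and is not reproduced here. Compiling it requires AllowUnsafeBlocks.

```csharp
using System;

unsafe struct PtrIterMerged<T>(T* ptr, nuint count)
where T: unmanaged {
    public bool Next(out T item) {
        if (count != 0) {
            // Dereference and post-increment merged into one expression,
            // replacing the separate `item = *ptr; ptr++;` statements.
            item = *ptr++;
            count--;
            return true;
        }
        item = default;
        return false;
    }
}

static unsafe class MergedDemo {
    // Hypothetical helper: sums cnt ints starting at nums.
    public static int Sum(int* nums, nuint cnt) {
        var sum = 0;
        var iter = new PtrIterMerged<int>(nums, cnt);
        while (iter.Next(out var n)) {
            sum += n;
        }
        return sum;
    }

    static void Main() {
        int* nums = stackalloc int[] { 1, 2, 3 };
        Console.WriteLine(Sum(nums, (nuint)3)); // prints 6
    }
}
```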

Configuration

.NET SDK:
 Version:           9.0.100-rtm.24512.1
 Commit:            5b9d9d4677
 Workload version:  9.0.100-manifests.87287131
 MSBuild version:   17.12.3+4ae11fa8e

Regression?

No

dotnet-policy-service[bot] commented 5 hours ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.