dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.42k stars 4.76k forks source link

Neighboring bounds checks aren't collapsed #109983

Open colejohnson66 opened 2 days ago

colejohnson66 commented 2 days ago

Description

Given the following code:

public static uint Test(ReadOnlySpan<uint> span) =>
    span[0] + span[1] + span[2] + span[3];

The only way for this method to fail a bounds/range check is if span.Length < 4. As such, I'd expect the runtime to optimize to something like this:

public static uint Test(ReadOnlySpan<uint> span)
{
    if (span.Length <= 3)
        CORINFO_HELP_RNGCHKFAIL();
    return span[0] + span[1] + span[2] + span[3];
}

Instead, the runtime generates this:

// coreclr trunk-20241119+e33be4d53ab951272c9161c50b549e5e7db60262

Klass:Test(System.ReadOnlySpan`1[uint]):uint (FullOpts):
       push     rax
       test     esi, esi
       je       SHORT G_M30269_IG04
       mov      eax, dword ptr [rdi]
       cmp      esi, 1
       jbe      SHORT G_M30269_IG04
       add      eax, dword ptr [rdi+0x04]
       cmp      esi, 2
       jbe      SHORT G_M30269_IG04
       add      eax, dword ptr [rdi+0x08]
       cmp      esi, 3
       jbe      SHORT G_M30269_IG04
       add      eax, dword ptr [rdi+0x0C]
       add      rsp, 8
       ret      
G_M30269_IG04:  ;; offset=0x0024
       call     CORINFO_HELP_RNGCHKFAIL
       int3     

Note the four range checks against esi. Failures all call into CORIFNO_HELP_RNGCHKFAIL with no indication of which failed. As such, I'd expect the four compare+branch blocks to collapse into a singular one like this:

       cmp esi, 3
       jbe SHORT_G_M30269_IG04
       mov eax, dword ptr [rdi]
       add eax, dword ptr [rdi+0x04]
       add eax, dword ptr [rdi+0x08]
       add eax, dword ptr [rdi+0x0C]
       ret      
G_M30269_IG04:  ;; offset=0x00__
       call     CORINFO_HELP_RNGCHKFAIL
       int3     

Configuration

Godbolt Compiler Explorer against "trunk" (e33be4d53ab951272c9161c50b549e5e7db60262)

Regression?

No. Present in net7.0.19 and net8.0.5.

Notes

Interestingly, a naive loop generates nearly the same assembly, but sets up and tears down a frame:

public static uint Test(ReadOnlySpan<uint> span)
{
    uint sum = 0;
    for (int i = 0; i < 4; i++)
        sum += span[i];
    return sum;
}
// coreclr trunk-20241119+e33be4d53ab951272c9161c50b549e5e7db60262

Klass:Test(System.ReadOnlySpan`1[uint]):uint (FullOpts):
       push     rbp
       mov      rbp, rsp
       test     esi, esi
       je       SHORT G_M30269_IG04
       mov      eax, dword ptr [rdi]
       cmp      esi, 1
       jbe      SHORT G_M30269_IG04
       add      eax, dword ptr [rdi+0x04]
       cmp      esi, 2
       jbe      SHORT G_M30269_IG04
       add      eax, dword ptr [rdi+0x08]
       cmp      esi, 3
       jbe      SHORT G_M30269_IG04
       add      eax, dword ptr [rdi+0x0C]
       pop      rbp
       ret      
G_M30269_IG04:  ;; offset=0x0024
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
dotnet-policy-service[bot] commented 2 days ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

AndyAyersMS commented 1 day ago

As far as I know, we don't yet do any analysis to prove that an execution that might do some things and then throw is equivalent to doing an up-front check and just throwing. Typically, there other are observable side effects from the "things" so these are not equivalent.

It would be interesting to try and figure out how often we see this. Something like: if we have two bounds checks A and B, where A dominates B, and B implies A (eg n > 3 implies n > 2), and there are no other observable side effects between A and B, then we can change the check at A to be B, and remove B. Repeated application of this (or starting with the most dominated check and pushing it up as far as it can go) would get us the "optimized" version above.