dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.26k stars 4.73k forks source link

Inefficient Boolean comparisons on Arm64 #75852

Open TamarChristinaArm opened 2 years ago

TamarChristinaArm commented 2 years ago

Description

The following example:

class Program
{
    static int Square(bool res) {
        if (res)
          return 9;

        return 5;
    }

  public static void Main(String[] args) {
      Square (args[0]=="foo");
  }
}

Generates:

G_M18028_IG02:              ;; offset=0010H
        B9401FA0          ldr     w0, [fp, #0x1C]
        53001C00          uxtb    w0, w0
        34000080          cbz     w0, G_M18028_IG04
        52800120          mov     w0, #9

which is because the ISA lacks an instruction to compare bytes. However, correct me if I'm wrong, but I believe boolean values in .NET are also represented as 0 or 1 internally. This means we can do the comparison in one instruction by testing only the bottom bit:

G_M18028_IG02:              ;; offset=0010H
        B9401FA0          ldr     w0, [fp, #0x1C]
        34000080          tbz     w0, #0, G_M18028_IG04
        52800120          mov     w0, #9

This will save an instruction and a cycle on every boolean compare.

Configuration

category:cq theme:basic-cq skill-level:intermediate cost:medium impact:medium

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
### Description The following example: ```csharp class Program { static int Square(bool res) { if (res) return 9; return 5; } public static void Main(String[] args) { Square (args[0]=="foo"); } } ``` Generates: ```assembly G_M18028_IG02: ;; offset=0010H B9401FA0 ldr w0, [fp, #0x1C] 53001C00 uxtb w0, w0 34000080 cbz w0, G_M18028_IG04 52800120 mov w0, #9 ``` which is because the ISA lacks an instruction to compare bytes. However, correct me if I'm wrong, but I believe `boolean` values in .NET are also represented as `0` or `1` internally. This means we can do the comparison in one instruction by testing only the bottom bit: ```assembly G_M18028_IG02: ;; offset=0010H B9401FA0 ldr w0, [fp, #0x1C] 34000080 tbz w0, #0, G_M18028_IG04 52800120 mov w0, #9 ``` This will save an instruction and a cycle on every boolean compare. ### Configuration * Which version of .NET is the code running on? `main` build from `01f701c9ff2b5008528076688b48f602d155427b` * What OS version, and what distro if applicable? Ubuntu * What is the architecture (x64, x86, ARM, ARM64)? ARM64
Author: TamarChristinaArm
Assignees: -
Labels: `tenet-performance`, `area-CodeGen-coreclr`
Milestone: -
SingleAccretion commented 2 years ago

However, correct me if I'm wrong, but I believe boolean values in .NET are also represented as 0 or 1 internally

Not quite. They are specified to be 0/not-0 by the runtime.

It is the C# language that makes it practically hard to come by denormalized booleans - there they have the three states of 0/1/Undefined.

TamarChristinaArm commented 2 years ago

However, correct me if I'm wrong, but I believe boolean values in .NET are also represented as 0 or 1 internally

Not quite. They are specified to be 0/not-0 by the runtime.

It is the C# language that makes it practically hard to come by denormalized booleans - there they have the three states of 0/1/Undefined.

Ah that's interesting, that probably means we can't do this optimization. It looked to me like the JIT is always returning 0 or 1, but I had indeed only looked at C#.

Gnbrkm41 commented 2 years ago

There were a few interesting incompatibilities with "CLR bools" (true for any non-zero, false for zero) and "C# bools" (1 for true, 0 for false, anything else undefined) in the past, where older versions of the C# compiler generates IL that compares against actual values of 1 and 0 instead of 0 / non-0, resulting in neither-1-nor-0 values just passing though true-or-false conditions. https://github.com/dotnet/roslyn/issues/24652

Here are examples of BCL code that normalises bools to be either 1 or 0:

https://github.com/dotnet/runtime/blob/b0920912512aa7e205363b046a1911e6a97df31f/src/libraries/System.Collections/src/System/Collections/BitArray.cs#L139-L141

https://github.com/dotnet/runtime/blob/b0920912512aa7e205363b046a1911e6a97df31f/src/libraries/System.Collections/src/System/Collections/BitArray.cs#L834-L835

kunalspathak commented 1 year ago

@TamarChristinaArm - Is csel expensive than tbz? This is the codegen currently on dce5c4779a2 of main.

G_M39010_IG01:  ;; offset=0000H
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
                                                ;; size=8 bbWeight=1 PerfScore 1.50
G_M39010_IG02:  ;; offset=0008H
            mov     w1, #5
            mov     w2, #9
            tst     w0, #255
            csel    w0, w1, w2, eq
                                                ;; size=16 bbWeight=1 PerfScore 2.00
G_M39010_IG03:  ;; offset=0018H
            ldp     fp, lr, [sp], #0x10
            ret     lr

@a74nh

TamarChristinaArm commented 1 year ago

@TamarChristinaArm - Is csel expensive than tbz? This is the codegen currently on dce5c4779a2 of main.

@kunalspathak No, not directly, but the idea for the original ticket was to replace the tst as well. However the example I gave is now eagerly being handled by if-convert.

Instead consider:

using System;

class Program
{

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int consume(int x) {
    return x+30;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Square(bool res, int x) {
        if (res) consume(x);
           return 5;
    }

   [MethodImpl(MethodImplOptions.NoInlining)]
   static int Square2(int res, int x) {
        if (res < 0) consume(x);
            return 5;
    }

  public static void Main(String[] args) {
      Square (args[0]=="foo", 3);
      Square2(3, 4);
  }
}

something that can't be if-converted away (same as a non-trivial if block) still generates:

G_M000_IG01:                ;; offset=0000H
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp

G_M000_IG02:                ;; offset=0008H
            tst     w0, #255
            beq     G_M000_IG04

G_M000_IG03:                ;; offset=0010H
...

G_M000_IG04:                ;; offset=0030H
            mov     w0, #5

G_M000_IG05:                ;; offset=0034H
            ldp     fp, lr, [sp], #0x10
            ret     lr

and

G_M000_IG01:                ;; offset=0000H
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp

G_M000_IG02:                ;; offset=0008H
            tbz     w0, #31, G_M000_IG04

G_M000_IG03:                ;; offset=000CH
...

G_M000_IG04:                ;; offset=002CH
            mov     w0, #5

G_M000_IG05:                ;; offset=0030H
            ldp     fp, lr, [sp], #0x10
            ret     lr

This ticket was that tb[n]z could be used for the boolean comparisons as well (same as the sign bit comparison for the second example which is already done), but that highly depends on the representation of boolean in .NET.

From the discussions before it looks like there's no defined boolean value in .NET? i.e. it's just 0 and !0? If that's the case than it can't be done, but would be good to know :)