JIT: Add support for strength reduction

JIT: Add support for strength reduction #100913

Closed jakobbotsch closed 4 months ago

jakobbotsch commented 7 months ago

Now that we have an SSA-based IV analysis (added in #97865), we should implement strength reduction on top of it. Example loop:

[MethodImpl(MethodImplOptions.NoInlining)]
private static int StrengthReduce(Span<int> s)
{
    int sum = 0;
    foreach (int val in s)
        sum += val;

    return sum;
}

Codegen x64:

       xor      r8d, r8d
       test     ecx, ecx
       jle      SHORT G_M11380_IG04
       align    [0 bytes for IG03]
                        ;; size=15 bbWeight=1 PerfScore 5.75

G_M11380_IG03:  ;; offset=0x0013
       add      eax, dword ptr [rdx+4*r8]
       inc      r8d
       cmp      r8d, ecx
       jl       SHORT G_M11380_IG03
                        ;; size=12 bbWeight=4 PerfScore 18.00

Codegen arm64:

            mov     w3, wzr
            cmp     w2, #0
            ble     G_M1017_IG04
            align   [0 bytes for IG03]
                        ;; size=24 bbWeight=1 PerfScore 6.50

G_M1017_IG03:  ;; offset=0x0024
            ldr     w4, [x1, w3, UXTW #2]
            add     w0, w4, w0
            add     w3, w3, #1
            cmp     w3, w2
            blt     G_M1017_IG03
                        ;; size=20 bbWeight=4 PerfScore 22.00

The point of strength reduction is to optimize the loop codegen as if it had been written as follows:

[MethodImpl(MethodImplOptions.NoInlining)]
private static int StrengthReduce(Span<int> s)
{
    int sum = 0;
    ref int p = ref MemoryMarshal.GetReference(s);
    ref int end = ref Unsafe.Add(ref p, s.Length);
    while (Unsafe.IsAddressLessThan(ref p, ref end))
    {
        sum += p;
        p = ref Unsafe.Add(ref p, 1);
    }

    return sum;
}

The codegen would look like:

x64:

       xor      eax, eax
       mov      rdx, bword ptr [rcx]
       mov      ecx, dword ptr [rcx+0x08]
       lea      rcx, bword ptr [rdx+4*rcx]
       cmp      rdx, rcx
       jae      SHORT G_M11380_IG04
       align    [0 bytes for IG03]
                        ;; size=17 bbWeight=1 PerfScore 6.00

G_M11380_IG03:  ;; offset=0x0011
       add      eax, dword ptr [rdx]
       add      rdx, 4
       cmp      rdx, rcx
       jb       SHORT G_M11380_IG03
                        ;; size=11 bbWeight=4 PerfScore 18.00

arm64:

            mov     w0, wzr
            ldr     x1, [fp, #0x10] // [V00 arg0]
            ldr     w2, [fp, #0x18] // [V00 arg0+0x08]
            ubfiz   x2, x2, #2, #32
            add     x2, x1, x2
            cmp     x1, x2
            bhs     G_M11380_IG04
            align   [0 bytes for IG03]
                        ;; size=28 bbWeight=1 PerfScore 7.50

G_M11380_IG03:  ;; offset=0x0028
            ldr     w3, [x1]
            add     w0, w0, w3
            add     x1, x1, #4
            cmp     x1, x2
            blo     G_M11380_IG03
                        ;; size=20 bbWeight=4 PerfScore 22.00

For arm64 there is the additional possibility of using post-increment addressing mode by optimizing the placement of the IV increment once the strength reduction has happened. The loop body is then reducible to:

G_M11380_IG03:  ;; offset=0x0028
            ldr     w3, [x1], #4
            add     w0, w0, w3
            cmp     x1, x2
            blo     G_M11380_IG03
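
For readers less familiar with the term, the idea can be sketched at the source level without spans or byrefs. The following is only an illustration, not the JIT's implementation; the class and method names are hypothetical. The first method recomputes a scaled offset with a multiply on every iteration (analogous to `[rdx+4*r8]` above), while the second keeps a running offset that is advanced by a constant step (analogous to `add rdx, 4`).

```csharp
using System;

static class StrengthReductionSketch
{
    // Before: the scaled offset i * elemSize is recomputed in every iteration.
    public static long OffsetsWithMultiply(int count, int elemSize)
    {
        long total = 0;
        for (int i = 0; i < count; i++)
            total += (long)i * elemSize;
        return total;
    }

    // After: the multiply is replaced by a second induction variable
    // that is advanced by elemSize in every iteration.
    public static long OffsetsWithAdd(int count, int elemSize)
    {
        long total = 0;
        long offset = 0;
        for (int i = 0; i < count; i++)
        {
            total += offset;
            offset += elemSize;
        }
        return total;
    }
}
```

Both methods visit the same sequence of offsets, so they return the same total for any non-negative count.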

jakobbotsch commented 7 months ago

There is a question of whether we can optimize Span<T> as well as T[] without introducing (more) special status for Span<T>/ReadOnlySpan<T>. That's because the transformation shown above is actually illegal for the JIT to do unless we make it undefined behavior for a Span<T> to exist with an "invalid" range of managed byrefs.

Consider the following example:

static void Main()
{
    int[] values = [1, 2, 3, 4, 0];
    Span<int> exampleSpan = MemoryMarshal.CreateSpan(ref values[0], int.MaxValue);
    Sum(exampleSpan); // No problem today
    Sum2(exampleSpan); // Forms illegal byref
}

private static int Sum(Span<int> s)
{
    int sum = 0;
    foreach (int x in s)
    {
        if (x == 0)
            break;

        sum += x;
    }

    return sum;
}

private static int Sum2(Span<int> s)
{
    int sum = 0;
    ref int p = ref MemoryMarshal.GetReference(s);
    ref int end = ref Unsafe.Add(ref p, s.Length);
    while (Unsafe.IsAddressLessThan(ref p, ref end))
    {
        int x = p;
        if (x == 0)
            break;

        sum += x;
        p = ref Unsafe.Add(ref p, 1);
    }

    return sum;
}

exampleSpan is created with a valid byref but a length that makes _reference + length an invalid byref. Today, there is no problem in Sum because we do not eagerly form the _reference + length byref, but Sum2 ends up eagerly forming this illegal byref. The strength reduction optimization would have the JIT transform Sum to Sum2.

@jkotas @davidwrighton any thoughts on this? Can we document somewhere that Span<T>/ReadOnlySpan<T> have "special status" to make them amenable to optimizations to a similar level to T[]? I think we would document two things:

1. A non-negative length field. The JIT already makes use of this assumption today.
2. Requirements on the range of byrefs represented by the Span<T>, i.e. _reference + length must point inside (or at the end of) the same object as _reference when it is a managed byref.
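
To make the second requirement concrete for the array-backed case, here is a minimal sketch expressed in element indices rather than byrefs. It is not a runtime API and the names are hypothetical; it only models the condition that the end of the span may not go past the end of the backing array.

```csharp
using System;

static class SpanRangeSketch
{
    // Returns true when a span that starts at element `start` of an array with
    // `arrayLength` elements and claims `spanLength` elements ends inside
    // (or exactly at the end of) that array.
    public static bool EndStaysInsideArray(int arrayLength, int start, int spanLength)
    {
        if (arrayLength < 0 || start < 0 || start > arrayLength || spanLength < 0)
            return false;

        // Widen to long so that start + spanLength cannot overflow.
        return (long)start + spanLength <= arrayLength;
    }
}
```

Under this model, the span from the example above (five elements backing a claimed length of int.MaxValue) fails the check, while a span covering exactly the five elements passes it.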

jkotas commented 7 months ago

The existing Span uses do not always follow this restriction. For example:

https://github.com/dotnet/runtime/blob/81ca1c4b1e1eea9c94bdeb38c050d5c4063bab57/src/libraries/System.Runtime/tests/System.Runtime.Extensions.Tests/System/Convert.ToHexString.cs#L96-L97

I guess we can document it retroactively as a breaking change and try to fix all instances of the bad patterns that we can find.

jakobbotsch commented 7 months ago

Hmm, I'll have to see if that seems to be worth it once I get further. I can start out with arrays for now to do the measurements.

jakobbotsch commented 7 months ago

I think instead of forming end = span._reference + span.length * size, we can just utilize a reverse counted loop and come out equal on x64/arm64. For example, Sum2 will usually end up as

private static int Sum2(Span<int> s)
{
    int sum = 0;
    ref int p = ref MemoryMarshal.GetReference(s);
    if (s.Length > 0)
    {
        int length = s.Length;
        do
        {
            int x = p;
            if (x == 0)
                break;

            sum += x;
            p = ref Unsafe.Add(ref p, 1);
        } while (--length > 0);
    }

    return sum;
}

when loop inversion kicks in. The --length > 0 check can be done in 2 instructions and 1 live variable on arm64/x64, exactly the same cost as if we had formed end.

jakobbotsch commented 4 months ago

We sadly still have the problem described above for Span<T>. Without the assumption that a Span<T> points within the same managed object it is illegal to transform

public static int Sum(Span<int> span, Func<int, bool> sumIndex)
{
    int sum = 0;
    for (int i = 0; i < span.Length; i++)
        sum += sumIndex(i) ? span[i] : 0;
    return sum;
}

into

public static int Sum(Span<int> span, Func<int, bool> sumIndex)
{
    int sum = 0;
    ref int val = ref span[0];
    for (int i = 0; i < span.Length; i++)
    {
        sum += sumIndex(i) ? val : 0;
        val = ref Unsafe.Add(ref val, 1);
    }
    return sum;
}

The same transformation seems ok for arrays.

(Of course whether or not this transformation is profitable is another question entirely.)

jakobbotsch commented 4 months ago

@EgorBot -intel -amd -commit 57f870f909dbfad35142e5aaa6e681464de4f439 vs 82ce118743cbd8f8261b6fb38fe0b0ec08d2030b --disasm

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

namespace Loops
{
    [GroupBenchmarksBy(BenchmarkDotNet.Configs.BenchmarkLogicalGroupRule.ByCategory)]
    public class StrengthReduction
    {
        private short[] _arrayShorts;
        private int[] _arrayInts;
        private long[] _arrayLongs;

        private S3[] _arrayS3;
        private S8[] _arrayS8;
        private S12[] _arrayS12;
        private S16[] _arrayS16;
        private S29[] _arrayS29;

        [GlobalSetup]
        public void Setup()
        {
            _arrayShorts = Enumerable.Range(0, 10000).Select(i => (short)i).ToArray();
            _arrayInts = Enumerable.Range(0, 10000).Select(i => i).ToArray();
            _arrayLongs = Enumerable.Range(0, 10000).Select(i => (long)i).ToArray();

            _arrayS3 = Enumerable.Range(0, 10000).Select(i => new S3 { A = (byte)i, B = (byte)i, C = (byte)i }).ToArray();
            _arrayS8 = Enumerable.Range(0, 10000).Select(i => new S8 { A = i, B = i, }).ToArray();
            _arrayS12 = Enumerable.Range(0, 10000).Select(i => new S12 { A = i, B = i, C = i, }).ToArray();
            _arrayS16 = Enumerable.Range(0, 10000).Select(i => new S16 { A = i, B = i, }).ToArray();
            _arrayS29 = Enumerable.Range(0, 10000).Select(i => new S29 { A = (byte)i, }).ToArray();
        }

        [Benchmark(Baseline = true), BenchmarkCategory("short")]
        public int SumShortsArray()
        {
            return SumShortsWithArray(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsSpan()
        {
            return SumShortsWithSpan(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsArrayStrengthReduced()
        {
            return SumShortsStrengthReducedWithArray(_arrayShorts);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsWithArray(short[] input)
        {
            int result = 0;
            // 'or' by 1 to make loop body slightly larger to work around
            // https://github.com/dotnet/runtime/issues/104665
            foreach (short s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsWithSpan(ReadOnlySpan<short> input)
        {
            int result = 0;
            foreach (short s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsStrengthReducedWithArray(short[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref short p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("int")]
        public int SumIntsArray()
        {
            return SumIntsWithArray(_arrayInts);
        }

        [Benchmark, BenchmarkCategory("int")]
        public int SumIntsSpan()
        {
            return SumIntsWithSpan(_arrayInts);
        }

        [Benchmark, BenchmarkCategory("int")]
        public int SumIntsArrayStrengthReduced()
        {
            return SumIntsStrengthReducedWithArray(_arrayInts);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsWithArray(int[] input)
        {
            int result = 0;
            foreach (int s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsWithSpan(ReadOnlySpan<int> input)
        {
            int result = 0;
            foreach (int s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsStrengthReducedWithArray(int[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref int p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("long")]
        public long SumLongsArray()
        {
            return SumLongsWithArray(_arrayLongs);
        }

        [Benchmark, BenchmarkCategory("long")]
        public long SumLongsSpan()
        {
            return SumLongsWithSpan(_arrayLongs);
        }

        [Benchmark, BenchmarkCategory("long")]
        public long SumLongsArrayStrengthReduced()
        {
            return SumLongsStrengthReducedWithArray(_arrayLongs);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsWithArray(long[] input)
        {
            long result = 0;
            foreach (long s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsWithSpan(ReadOnlySpan<long> input)
        {
            long result = 0;
            foreach (long s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsStrengthReducedWithArray(long[] input)
        {
            long result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref long p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S3")]
        public int SumS3Array()
        {
            return SumS3WithArray(_arrayS3);
        }

        [Benchmark, BenchmarkCategory("S3")]
        public int SumS3Span()
        {
            return SumS3WithSpan(_arrayS3);
        }

        [Benchmark, BenchmarkCategory("S3")]
        public int SumS3ArrayStrengthReduced()
        {
            return SumS3StrengthReducedWithArray(_arrayS3);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3WithArray(S3[] input)
        {
            int result = 0;
            foreach (S3 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3WithSpan(ReadOnlySpan<S3> input)
        {
            int result = 0;
            foreach (S3 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3StrengthReducedWithArray(S3[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S3 p = ref input[0];
                do
                {
                    S3 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S8")]
        public int SumS8Array()
        {
            return SumS8WithArray(_arrayS8);
        }

        [Benchmark, BenchmarkCategory("S8")]
        public int SumS8Span()
        {
            return SumS8WithSpan(_arrayS8);
        }

        [Benchmark, BenchmarkCategory("S8")]
        public int SumS8ArrayStrengthReduced()
        {
            return SumS8StrengthReducedWithArray(_arrayS8);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8WithArray(S8[] input)
        {
            int result = 0;
            foreach (S8 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8WithSpan(ReadOnlySpan<S8> input)
        {
            int result = 0;
            foreach (S8 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8StrengthReducedWithArray(S8[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S8 p = ref input[0];
                do
                {
                    S8 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S12")]
        public int SumS12Array()
        {
            return SumS12WithArray(_arrayS12);
        }

        [Benchmark, BenchmarkCategory("S12")]
        public int SumS12Span()
        {
            return SumS12WithSpan(_arrayS12);
        }

        [Benchmark, BenchmarkCategory("S12")]
        public int SumS12ArrayStrengthReduced()
        {
            return SumS12StrengthReducedWithArray(_arrayS12);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12WithArray(S12[] input)
        {
            int result = 0;
            foreach (S12 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12WithSpan(ReadOnlySpan<S12> input)
        {
            int result = 0;
            foreach (S12 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12StrengthReducedWithArray(S12[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S12 p = ref input[0];
                do
                {
                    S12 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S16")]
        public long SumS16Array()
        {
            return SumS16WithArray(_arrayS16);
        }

        [Benchmark, BenchmarkCategory("S16")]
        public long SumS16Span()
        {
            return SumS16WithSpan(_arrayS16);
        }

        [Benchmark, BenchmarkCategory("S16")]
        public long SumS16ArrayStrengthReduced()
        {
            return SumS16StrengthReducedWithArray(_arrayS16);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16WithArray(S16[] input)
        {
            long result = 0;
            foreach (S16 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16WithSpan(ReadOnlySpan<S16> input)
        {
            long result = 0;
            foreach (S16 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16StrengthReducedWithArray(S16[] input)
        {
            long result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S16 p = ref input[0];
                do
                {
                    S16 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S29")]
        public int SumS29Array()
        {
            int sum = 0;
            //for (int i = 0; i < 100; i++)
                sum += SumS29WithArray(_arrayS29);
            return sum;
        }

        [Benchmark, BenchmarkCategory("S29")]
        public int SumS29Span()
        {
            int sum = 0;
            //for (int i = 0; i < 100; i++)
                sum += SumS29WithSpan(_arrayS29);
            return sum;
        }

        [Benchmark, BenchmarkCategory("S29")]
        public int SumS29ArrayStrengthReduced()
        {
            int sum = 0;
            //for (int i = 0; i < 100; i++)
                sum += SumS29StrengthReducedWithArray(_arrayS29);
            return sum;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29WithArray(S29[] input)
        {
            int result = 0;
            foreach (S29 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29WithSpan(ReadOnlySpan<S29> input)
        {
            int result = 0;
            foreach (S29 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29StrengthReducedWithArray(S29[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S29 p = ref input[0];
                do
                {
                    S29 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        private struct S3
        {
            public byte A, B, C;
        }

        public struct S8
        {
            public int A, B;
        }

        public struct S12
        {
            public int A, B, C;
        }

        public struct S16
        {
            public long A, B;
        }

        [StructLayout(LayoutKind.Sequential, Size = 29)]
        public struct S29
        {
            public byte A;
        }
    }
}

jakobbotsch commented 4 months ago

@EgorBot -arm64 -commit 57f870f909dbfad35142e5aaa6e681464de4f439 vs 82ce118743cbd8f8261b6fb38fe0b0ec08d2030b --disasm

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

namespace Loops
{
    [GroupBenchmarksBy(BenchmarkDotNet.Configs.BenchmarkLogicalGroupRule.ByCategory)]
    public class StrengthReduction
    {
        private short[] _arrayShorts;
        private int[] _arrayInts;
        private long[] _arrayLongs;

        private S3[] _arrayS3;
        private S8[] _arrayS8;
        private S12[] _arrayS12;
        private S16[] _arrayS16;
        private S29[] _arrayS29;

        [GlobalSetup]
        public void Setup()
        {
            _arrayShorts = Enumerable.Range(0, 10000).Select(i => (short)i).ToArray();
            _arrayInts = Enumerable.Range(0, 10000).Select(i => i).ToArray();
            _arrayLongs = Enumerable.Range(0, 10000).Select(i => (long)i).ToArray();

            _arrayS3 = Enumerable.Range(0, 10000).Select(i => new S3 { A = (byte)i, B = (byte)i, C = (byte)i }).ToArray();
            _arrayS8 = Enumerable.Range(0, 10000).Select(i => new S8 { A = i, B = i, }).ToArray();
            _arrayS12 = Enumerable.Range(0, 10000).Select(i => new S12 { A = i, B = i, C = i, }).ToArray();
            _arrayS16 = Enumerable.Range(0, 10000).Select(i => new S16 { A = i, B = i, }).ToArray();
            _arrayS29 = Enumerable.Range(0, 10000).Select(i => new S29 { A = (byte)i, }).ToArray();
        }

        [Benchmark(Baseline = true), BenchmarkCategory("short")]
        public int SumShortsArray()
        {
            return SumShortsWithArray(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsSpan()
        {
            return SumShortsWithSpan(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsArrayStrengthReduced()
        {
            return SumShortsStrengthReducedWithArray(_arrayShorts);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsWithArray(short[] input)
        {
            int result = 0;
            // 'or' by 1 to make loop body slightly larger to work around
            // https://github.com/dotnet/runtime/issues/104665
            foreach (short s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsWithSpan(ReadOnlySpan<short> input)
        {
            int result = 0;
            foreach (short s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsStrengthReducedWithArray(short[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref short p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("int")]
        public int SumIntsArray()
        {
            return SumIntsWithArray(_arrayInts);
        }

        [Benchmark, BenchmarkCategory("int")]
        public int SumIntsSpan()
        {
            return SumIntsWithSpan(_arrayInts);
        }

        [Benchmark, BenchmarkCategory("int")]
        public int SumIntsArrayStrengthReduced()
        {
            return SumIntsStrengthReducedWithArray(_arrayInts);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsWithArray(int[] input)
        {
            int result = 0;
            foreach (int s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsWithSpan(ReadOnlySpan<int> input)
        {
            int result = 0;
            foreach (int s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsStrengthReducedWithArray(int[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref int p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("long")]
        public long SumLongsArray()
        {
            return SumLongsWithArray(_arrayLongs);
        }

        [Benchmark, BenchmarkCategory("long")]
        public long SumLongsSpan()
        {
            return SumLongsWithSpan(_arrayLongs);
        }

        [Benchmark, BenchmarkCategory("long")]
        public long SumLongsArrayStrengthReduced()
        {
            return SumLongsStrengthReducedWithArray(_arrayLongs);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsWithArray(long[] input)
        {
            long result = 0;
            foreach (long s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsWithSpan(ReadOnlySpan<long> input)
        {
            long result = 0;
            foreach (long s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsStrengthReducedWithArray(long[] input)
        {
            long result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref long p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S3")]
        public int SumS3Array()
        {
            return SumS3WithArray(_arrayS3);
        }

        [Benchmark, BenchmarkCategory("S3")]
        public int SumS3Span()
        {
            return SumS3WithSpan(_arrayS3);
        }

        [Benchmark, BenchmarkCategory("S3")]
        public int SumS3ArrayStrengthReduced()
        {
            return SumS3StrengthReducedWithArray(_arrayS3);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3WithArray(S3[] input)
        {
            int result = 0;
            foreach (S3 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3WithSpan(ReadOnlySpan<S3> input)
        {
            int result = 0;
            foreach (S3 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3StrengthReducedWithArray(S3[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S3 p = ref input[0];
                do
                {
                    S3 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S8")]
        public int SumS8Array()
        {
            return SumS8WithArray(_arrayS8);
        }

        [Benchmark, BenchmarkCategory("S8")]
        public int SumS8Span()
        {
            return SumS8WithSpan(_arrayS8);
        }

        [Benchmark, BenchmarkCategory("S8")]
        public int SumS8ArrayStrengthReduced()
        {
            return SumS8StrengthReducedWithArray(_arrayS8);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8WithArray(S8[] input)
        {
            int result = 0;
            foreach (S8 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8WithSpan(ReadOnlySpan<S8> input)
        {
            int result = 0;
            foreach (S8 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8StrengthReducedWithArray(S8[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S8 p = ref input[0];
                do
                {
                    S8 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S12")]
        public int SumS12Array()
        {
            return SumS12WithArray(_arrayS12);
        }

        [Benchmark, BenchmarkCategory("S12")]
        public int SumS12Span()
        {
            return SumS12WithSpan(_arrayS12);
        }

        [Benchmark, BenchmarkCategory("S12")]
        public int SumS12ArrayStrengthReduced()
        {
            return SumS12StrengthReducedWithArray(_arrayS12);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12WithArray(S12[] input)
        {
            int result = 0;
            foreach (S12 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12WithSpan(ReadOnlySpan<S12> input)
        {
            int result = 0;
            foreach (S12 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12StrengthReducedWithArray(S12[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S12 p = ref input[0];
                do
                {
                    S12 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S16")]
        public long SumS16Array()
        {
            return SumS16WithArray(_arrayS16);
        }

        [Benchmark, BenchmarkCategory("S16")]
        public long SumS16Span()
        {
            return SumS16WithSpan(_arrayS16);
        }

        [Benchmark, BenchmarkCategory("S16")]
        public long SumS16ArrayStrengthReduced()
        {
            return SumS16StrengthReducedWithArray(_arrayS16);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16WithArray(S16[] input)
        {
            long result = 0;
            foreach (S16 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16WithSpan(ReadOnlySpan<S16> input)
        {
            long result = 0;
            foreach (S16 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16StrengthReducedWithArray(S16[] input)
        {
            long result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S16 p = ref input[0];
                do
                {
                    S16 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S29")]
        public int SumS29Array()
        {
            return SumS29WithArray(_arrayS29);
        }

        [Benchmark, BenchmarkCategory("S29")]
        public int SumS29Span()
        {
            return SumS29WithSpan(_arrayS29);
        }

        [Benchmark, BenchmarkCategory("S29")]
        public int SumS29ArrayStrengthReduced()
        {
            return SumS29StrengthReducedWithArray(_arrayS29);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29WithArray(S29[] input)
        {
            int result = 0;
            foreach (S29 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29WithSpan(ReadOnlySpan<S29> input)
        {
            int result = 0;
            foreach (S29 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29StrengthReducedWithArray(S29[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S29 p = ref input[0];
                do
                {
                    S29 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        private struct S3
        {
            public byte A, B, C;
        }

        public struct S8
        {
            public int A, B;
        }

        public struct S12
        {
            public int A, B, C;
        }

        public struct S16
        {
            public long A, B;
        }

        [StructLayout(LayoutKind.Sequential, Size = 29)]
        public struct S29
        {
            public byte A;
        }
    }
}

jakobbotsch commented 4 months ago

@EgorBot -intel -commit 57f870f909dbfad35142e5aaa6e681464de4f439 vs 82ce118743cbd8f8261b6fb38fe0b0ec08d2030b --disasm

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;

namespace Loops
{
    [GroupBenchmarksBy(BenchmarkDotNet.Configs.BenchmarkLogicalGroupRule.ByCategory)]
    public class StrengthReduction
    {
        private short[] _arrayShorts;
        private int[] _arrayInts;
        private long[] _arrayLongs;

        private S3[] _arrayS3;
        private S8[] _arrayS8;
        private S12[] _arrayS12;
        private S16[] _arrayS16;
        private S29[] _arrayS29;

        [GlobalSetup]
        public void Setup()
        {
            _arrayShorts = Enumerable.Range(0, 10000).Select(i => (short)i).ToArray();
            _arrayInts = Enumerable.Range(0, 10000).Select(i => i).ToArray();
            _arrayLongs = Enumerable.Range(0, 10000).Select(i => (long)i).ToArray();

            _arrayS3 = Enumerable.Range(0, 10000).Select(i => new S3 { A = (byte)i, B = (byte)i, C = (byte)i }).ToArray();
            _arrayS8 = Enumerable.Range(0, 10000).Select(i => new S8 { A = i, B = i, }).ToArray();
            _arrayS12 = Enumerable.Range(0, 10000).Select(i => new S12 { A = i, B = i, C = i, }).ToArray();
            _arrayS16 = Enumerable.Range(0, 10000).Select(i => new S16 { A = i, B = i, }).ToArray();
            _arrayS29 = Enumerable.Range(0, 10000).Select(i => new S29 { A = (byte)i, }).ToArray();
        }

        [Benchmark(Baseline = true), BenchmarkCategory("short")]
        public int SumShortsArray()
        {
            return SumShortsWithArray(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsSpan()
        {
            return SumShortsWithSpan(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsArrayStrengthReduced()
        {
            return SumShortsStrengthReducedWithArray(_arrayShorts);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsWithArray(short[] input)
        {
            int result = 0;
            // 'or' by 1 to make loop body slightly larger to work around
            // https://github.com/dotnet/runtime/issues/104665
            foreach (short s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsWithSpan(ReadOnlySpan<short> input)
        {
            int result = 0;
            foreach (short s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsStrengthReducedWithArray(short[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref short p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("int")]
        public int SumIntsArray()
        {
            return SumIntsWithArray(_arrayInts);
        }

        [Benchmark, BenchmarkCategory("int")]
        public int SumIntsSpan()
        {
            return SumIntsWithSpan(_arrayInts);
        }

        [Benchmark, BenchmarkCategory("int")]
        public int SumIntsArrayStrengthReduced()
        {
            return SumIntsStrengthReducedWithArray(_arrayInts);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsWithArray(int[] input)
        {
            int result = 0;
            foreach (int s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsWithSpan(ReadOnlySpan<int> input)
        {
            int result = 0;
            foreach (int s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsStrengthReducedWithArray(int[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref int p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("long")]
        public long SumLongsArray()
        {
            return SumLongsWithArray(_arrayLongs);
        }

        [Benchmark, BenchmarkCategory("long")]
        public long SumLongsSpan()
        {
            return SumLongsWithSpan(_arrayLongs);
        }

        [Benchmark, BenchmarkCategory("long")]
        public long SumLongsArrayStrengthReduced()
        {
            return SumLongsStrengthReducedWithArray(_arrayLongs);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsWithArray(long[] input)
        {
            long result = 0;
            foreach (long s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsWithSpan(ReadOnlySpan<long> input)
        {
            long result = 0;
            foreach (long s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsStrengthReducedWithArray(long[] input)
        {
            long result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref long p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S3")]
        public int SumS3Array()
        {
            return SumS3WithArray(_arrayS3);
        }

        [Benchmark, BenchmarkCategory("S3")]
        public int SumS3Span()
        {
            return SumS3WithSpan(_arrayS3);
        }

        [Benchmark, BenchmarkCategory("S3")]
        public int SumS3ArrayStrengthReduced()
        {
            return SumS3StrengthReducedWithArray(_arrayS3);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3WithArray(S3[] input)
        {
            int result = 0;
            foreach (S3 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3WithSpan(ReadOnlySpan<S3> input)
        {
            int result = 0;
            foreach (S3 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3StrengthReducedWithArray(S3[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S3 p = ref input[0];
                do
                {
                    S3 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S8")]
        public int SumS8Array()
        {
            return SumS8WithArray(_arrayS8);
        }

        [Benchmark, BenchmarkCategory("S8")]
        public int SumS8Span()
        {
            return SumS8WithSpan(_arrayS8);
        }

        [Benchmark, BenchmarkCategory("S8")]
        public int SumS8ArrayStrengthReduced()
        {
            return SumS8StrengthReducedWithArray(_arrayS8);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8WithArray(S8[] input)
        {
            int result = 0;
            foreach (S8 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8WithSpan(ReadOnlySpan<S8> input)
        {
            int result = 0;
            foreach (S8 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8StrengthReducedWithArray(S8[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S8 p = ref input[0];
                do
                {
                    S8 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S12")]
        public int SumS12Array()
        {
            return SumS12WithArray(_arrayS12);
        }

        [Benchmark, BenchmarkCategory("S12")]
        public int SumS12Span()
        {
            return SumS12WithSpan(_arrayS12);
        }

        [Benchmark, BenchmarkCategory("S12")]
        public int SumS12ArrayStrengthReduced()
        {
            return SumS12StrengthReducedWithArray(_arrayS12);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12WithArray(S12[] input)
        {
            int result = 0;
            foreach (S12 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12WithSpan(ReadOnlySpan<S12> input)
        {
            int result = 0;
            foreach (S12 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12StrengthReducedWithArray(S12[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S12 p = ref input[0];
                do
                {
                    S12 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S16")]
        public long SumS16Array()
        {
            return SumS16WithArray(_arrayS16);
        }

        [Benchmark, BenchmarkCategory("S16")]
        public long SumS16Span()
        {
            return SumS16WithSpan(_arrayS16);
        }

        [Benchmark, BenchmarkCategory("S16")]
        public long SumS16ArrayStrengthReduced()
        {
            return SumS16StrengthReducedWithArray(_arrayS16);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16WithArray(S16[] input)
        {
            long result = 0;
            foreach (S16 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16WithSpan(ReadOnlySpan<S16> input)
        {
            long result = 0;
            foreach (S16 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16StrengthReducedWithArray(S16[] input)
        {
            long result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S16 p = ref input[0];
                do
                {
                    S16 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S29")]
        public int SumS29Array()
        {
            return SumS29WithArray(_arrayS29);
        }

        [Benchmark, BenchmarkCategory("S29")]
        public int SumS29Span()
        {
            return SumS29WithSpan(_arrayS29);
        }

        [Benchmark, BenchmarkCategory("S29")]
        public int SumS29ArrayStrengthReduced()
        {
            return SumS29StrengthReducedWithArray(_arrayS29);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29WithArray(S29[] input)
        {
            int result = 0;
            foreach (S29 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29WithSpan(ReadOnlySpan<S29> input)
        {
            int result = 0;
            foreach (S29 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29StrengthReducedWithArray(S29[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S29 p = ref input[0];
                do
                {
                    S29 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        private struct S3
        {
            public byte A, B, C;
        }

        public struct S8
        {
            public int A, B;
        }

        public struct S12
        {
            public int A, B, C;
        }

        public struct S16
        {
            public long A, B;
        }

        [StructLayout(LayoutKind.Sequential, Size = 29)]
        public struct S29
        {
            public byte A;
        }
    }
}

EgorBot commented 4 months ago

Benchmark results on Amd

```
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
AMD EPYC 7763, 1 CPU, 16 logical and 8 physical cores
  Job-CZUROD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-CDPOFN : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
```

| Method                        | Toolchain |     Mean |     Error | Ratio | Code Size |
|------------------------------ |---------- |---------:|----------:|------:|----------:|
| SumS12Array                   | Main      | 6.218 μs | 0.0105 μs |  1.00 |      47 B |
| SumS12Span                    | Main      | 6.302 μs | 0.0009 μs |  1.01 |      70 B |
| SumS12ArrayStrengthReduced    | Main      | 3.283 μs | 0.0104 μs |  0.53 |      44 B |
| SumS12Array                   | PR        | 6.190 μs | 0.0027 μs |  1.00 |      47 B |
| SumS12Span                    | PR        | 6.303 μs | 0.0009 μs |  1.01 |      70 B |
| SumS12ArrayStrengthReduced    | PR        | 3.283 μs | 0.0070 μs |  0.53 |      44 B |
|                               |           |          |           |       |           |
| SumS16Array                   | Main      | 7.632 μs | 0.0041 μs |  1.00 |      53 B |
| SumS16Span                    | Main      | 6.188 μs | 0.0012 μs |  0.81 |      80 B |
| SumS16ArrayStrengthReduced    | Main      | 3.271 μs | 0.0164 μs |  0.43 |      62 B |
| SumS16Array                   | PR        | 7.634 μs | 0.0019 μs |  1.00 |      53 B |
| SumS16Span                    | PR        | 6.189 μs | 0.0014 μs |  0.81 |      80 B |
| SumS16ArrayStrengthReduced    | PR        | 3.244 μs | 0.0159 μs |  0.42 |      62 B |
|                               |           |          |           |       |           |
| SumS29Array                   | Main      | 4.008 μs | 0.0032 μs |  1.00 |      56 B |
| SumS29Span                    | Main      | 4.088 μs | 0.0510 μs |  1.02 |      72 B |
| SumS29ArrayStrengthReduced    | Main      | 3.403 μs | 0.0235 μs |  0.85 |      67 B |
| SumS29Array                   | PR        | 4.361 μs | 0.0845 μs |  1.09 |      56 B |
| SumS29Span                    | PR        | 3.994 μs | 0.0144 μs |  1.00 |      72 B |
| SumS29ArrayStrengthReduced    | PR        | 3.617 μs | 0.0139 μs |  0.90 |      67 B |
|                               |           |          |           |       |           |
| SumS3Array                    | Main      | 6.278 μs | 0.0042 μs |  1.00 |      49 B |
| SumS3Span                     | Main      | 6.546 μs | 0.0142 μs |  1.04 |      72 B |
| SumS3ArrayStrengthReduced     | Main      | 3.270 μs | 0.0044 μs |  0.52 |      60 B |
| SumS3Array                    | PR        | 6.186 μs | 0.0018 μs |  0.99 |      49 B |
| SumS3Span                     | PR        | 6.477 μs | 0.0144 μs |  1.03 |      72 B |
| SumS3ArrayStrengthReduced     | PR        | 3.268 μs | 0.0074 μs |  0.52 |      60 B |
|                               |           |          |           |       |           |
| SumS8Array                    | Main      | 3.283 μs | 0.0082 μs |  1.00 |      42 B |
| SumS8Span                     | Main      | 3.280 μs | 0.0080 μs |  1.00 |      66 B |
| SumS8ArrayStrengthReduced     | Main      | 3.280 μs | 0.0062 μs |  1.00 |      44 B |
| SumS8Array                    | PR        | 3.281 μs | 0.0073 μs |  1.00 |      42 B |
| SumS8Span                     | PR        | 3.278 μs | 0.0072 μs |  1.00 |      66 B |
| SumS8ArrayStrengthReduced     | PR        | 3.280 μs | 0.0063 μs |  1.00 |      44 B |
|                               |           |          |           |       |           |
| SumIntsArray                  | Main      | 5.112 μs | 0.0057 μs |  1.00 |      44 B |
| SumIntsSpan                   | Main      | 3.276 μs | 0.0073 μs |  0.64 |      70 B |
| SumIntsArrayStrengthReduced   | Main      | 3.280 μs | 0.0090 μs |  0.64 |      44 B |
| SumIntsArray                  | PR        | 5.067 μs | 0.0054 μs |  0.99 |      44 B |
| SumIntsSpan                   | PR        | 3.276 μs | 0.0064 μs |  0.64 |      70 B |
| SumIntsArrayStrengthReduced   | PR        | 3.280 μs | 0.0071 μs |  0.64 |      44 B |
|                               |           |          |           |       |           |
| SumLongsArray                 | Main      | 6.187 μs | 0.0003 μs |  1.00 |      46 B |
| SumLongsSpan                  | Main      | 3.279 μs | 0.0072 μs |  0.53 |      72 B |
| SumLongsArrayStrengthReduced  | Main      | 3.275 μs | 0.0090 μs |  0.53 |      62 B |
| SumLongsArray                 | PR        | 6.187 μs | 0.0014 μs |  1.00 |      46 B |
| SumLongsSpan                  | PR        | 3.281 μs | 0.0098 μs |  0.53 |      72 B |
| SumLongsArrayStrengthReduced  | PR        | 3.275 μs | 0.0124 μs |  0.53 |      62 B |
|                               |           |          |           |       |           |
| SumShortsArray                | Main      | 5.084 μs | 0.0046 μs |  1.00 |      44 B |
| SumShortsSpan                 | Main      | 4.508 μs | 0.0151 μs |  0.89 |      72 B |
| SumShortsArrayStrengthReduced | Main      | 4.944 μs | 0.0059 μs |  0.97 |      61 B |
| SumShortsArray                | PR        | 5.071 μs | 0.0008 μs |  1.00 |      44 B |
| SumShortsSpan                 | PR        | 4.497 μs | 0.0121 μs |  0.88 |      72 B |
| SumShortsArrayStrengthReduced | PR        | 4.951 μs | 0.0135 μs |  0.97 |      61 B |

[BDN_Artifacts.zip](https://telegafiles.blob.core.windows.net/telega/BDN_Artifacts_9028778f.zip)

EgorBo commented 4 months ago

@EgorBot -intel -commit 57f870f909dbfad35142e5aaa6e681464de4f439 vs 82ce118743cbd8f8261b6fb38fe0b0ec08d2030b --disasm

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using BenchmarkDotNet.Attributes;

namespace Loops
{
    [GroupBenchmarksBy(BenchmarkDotNet.Configs.BenchmarkLogicalGroupRule.ByCategory)]
    public class StrengthReduction
    {
        private short[] _arrayShorts;
        private int[] _arrayInts;
        private long[] _arrayLongs;

        private S3[] _arrayS3;
        private S8[] _arrayS8;
        private S12[] _arrayS12;
        private S16[] _arrayS16;
        private S29[] _arrayS29;

        [GlobalSetup]
        public void Setup()
        {
            _arrayShorts = Enumerable.Range(0, 10000).Select(i => (short)i).ToArray();
            _arrayInts = Enumerable.Range(0, 10000).Select(i => i).ToArray();
            _arrayLongs = Enumerable.Range(0, 10000).Select(i => (long)i).ToArray();

            _arrayS3 = Enumerable.Range(0, 10000).Select(i => new S3 { A = (byte)i, B = (byte)i, C = (byte)i }).ToArray();
            _arrayS8 = Enumerable.Range(0, 10000).Select(i => new S8 { A = i, B = i, }).ToArray();
            _arrayS12 = Enumerable.Range(0, 10000).Select(i => new S12 { A = i, B = i, C = i, }).ToArray();
            _arrayS16 = Enumerable.Range(0, 10000).Select(i => new S16 { A = i, B = i, }).ToArray();
            _arrayS29 = Enumerable.Range(0, 10000).Select(i => new S29 { A = (byte)i, }).ToArray();
        }

        [Benchmark(Baseline = true), BenchmarkCategory("short")]
        public int SumShortsArray()
        {
            return SumShortsWithArray(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsSpan()
        {
            return SumShortsWithSpan(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsArrayStrengthReduced()
        {
            return SumShortsStrengthReducedWithArray(_arrayShorts);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsWithArray(short[] input)
        {
            int result = 0;
            // 'or' by 1 to make loop body slightly larger to work around
            // https://github.com/dotnet/runtime/issues/104665
            foreach (short s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsWithSpan(ReadOnlySpan<short> input)
        {
            int result = 0;
            foreach (short s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsStrengthReducedWithArray(short[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref short p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("int")]
        public int SumIntsArray()
        {
            return SumIntsWithArray(_arrayInts);
        }

        [Benchmark, BenchmarkCategory("int")]
        public int SumIntsSpan()
        {
            return SumIntsWithSpan(_arrayInts);
        }

        [Benchmark, BenchmarkCategory("int")]
        public int SumIntsArrayStrengthReduced()
        {
            return SumIntsStrengthReducedWithArray(_arrayInts);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsWithArray(int[] input)
        {
            int result = 0;
            foreach (int s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsWithSpan(ReadOnlySpan<int> input)
        {
            int result = 0;
            foreach (int s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumIntsStrengthReducedWithArray(int[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref int p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("long")]
        public long SumLongsArray()
        {
            return SumLongsWithArray(_arrayLongs);
        }

        [Benchmark, BenchmarkCategory("long")]
        public long SumLongsSpan()
        {
            return SumLongsWithSpan(_arrayLongs);
        }

        [Benchmark, BenchmarkCategory("long")]
        public long SumLongsArrayStrengthReduced()
        {
            return SumLongsStrengthReducedWithArray(_arrayLongs);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsWithArray(long[] input)
        {
            long result = 0;
            foreach (long s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsWithSpan(ReadOnlySpan<long> input)
        {
            long result = 0;
            foreach (long s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumLongsStrengthReducedWithArray(long[] input)
        {
            long result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref long p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S3")]
        public int SumS3Array()
        {
            return SumS3WithArray(_arrayS3);
        }

        [Benchmark, BenchmarkCategory("S3")]
        public int SumS3Span()
        {
            return SumS3WithSpan(_arrayS3);
        }

        [Benchmark, BenchmarkCategory("S3")]
        public int SumS3ArrayStrengthReduced()
        {
            return SumS3StrengthReducedWithArray(_arrayS3);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3WithArray(S3[] input)
        {
            int result = 0;
            foreach (S3 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3WithSpan(ReadOnlySpan<S3> input)
        {
            int result = 0;
            foreach (S3 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS3StrengthReducedWithArray(S3[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S3 p = ref input[0];
                do
                {
                    S3 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S8")]
        public int SumS8Array()
        {
            return SumS8WithArray(_arrayS8);
        }

        [Benchmark, BenchmarkCategory("S8")]
        public int SumS8Span()
        {
            return SumS8WithSpan(_arrayS8);
        }

        [Benchmark, BenchmarkCategory("S8")]
        public int SumS8ArrayStrengthReduced()
        {
            return SumS8StrengthReducedWithArray(_arrayS8);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8WithArray(S8[] input)
        {
            int result = 0;
            foreach (S8 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8WithSpan(ReadOnlySpan<S8> input)
        {
            int result = 0;
            foreach (S8 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS8StrengthReducedWithArray(S8[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S8 p = ref input[0];
                do
                {
                    S8 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S12")]
        public int SumS12Array()
        {
            return SumS12WithArray(_arrayS12);
        }

        [Benchmark, BenchmarkCategory("S12")]
        public int SumS12Span()
        {
            return SumS12WithSpan(_arrayS12);
        }

        [Benchmark, BenchmarkCategory("S12")]
        public int SumS12ArrayStrengthReduced()
        {
            return SumS12StrengthReducedWithArray(_arrayS12);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12WithArray(S12[] input)
        {
            int result = 0;
            foreach (S12 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12WithSpan(ReadOnlySpan<S12> input)
        {
            int result = 0;
            foreach (S12 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS12StrengthReducedWithArray(S12[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S12 p = ref input[0];
                do
                {
                    S12 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S16")]
        public long SumS16Array()
        {
            return SumS16WithArray(_arrayS16);
        }

        [Benchmark, BenchmarkCategory("S16")]
        public long SumS16Span()
        {
            return SumS16WithSpan(_arrayS16);
        }

        [Benchmark, BenchmarkCategory("S16")]
        public long SumS16ArrayStrengthReduced()
        {
            return SumS16StrengthReducedWithArray(_arrayS16);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16WithArray(S16[] input)
        {
            long result = 0;
            foreach (S16 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16WithSpan(ReadOnlySpan<S16> input)
        {
            long result = 0;
            foreach (S16 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private long SumS16StrengthReducedWithArray(S16[] input)
        {
            long result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S16 p = ref input[0];
                do
                {
                    S16 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [Benchmark(Baseline = true), BenchmarkCategory("S29")]
        public int SumS29Array()
        {
            int sum = 0;
            // Repetition loop left disabled: for (int i = 0; i < 100; i++)
            sum += SumS29WithArray(_arrayS29);
            return sum;
        }

        [Benchmark, BenchmarkCategory("S29")]
        public int SumS29Span()
        {
            int sum = 0;
            // Repetition loop left disabled: for (int i = 0; i < 100; i++)
            sum += SumS29WithSpan(_arrayS29);
            return sum;
        }

        [Benchmark, BenchmarkCategory("S29")]
        public int SumS29ArrayStrengthReduced()
        {
            int sum = 0;
            // Repetition loop left disabled: for (int i = 0; i < 100; i++)
            sum += SumS29StrengthReducedWithArray(_arrayS29);
            return sum;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29WithArray(S29[] input)
        {
            int result = 0;
            foreach (S29 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29WithSpan(ReadOnlySpan<S29> input)
        {
            int result = 0;
            foreach (S29 s in input)
                result += s.A | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumS29StrengthReducedWithArray(S29[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref S29 p = ref input[0];
                do
                {
                    S29 s = p;
                    result += s.A | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        private struct S3
        {
            public byte A, B, C;
        }

        public struct S8
        {
            public int A, B;
        }

        public struct S12
        {
            public int A, B, C;
        }

        public struct S16
        {
            public long A, B;
        }

        [StructLayout(LayoutKind.Sequential, Size = 29)]
        public struct S29
        {
            public byte A;
        }
    }
}

EgorBot commented 4 months ago

Benchmark results on Arm64

```
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Unknown processor
  Job-JFCDJA : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
  Job-RCWCSD : .NET 9.0.0 (42.42.42.42424), Arm64 RyuJIT AdvSIMD
```

| Method                        | Toolchain | Mean      | Error     | Ratio | Code Size |
|------------------------------ |---------- |----------:|----------:|------:|----------:|
| SumS12Array                   | Main      |  9.997 μs | 0.0005 μs |  1.00 |     108 B |
| SumS12Span                    | Main      |  9.386 μs | 0.0007 μs |  0.94 |     140 B |
| SumS12ArrayStrengthReduced    | Main      |  6.817 μs | 0.0003 μs |  0.68 |     100 B |
| SumS12Array                   | PR        |  9.996 μs | 0.0006 μs |  1.00 |     108 B |
| SumS12Span                    | PR        |  9.387 μs | 0.0008 μs |  0.94 |     140 B |
| SumS12ArrayStrengthReduced    | PR        |  6.818 μs | 0.0004 μs |  0.68 |     100 B |
|                               |           |           |           |       |           |
| SumS16Array                   | Main      |  9.901 μs | 0.0007 μs |  1.00 |     108 B |
| SumS16Span                    | Main      |  8.730 μs | 0.0009 μs |  0.88 |     136 B |
| SumS16ArrayStrengthReduced    | Main      |  6.808 μs | 0.0006 μs |  0.69 |     100 B |
| SumS16Array                   | PR        |  9.902 μs | 0.0003 μs |  1.00 |     108 B |
| SumS16Span                    | PR        |  8.738 μs | 0.0007 μs |  0.88 |     136 B |
| SumS16ArrayStrengthReduced    | PR        |  6.808 μs | 0.0005 μs |  0.69 |     100 B |
|                               |           |           |           |       |           |
| SumS29Array                   | Main      | 10.180 μs | 0.0008 μs |  1.00 |     112 B |
| SumS29Span                    | Main      |  9.659 μs | 0.0019 μs |  0.95 |     140 B |
| SumS29ArrayStrengthReduced    | Main      |  7.016 μs | 0.0046 μs |  0.69 |     104 B |
| SumS29Array                   | PR        | 10.188 μs | 0.0145 μs |  1.00 |     112 B |
| SumS29Span                    | PR        |  9.655 μs | 0.0031 μs |  0.95 |     140 B |
| SumS29ArrayStrengthReduced    | PR        |  7.031 μs | 0.0162 μs |  0.69 |     104 B |
|                               |           |           |           |       |           |
| SumS3Array                    | Main      |  9.733 μs | 0.0005 μs |  1.00 |     108 B |
| SumS3Span                     | Main      |  9.029 μs | 0.0010 μs |  0.93 |     140 B |
| SumS3ArrayStrengthReduced     | Main      |  6.172 μs | 0.0003 μs |  0.63 |     100 B |
| SumS3Array                    | PR        |  9.733 μs | 0.0005 μs |  1.00 |     108 B |
| SumS3Span                     | PR        |  9.030 μs | 0.0014 μs |  0.93 |     140 B |
| SumS3ArrayStrengthReduced     | PR        |  6.173 μs | 0.0006 μs |  0.63 |     100 B |
|                               |           |           |           |       |           |
| SumS8Array                    | Main      |  9.534 μs | 0.0007 μs |  1.00 |     108 B |
| SumS8Span                     | Main      |  8.359 μs | 0.0005 μs |  0.88 |     136 B |
| SumS8ArrayStrengthReduced     | Main      |  6.627 μs | 0.0004 μs |  0.70 |     100 B |
| SumS8Array                    | PR        |  9.534 μs | 0.0006 μs |  1.00 |     108 B |
| SumS8Span                     | PR        |  8.361 μs | 0.0009 μs |  0.88 |     136 B |
| SumS8ArrayStrengthReduced     | PR        |  6.622 μs | 0.0002 μs |  0.69 |     100 B |
|                               |           |           |           |       |           |
| SumIntsArray                  | Main      |  8.018 μs | 0.0005 μs |  1.00 |     104 B |
| SumIntsSpan                   | Main      |  6.795 μs | 0.0026 μs |  0.85 |     140 B |
| SumIntsArrayStrengthReduced   | Main      |  6.173 μs | 0.0003 μs |  0.77 |     100 B |
| SumIntsArray                  | PR        |  8.018 μs | 0.0004 μs |  1.00 |     104 B |
| SumIntsSpan                   | PR        |  6.790 μs | 0.0018 μs |  0.85 |     140 B |
| SumIntsArrayStrengthReduced   | PR        |  6.174 μs | 0.0004 μs |  0.77 |     100 B |
|                               |           |           |           |       |           |
| SumLongsArray                 | Main      |  7.302 μs | 0.0003 μs |  1.00 |     100 B |
| SumLongsSpan                  | Main      |  8.351 μs | 0.0009 μs |  1.14 |     148 B |
| SumLongsArrayStrengthReduced  | Main      |  6.598 μs | 0.0004 μs |  0.90 |     100 B |
| SumLongsArray                 | PR        |  7.304 μs | 0.0003 μs |  1.00 |     100 B |
| SumLongsSpan                  | PR        |  8.337 μs | 0.0006 μs |  1.14 |     148 B |
| SumLongsArrayStrengthReduced  | PR        |  6.610 μs | 0.0004 μs |  0.91 |     100 B |
|                               |           |           |           |       |           |
| SumShortsArray                | Main      |  8.021 μs | 0.0007 μs |  1.00 |     100 B |
| SumShortsSpan                 | Main      |  8.025 μs | 0.0004 μs |  1.00 |     140 B |
| SumShortsArrayStrengthReduced | Main      |  6.173 μs | 0.0003 μs |  0.77 |     100 B |
| SumShortsArray                | PR        |  8.021 μs | 0.0006 μs |  1.00 |     100 B |
| SumShortsSpan                 | PR        |  8.025 μs | 0.0003 μs |  1.00 |     140 B |
| SumShortsArrayStrengthReduced | PR        |  6.172 μs | 0.0005 μs |  0.77 |     100 B |

[BDN_Artifacts.zip](https://telegafiles.blob.core.windows.net/telega/BDN_Artifacts_28375e2f.zip)

EgorBot commented 4 months ago

Benchmark results on Intel

```
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-HYATUP : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-VBPFYU : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
```

| Method                        | Toolchain | Mean     | Error     | Ratio | Code Size |
|------------------------------ |---------- |---------:|----------:|------:|----------:|
| SumS12Array                   | Main      | 5.974 μs | 0.0098 μs |  1.00 |      47 B |
| SumS12Span                    | Main      | 5.966 μs | 0.0017 μs |  1.00 |      70 B |
| SumS12ArrayStrengthReduced    | Main      | 4.752 μs | 0.0006 μs |  0.80 |      44 B |
| SumS12Array                   | PR        | 5.963 μs | 0.0011 μs |  1.00 |      47 B |
| SumS12Span                    | PR        | 5.965 μs | 0.0008 μs |  1.00 |      70 B |
| SumS12ArrayStrengthReduced    | PR        | 4.754 μs | 0.0015 μs |  0.80 |      44 B |
|                               |           |          |           |       |           |
| SumS16Array                   | Main      | 5.952 μs | 0.0020 μs |  1.00 |      53 B |
| SumS16Span                    | Main      | 5.917 μs | 0.0042 μs |  0.99 |      80 B |
| SumS16ArrayStrengthReduced    | Main      | 5.185 μs | 0.0015 μs |  0.87 |      62 B |
| SumS16Array                   | PR        | 5.945 μs | 0.0047 μs |  1.00 |      53 B |
| SumS16Span                    | PR        | 5.903 μs | 0.0035 μs |  0.99 |      80 B |
| SumS16ArrayStrengthReduced    | PR        | 5.184 μs | 0.0038 μs |  0.87 |      62 B |
|                               |           |          |           |       |           |
| SumS29Array                   | Main      | 6.761 μs | 0.0009 μs |  1.00 |      56 B |
| SumS29Span                    | Main      | 6.765 μs | 0.0013 μs |  1.00 |      72 B |
| SumS29ArrayStrengthReduced    | Main      | 5.684 μs | 0.0027 μs |  0.84 |      67 B |
| SumS29Array                   | PR        | 6.760 μs | 0.0006 μs |  1.00 |      56 B |
| SumS29Span                    | PR        | 6.762 μs | 0.0009 μs |  1.00 |      72 B |
| SumS29ArrayStrengthReduced    | PR        | 5.681 μs | 0.0028 μs |  0.84 |      67 B |
|                               |           |          |           |       |           |
| SumS3Array                    | Main      | 5.873 μs | 0.0005 μs |  1.00 |      49 B |
| SumS3Span                     | Main      | 5.874 μs | 0.0004 μs |  1.00 |      72 B |
| SumS3ArrayStrengthReduced     | Main      | 4.500 μs | 0.0025 μs |  0.77 |      60 B |
| SumS3Array                    | PR        | 5.873 μs | 0.0004 μs |  1.00 |      49 B |
| SumS3Span                     | PR        | 5.874 μs | 0.0006 μs |  1.00 |      72 B |
| SumS3ArrayStrengthReduced     | PR        | 4.500 μs | 0.0021 μs |  0.77 |      60 B |
|                               |           |          |           |       |           |
| SumS8Array                    | Main      | 5.105 μs | 0.0013 μs |  1.00 |      42 B |
| SumS8Span                     | Main      | 5.092 μs | 0.0004 μs |  1.00 |      66 B |
| SumS8ArrayStrengthReduced     | Main      | 4.502 μs | 0.0008 μs |  0.88 |      44 B |
| SumS8Array                    | PR        | 5.090 μs | 0.0004 μs |  1.00 |      42 B |
| SumS8Span                     | PR        | 5.090 μs | 0.0005 μs |  1.00 |      66 B |
| SumS8ArrayStrengthReduced     | PR        | 4.503 μs | 0.0010 μs |  0.88 |      44 B |
|                               |           |          |           |       |           |
| SumIntsArray                  | Main      | 5.089 μs | 0.0007 μs |  1.00 |      44 B |
| SumIntsSpan                   | Main      | 5.085 μs | 0.0008 μs |  1.00 |      70 B |
| SumIntsArrayStrengthReduced   | Main      | 4.491 μs | 0.0012 μs |  0.88 |      44 B |
| SumIntsArray                  | PR        | 5.089 μs | 0.0004 μs |  1.00 |      44 B |
| SumIntsSpan                   | PR        | 5.080 μs | 0.0006 μs |  1.00 |      70 B |
| SumIntsArrayStrengthReduced   | PR        | 4.492 μs | 0.0018 μs |  0.88 |      44 B |
|                               |           |          |           |       |           |
| SumLongsArray                 | Main      | 5.091 μs | 0.0004 μs |  1.00 |      46 B |
| SumLongsSpan                  | Main      | 5.095 μs | 0.0005 μs |  1.00 |      72 B |
| SumLongsArrayStrengthReduced  | Main      | 4.509 μs | 0.0022 μs |  0.89 |      62 B |
| SumLongsArray                 | PR        | 5.090 μs | 0.0014 μs |  1.00 |      46 B |
| SumLongsSpan                  | PR        | 5.092 μs | 0.0010 μs |  1.00 |      72 B |
| SumLongsArrayStrengthReduced  | PR        | 4.510 μs | 0.0029 μs |  0.89 |      62 B |
|                               |           |          |           |       |           |
| SumShortsArray                | Main      | 5.086 μs | 0.0007 μs |  1.00 |      44 B |
| SumShortsSpan                 | Main      | 5.078 μs | 0.0009 μs |  1.00 |      72 B |
| SumShortsArrayStrengthReduced | Main      | 4.499 μs | 0.0026 μs |  0.88 |      61 B |
| SumShortsArray                | PR        | 5.086 μs | 0.0004 μs |  1.00 |      44 B |
| SumShortsSpan                 | PR        | 5.078 μs | 0.0007 μs |  1.00 |      72 B |
| SumShortsArrayStrengthReduced | PR        | 4.485 μs | 0.0013 μs |  0.88 |      61 B |

[BDN_Artifacts.zip](https://telegafiles.blob.core.windows.net/telega/BDN_Artifacts_0bfb1a4b.zip)

jakobbotsch commented 4 months ago

@EgorBot -intel -commit 57f870f909dbfad35142e5aaa6e681464de4f439 vs 82ce118743cbd8f8261b6fb38fe0b0ec08d2030b --disasm

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using BenchmarkDotNet.Attributes;
using System;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

namespace Loops
{
    [GroupBenchmarksBy(BenchmarkDotNet.Configs.BenchmarkLogicalGroupRule.ByCategory)]
    public class StrengthReduction
    {
        private short[] _arrayShorts;

        [GlobalSetup]
        public void Setup()
        {
            _arrayShorts = Enumerable.Range(0, 10000).Select(i => (short)i).ToArray();
        }

        [Benchmark(Baseline = true), BenchmarkCategory("short")]
        public int SumShortsArray()
        {
            return SumShortsWithArray(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsSpan()
        {
            return SumShortsWithSpan(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsArrayStrengthReduced()
        {
            return SumShortsStrengthReducedWithArray(_arrayShorts);
        }

        [Benchmark, BenchmarkCategory("short")]
        public int SumShortsSpanStrengthReduced()
        {
            return SumShortsStrengthReducedWithSpan(_arrayShorts);
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsWithArray(short[] input)
        {
            int result = 0;
            // 'or' by 1 to make loop body slightly larger to work around
            // https://github.com/dotnet/runtime/issues/104665
            foreach (short s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsWithSpan(ReadOnlySpan<short> input)
        {
            int result = 0;
            foreach (short s in input)
                result += s | 1;
            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsStrengthReducedWithArray(short[] input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                ref short p = ref input[0];
                do
                {
                    result += p | 1;
                    p = ref Unsafe.Add(ref p, 1);
                    length--;
                } while (length != 0);
            }

            return result;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private int SumShortsStrengthReducedWithSpan(ReadOnlySpan<short> input)
        {
            int result = 0;
            uint length = (uint)input.Length;
            if (length > 0)
            {
                nuint offset = 0;
                ref short p = ref MemoryMarshal.GetReference(input);
                do
                {
                    result += Unsafe.AddByteOffset(ref p, offset) | 1;
                    // 'offset' is a byte offset, so step by the element size.
                    offset += sizeof(short);
                    length--;
                } while (length != 0);
            }

            return result;
        }
    }
}
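
The hand-written variants above are only meaningful as a comparison if they compute the same sum as the plain `foreach` loop. A minimal standalone sketch of that check follows; it is not part of the benchmark, and the local function names (`SumForeach`, `SumRefAdvance`, `SumByteOffset`) are hypothetical stand-ins that mirror the three loop shapes used in the class.

```csharp
using System;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Same input shape as Setup() in the benchmark: 10000 shorts.
short[] data = Enumerable.Range(0, 10000).Select(i => (short)i).ToArray();

int expected = SumForeach(data);
int viaRef = SumRefAdvance(data);
int viaOffset = SumByteOffset(data);

if (expected != viaRef || expected != viaOffset)
    throw new InvalidOperationException("Loop variants disagree.");
Console.WriteLine("All loop variants agree.");

// Index-based shape: what the JIT sees before strength reduction.
static int SumForeach(ReadOnlySpan<short> input)
{
    int result = 0;
    foreach (short s in input)
        result += s | 1;
    return result;
}

// Pointer-advancing shape: the managed reference itself is the induction variable.
static int SumRefAdvance(short[] input)
{
    int result = 0;
    uint length = (uint)input.Length;
    if (length > 0)
    {
        ref short p = ref input[0];
        do
        {
            result += p | 1;
            p = ref Unsafe.Add(ref p, 1);
            length--;
        } while (length != 0);
    }
    return result;
}

// Base-plus-byte-offset shape: the base stays fixed and a byte offset advances.
static int SumByteOffset(ReadOnlySpan<short> input)
{
    int result = 0;
    uint length = (uint)input.Length;
    if (length > 0)
    {
        nuint offset = 0;
        ref short p = ref MemoryMarshal.GetReference(input);
        do
        {
            result += Unsafe.AddByteOffset(ref p, offset) | 1;
            offset += sizeof(short);
            length--;
        } while (length != 0);
    }
    return result;
}
```

Both manual shapes guard on `length > 0` before entering the `do`/`while`, so an empty input never dereferences the reference.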

EgorBot commented 4 months ago

Benchmark results on Intel

```
BenchmarkDotNet v0.13.12, Ubuntu 22.04.4 LTS (Jammy Jellyfish)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 8 logical and 4 physical cores
  Job-PVDUFK : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-JNSHXN : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
```

| Method                        | Toolchain | Mean     | Error     | Ratio | Code Size |
|------------------------------ |---------- |---------:|----------:|------:|----------:|
| SumShortsArray                | Main      | 5.082 μs | 0.0019 μs |  1.00 |      44 B |
| SumShortsSpan                 | Main      | 5.093 μs | 0.0010 μs |  1.00 |      72 B |
| SumShortsArrayStrengthReduced | Main      | 4.492 μs | 0.0005 μs |  0.88 |      61 B |
| SumShortsSpanStrengthReduced  | Main      | 4.573 μs | 0.0338 μs |  0.90 |      74 B |
| SumShortsArray                | PR        | 5.099 μs | 0.0045 μs |  1.00 |      44 B |
| SumShortsSpan                 | PR        | 5.113 μs | 0.0078 μs |  1.01 |      72 B |
| SumShortsArrayStrengthReduced | PR        | 4.492 μs | 0.0007 μs |  0.88 |      61 B |
| SumShortsSpanStrengthReduced  | PR        | 4.495 μs | 0.0041 μs |  0.88 |      74 B |

[BDN_Artifacts.zip](https://telegafiles.blob.core.windows.net/telega/BDN_Artifacts_04e8f73c.zip)
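
To put the short-array numbers in per-iteration terms, the means can be divided by the 10000 elements that `Setup()` allocates. The sketch below does that arithmetic for the PR toolchain rows of the last Intel run; the values are copied from the table, and the recomputed ratio is only an approximation, since BenchmarkDotNet derives its Ratio column from the underlying measurements and the last digit can differ.

```csharp
using System;

// Element count taken from Setup() in the short-array benchmark.
const int Elements = 10000;

// Mean times in microseconds, copied from the PR rows of the table above.
(string Name, double MeanMicroseconds)[] rows =
{
    ("SumShortsArray", 5.099),
    ("SumShortsSpan", 5.113),
    ("SumShortsArrayStrengthReduced", 4.492),
    ("SumShortsSpanStrengthReduced", 4.495),
};

// SumShortsArray is the [Benchmark(Baseline = true)] method.
double baseline = rows[0].MeanMicroseconds;

foreach (var (name, mean) in rows)
{
    double nsPerElement = mean * 1000.0 / Elements; // microseconds -> nanoseconds
    double ratio = mean / baseline;                 // approximate, see note above
    Console.WriteLine($"{name,-30} {nsPerElement:F3} ns/element  ratio ~{ratio:F2}");
}
```

Read this way, the manually strength-reduced loops take roughly 0.45 ns per element against roughly 0.51 ns for the `foreach` loops on this machine, on both toolchains.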