gfoidl / Stochastics

Stochastic tools, distribution, analysis
MIT License

ValueTuples are slow -- replace with out parameters #48

Closed gfoidl closed 6 years ago

gfoidl commented 6 years ago

From this benchmark:


BenchmarkDotNet=v0.10.13, OS=ubuntu 16.04
Intel Xeon CPU 2.60GHz, 1 CPU, 2 logical cores and 1 physical core
.NET Core SDK=2.1.300-preview3-008416
  [Host]     : .NET Core 2.1.0-preview2-26314-02 (CoreCLR 4.6.26310.01, CoreFX 4.6.26313.01), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.0-preview2-26314-02 (CoreCLR 4.6.26310.01, CoreFX 4.6.26313.01), 64bit RyuJIT

     Method |      Mean |     Error |    StdDev | Scaled |
----------- |----------:|----------:|----------:|-------:|
 ValueTuple | 10.878 ns | 0.0188 ns | 0.0176 ns |   1.00 |
        Out |  2.387 ns | 0.0192 ns | 0.0180 ns |   0.22 |

There are quite a few places where ValueTuples are used:

https://github.com/gfoidl/Stochastics/blob/bf00c42139cb3757eb9e06e1099e2400987aaf12/source/gfoidl.Stochastics/Statistics/Sample.MinMax.cs#L28

https://github.com/gfoidl/Stochastics/blob/bf00c42139cb3757eb9e06e1099e2400987aaf12/source/gfoidl.Stochastics/Statistics/OutlierDetection/ChauvenetOutlierDetection.cs#L85

and so on. They could profit from the change.
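
A minimal sketch of what such a change might look like for a min/max helper. The names `MinMaxSketch`, `GetMinMaxTuple` and `GetMinMaxOut` are hypothetical and are not taken from the linked files, so the real signatures in the repository may differ:

```csharp
public static class MinMaxSketch
{
    // Before: both results are returned as a ValueTuple.
    public static (double Min, double Max) GetMinMaxTuple(double[] values)
    {
        double min = double.MaxValue;
        double max = double.MinValue;

        foreach (double v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }

        return (min, max);
    }

    // After: the same computation, with results handed back via out parameters.
    public static void GetMinMaxOut(double[] values, out double min, out double max)
    {
        // Work on locals and assign the out parameters once at the end.
        double tmpMin = double.MaxValue;
        double tmpMax = double.MinValue;

        foreach (double v in values)
        {
            if (v < tmpMin) tmpMin = v;
            if (v > tmpMax) tmpMax = v;
        }

        min = tmpMin;
        max = tmpMax;
    }
}
```

Call sites would change from `var (min, max) = MinMaxSketch.GetMinMaxTuple(values);` to `MinMaxSketch.GetMinMaxOut(values, out double min, out double max);`. Whether the speedup from the benchmark carries over to a given method would need to be measured separately.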

gfoidl commented 6 years ago

Take care not to operate directly on the out arguments: every operation then works against memory instead of registers. Copy the value into a local, work on the local, and assign the result back at the end.

Simple example:

using System.Runtime.CompilerServices;

namespace ConsoleApplication
{
    public class Program
    {
        public static int Main(string[] args)
        {
            int i = 42;
            A(ref i);
            B(ref i);

            return i > 0 ? 0 : 1;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void A(ref int a)
        {
            a++;
            a *= 2;
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void B(ref int a)
        {
            int tmp = a;
            tmp++;
            tmp *= 2;

            a = tmp;
        }
    }
}

The JIT generates the following code for the two methods:

; Assembly listing for method Program:A(byref)
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  6,  6   )   byref  ->  rdi
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0

G_M7421_IG01:

G_M7421_IG02:
       FF07                 inc      dword ptr [rdi]        ; read and store to memory
       D127                 shl      dword ptr [rdi], 1     ; read and store to memory

G_M7421_IG03:
       C3                   ret

; Total bytes of code 5, prolog size 0 for method Program:A(byref)
; ============================================================
; Assembly listing for method Program:B(byref)
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 arg0         [V00,T01] (  4,  4   )   byref  ->  rdi
;  V01 loc0         [V01,T00] (  6,  6   )     int  ->  rax
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0

G_M7422_IG01:

G_M7422_IG02:
       8B07                 mov      eax, dword ptr [rdi]   ; read from memory
       FFC0                 inc      eax
       D1E0                 shl      eax, 1
       8907                 mov      dword ptr [rdi], eax   ; store to memory

G_M7422_IG03:
       C3                   ret

; Total bytes of code 9, prolog size 0 for method Program:B(byref)
; ============================================================
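
Applied to out parameters, the pattern from method B might look like the sketch below. `SumSketch`, `SumDirect` and `SumViaLocals` are hypothetical names used only for illustration. Whether the JIT shows the same memory-versus-register difference here depends on the runtime version, so it should be confirmed with a benchmark or a disassembly:

```csharp
public static class SumSketch
{
    // Operates on the out arguments directly. Each update may become a
    // read-modify-write against memory, as in method A above.
    public static void SumDirect(double[] values, out double sum, out double sumOfSquares)
    {
        sum = 0;
        sumOfSquares = 0;

        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i];
            sumOfSquares += values[i] * values[i];
        }
    }

    // Accumulates in locals, which the JIT can keep in registers, and
    // stores to the out arguments once at the end, as in method B above.
    public static void SumViaLocals(double[] values, out double sum, out double sumOfSquares)
    {
        double tmpSum = 0;
        double tmpSumOfSquares = 0;

        for (int i = 0; i < values.Length; i++)
        {
            double v = values[i];
            tmpSum += v;
            tmpSumOfSquares += v * v;
        }

        sum = tmpSum;
        sumOfSquares = tmpSumOfSquares;
    }
}
```

Both methods produce the same results; only the place where the intermediate values live differs.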