Closed by gfoidl 6 years ago
With the new SIMD codegen, a benchmark for Kurtosis is interesting because the computation does most of its work in the SIMD registers.
BenchmarkDotNet=v0.10.11, OS=Windows 7 SP1 (6.1.7601.0)
Processor=Intel Core i7-3610QM CPU 2.30GHz (Ivy Bridge), ProcessorCount=8
Frequency=2241064 Hz, Resolution=446.2166 ns, Timer=TSC
.NET Core SDK=2.1.2
[Host] : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
DefaultJob : .NET Core 2.0.3 (Framework 4.6.25815.02), 64bit RyuJIT
| Method | Mean | Error | StdDev | Scaled | ScaledSD |
|---|---:|---:|---:|---:|---:|
| Sequential | 11.240 us | 0.2209 us | 0.3691 us | 1.00 | 0.00 |
| UnsafeSimd | 5.369 us | 0.1042 us | 0.1391 us | 0.48 | 0.02 |
000007fe`7792d490 0f1019 movups xmm3,xmmword ptr [rcx] ; loop start
000007fe`7792d493 4883c110 add rcx,10h
000007fe`7792d497 660f5cd8 subpd xmm3,xmm0
000007fe`7792d49b 0f28e3 movaps xmm4,xmm3
000007fe`7792d49e 660f59e3 mulpd xmm4,xmm3
000007fe`7792d4a2 660f59e3 mulpd xmm4,xmm3
000007fe`7792d4a6 660f59e3 mulpd xmm4,xmm3
000007fe`7792d4aa 660f58d4 addpd xmm2,xmm4
000007fe`7792d4ae 0f1019 movups xmm3,xmmword ptr [rcx]
000007fe`7792d4b1 4883c110 add rcx,10h
000007fe`7792d4b5 660f5cd8 subpd xmm3,xmm0
000007fe`7792d4b9 0f28e3 movaps xmm4,xmm3
000007fe`7792d4bc 660f59e3 mulpd xmm4,xmm3
000007fe`7792d4c0 660f59e3 mulpd xmm4,xmm3
000007fe`7792d4c4 660f59e3 mulpd xmm4,xmm3
000007fe`7792d4c8 660f58d4 addpd xmm2,xmm4
000007fe`7792d4cc 4183c004 add r8d,4
000007fe`7792d4d0 453bc1 cmp r8d,r9d
000007fe`7792d4d3 7cbb jl 000007fe`7792d490 ; loop end
Pretty code 😄
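For readers who do not read x86 disassembly, the loop above can be approximated in plain C. This is only a sketch under assumptions: `xmm0` is taken to hold the mean broadcast to both lanes and `xmm2` the running sum of fourth powers, and the function names are hypothetical and not taken from the repository. Each `movups`/`subpd`/`mulpd`/`addpd` group handles two doubles, and the loop is unrolled twice, so one iteration consumes four elements (matching `add r8d,4`).

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: sum of (x - mean)^4, one element at a time. */
double sum_fourth_pow_sequential(const double *data, size_t n, double mean)
{
    assert(data != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = data[i] - mean;
        sum += d * d * d * d;
    }
    return sum;
}

/* Two-lane, 2x-unrolled version that mirrors the shape of the listing.
 * acc0/acc1 play the role of the two lanes of xmm2 (an assumption). */
double sum_fourth_pow_lanes(const double *data, size_t n, double mean)
{
    assert(data != NULL || n == 0);
    double acc0 = 0.0, acc1 = 0.0;
    size_t i = 0;

    /* Main loop: four elements per iteration, i.e. two 2-wide "vectors". */
    for (; i + 4 <= n; i += 4) {
        /* First vector: subpd, then movaps + 3x mulpd, then addpd. */
        double d0 = data[i]     - mean;
        double d1 = data[i + 1] - mean;
        acc0 += d0 * d0 * d0 * d0;
        acc1 += d1 * d1 * d1 * d1;

        /* Second vector: the same sequence on the next two elements. */
        double d2 = data[i + 2] - mean;
        double d3 = data[i + 3] - mean;
        acc0 += d2 * d2 * d2 * d2;
        acc1 += d3 * d3 * d3 * d3;
    }

    /* Scalar remainder for lengths that are not a multiple of four. */
    for (; i < n; ++i) {
        double d = data[i] - mean;
        acc0 += d * d * d * d;
    }

    /* Horizontal add of the two lanes. */
    return acc0 + acc1;
}
```

Because the lanes accumulate in a different order than the sequential loop, the two results can differ in the last bits for general floating-point input; they agree exactly only when every intermediate value is exactly representable.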
The code for the sequential and parallel paths is similar, except for the range each one processes. This can be refactored into cleaner code.
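One possible shape for such a refactoring, shown in C as a minimal sketch rather than the project's actual code: a single range-based core that the sequential path calls once over the whole input and the parallel path calls once per partition. All names here are hypothetical, and the partitions are processed in a plain loop to keep the example self-contained; a real parallel version would hand each range to a worker and combine the partial sums.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared core: sum of (x - mean)^4 over [start, end). */
double sum_fourth_pow_range(const double *data, size_t start, size_t end,
                            double mean)
{
    assert(start <= end);
    double sum = 0.0;
    for (size_t i = start; i < end; ++i) {
        double d = data[i] - mean;
        sum += d * d * d * d;
    }
    return sum;
}

/* Sequential path: one call covering the whole input. */
double fourth_moment_sum_whole(const double *data, size_t n, double mean)
{
    return sum_fourth_pow_range(data, 0, n, mean);
}

/* Partitioned path: the same core applied to `parts` contiguous ranges.
 * The ranges are visited serially here; only the range differs per call. */
double fourth_moment_sum_partitioned(const double *data, size_t n,
                                     double mean, size_t parts)
{
    assert(parts > 0);
    double total = 0.0;
    for (size_t p = 0; p < parts; ++p) {
        size_t start = n * p / parts;        /* inclusive lower bound */
        size_t end   = n * (p + 1) / parts;  /* exclusive upper bound */
        total += sum_fourth_pow_range(data, start, end, mean);
    }
    return total;
}
```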
Fixes https://github.com/gfoidl/Stochastics/issues/4