Closed by gfoidl 6 years ago
The following illustration shows the utilization patterns for
It can be clearly seen that the Pfx default and static partitioners don't fit the problem domain.
BenchmarkDotNet=v0.10.11, OS=ubuntu 16.04
Processor=Intel Xeon CPU 2.60GHz, ProcessorCount=4
.NET Core SDK=2.1.4
[Host] : .NET Core 2.0.5 (Framework 4.6.0.0), 64bit RyuJIT
DefaultJob : .NET Core 2.0.5 (Framework 4.6.0.0), 64bit RyuJIT
Method | N | PartitionMultiplier | Mean | Error | StdDev | Scaled | ScaledSD |
---|---|---|---|---|---|---|---|
TrapezeWorkload | 1000 | 1 | 234.2 us | 4.450 us | 4.371 us | 1.00 | 0.00 |
CustomLoop | 1000 | 1 | 219.8 us | 2.722 us | 2.546 us | 0.94 | 0.02 |
TrapezeWorkload | 1000 | 2 | 234.1 us | 3.264 us | 3.053 us | 1.00 | 0.00 |
CustomLoop | 1000 | 2 | 217.0 us | 2.339 us | 2.188 us | 0.93 | 0.01 |
TrapezeWorkload | 1000 | 3 | 232.3 us | 1.523 us | 1.425 us | 1.00 | 0.00 |
CustomLoop | 1000 | 3 | 218.3 us | 2.659 us | 2.488 us | 0.94 | 0.01 |
TrapezeWorkload | 1000 | 4 | 236.0 us | 1.843 us | 1.634 us | 1.00 | 0.00 |
CustomLoop | 1000 | 4 | 220.6 us | 1.595 us | 1.492 us | 0.93 | 0.01 |
TrapezeWorkload | 1000 | 8 | 229.6 us | 2.151 us | 2.012 us | 1.00 | 0.00 |
CustomLoop | 1000 | 8 | 219.2 us | 1.758 us | 1.644 us | 0.95 | 0.01 |
TrapezeWorkload | 2000 | 1 | 804.3 us | 7.051 us | 6.596 us | 1.00 | 0.00 |
CustomLoop | 2000 | 1 | 785.5 us | 9.119 us | 8.530 us | 0.98 | 0.01 |
TrapezeWorkload | 2000 | 2 | 801.7 us | 4.145 us | 3.675 us | 1.00 | 0.00 |
CustomLoop | 2000 | 2 | 785.7 us | 5.515 us | 5.159 us | 0.98 | 0.01 |
TrapezeWorkload | 2000 | 3 | 803.0 us | 8.320 us | 7.782 us | 1.00 | 0.00 |
CustomLoop | 2000 | 3 | 784.3 us | 7.768 us | 7.266 us | 0.98 | 0.01 |
TrapezeWorkload | 2000 | 4 | 802.4 us | 6.811 us | 6.371 us | 1.00 | 0.00 |
CustomLoop | 2000 | 4 | 782.5 us | 5.974 us | 5.296 us | 0.98 | 0.01 |
TrapezeWorkload | 2000 | 8 | 800.4 us | 7.422 us | 6.943 us | 1.00 | 0.00 |
CustomLoop | 2000 | 8 | 781.0 us | 4.718 us | 4.413 us | 0.98 | 0.01 |
TrapezeWorkload | 5000 | 1 | 4,664.5 us | 67.967 us | 60.251 us | 1.00 | 0.00 |
CustomLoop | 5000 | 1 | 4,767.7 us | 109.661 us | 102.577 us | 1.02 | 0.02 |
TrapezeWorkload | 5000 | 2 | 4,630.1 us | 31.805 us | 29.751 us | 1.00 | 0.00 |
CustomLoop | 5000 | 2 | 4,681.8 us | 41.649 us | 36.921 us | 1.01 | 0.01 |
TrapezeWorkload | 5000 | 3 | 4,657.0 us | 40.327 us | 37.722 us | 1.00 | 0.00 |
CustomLoop | 5000 | 3 | 4,636.1 us | 46.268 us | 43.279 us | 1.00 | 0.01 |
TrapezeWorkload | 5000 | 4 | 4,655.6 us | 44.765 us | 41.873 us | 1.00 | 0.00 |
CustomLoop | 5000 | 4 | 4,663.1 us | 36.008 us | 33.682 us | 1.00 | 0.01 |
TrapezeWorkload | 5000 | 8 | 4,633.2 us | 22.115 us | 20.686 us | 1.00 | 0.00 |
CustomLoop | 5000 | 8 | 4,656.5 us | 37.476 us | 35.055 us | 1.01 | 0.01 |
The custom loop displays a better utilization pattern, and for smaller arrays its runtime is better than that of the default parallel foreach loop: in the table above, CustomLoop is about 5 to 7 % faster than TrapezeWorkload at N = 1000 and about 2 % faster at N = 2000. For larger sizes (N = 5000) the custom loop shows no timing benefit.
The custom loop has no means of cooperative multitasking. It may therefore perform better in the benchmark, but in a real-world application this can have a negative effect.
A custom partitioner is used to address the workload-shape for autocorrelation.
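
To illustrate the idea, here is a minimal sketch of range splitting for a trapeze-shaped workload. It is not the partitioner from this PR. It assumes that the work for lag `k` is proportional to `n - k`, so the cumulative work up to lag `b` is roughly `n*b - b*b/2`, and it picks range boundaries that give each range about the same share of that total. The names `TrapezeRanges`, `Split` and `ForEachLag` are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of the library: splits the lags [0, maxLag)
// into ranges of roughly equal work, assuming work(k) ~ n - k.
public static class TrapezeRanges
{
    public static List<(int Start, int End)> Split(int n, int maxLag, int partitions)
    {
        if (n <= 0 || maxLag < 0 || maxLag > n || partitions <= 0)
            throw new ArgumentOutOfRangeException();

        var ranges = new List<(int Start, int End)>(partitions);

        // Total work (continuous approximation): integral of (n - k) over [0, maxLag).
        double total = (double)n * maxLag - 0.5 * maxLag * maxLag;

        int start = 0;
        for (int j = 1; j <= partitions; j++)
        {
            // Solve n*b - b*b/2 = target for b: b = n - sqrt(n*n - 2*target).
            double target = total * j / partitions;
            double root   = Math.Sqrt(Math.Max(0.0, (double)n * n - 2.0 * target));

            int end = j == partitions ? maxLag : (int)Math.Round(n - root);
            end = Math.Min(Math.Max(end, start), maxLag);

            // Skip empty ranges, which can occur when partitions > maxLag.
            if (end > start) ranges.Add((start, end));
            start = end;
        }

        return ranges;
    }

    // Runs body(k) for every lag k; each range is handed to Parallel.ForEach as one item.
    public static void ForEachLag(int n, int maxLag, int partitions, Action<int> body)
    {
        Parallel.ForEach(Split(n, maxLag, partitions), range =>
        {
            for (int k = range.Start; k < range.End; k++) body(k);
        });
    }
}
```

For `Split(1000, 1000, 4)` this yields the ranges [0, 134), [134, 293), [293, 500) and [500, 1000): the early ranges, where each lag is expensive, are short, and the last range, where each lag is cheap, is long. A static partitioner would instead hand out four ranges of 250 lags each, so the worker with the first range would get far more work than the others.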
Fixes https://github.com/gfoidl/Stochastics/issues/9