This changes some of the CPU Math implementation to use the new TensorPrimitives package.
We are currently pointing at the rc2 version, but the benchmarks below were run against a local copy of the GA version.
This also changes CPUMath to target .NET 8 instead of .NET 6. Do we want that for this version, or should I change it back to .NET 6 for this release? @ericstj @jeffhandley
The following is a summary of the methods in CPUMath, the old vs. new benchmarks, and whether each was updated to use the new TensorPrimitives package. @tannergooding @stephentoub @jeffhandley @ericstj @luisquintanilla This is where we need to discuss: is any performance hit worth taking, or should anything that is slower stay on the existing code?
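For context, the rewritten methods replace the hand-written vectorized loops with one-line TensorPrimitives calls. A minimal sketch of what the GA `System.Numerics.Tensors` surface looks like for a few of the methods benchmarked below (the mapping to CPUMath method names in the comments is mine, for illustration only):

```csharp
using System;
using System.Numerics.Tensors;

class TensorPrimitivesSketch
{
    static void Main()
    {
        float[] src = { 1f, -2f, 3f, -4f };
        float[] dst = new float[src.Length];

        // AddScalarU: dst[i] = src[i] + scalar
        TensorPrimitives.Add(src, 10f, dst);

        // Scale: dst[i] = src[i] * scalar
        TensorPrimitives.Multiply(src, 2f, dst);

        // SumSqU: sum of src[i]^2
        float sumSq = TensorPrimitives.SumOfSquares(src);

        // MaxAbsU: the MaxMagnitude call noted as slow in the GA release
        float maxAbs = TensorPrimitives.MaxMagnitude(src);

        // DotU: dot product of two vectors
        float dot = TensorPrimitives.Dot(src, dst);

        Console.WriteLine($"{sumSq} {maxAbs} {dot}");
    }
}
```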
**.NET 8**

| Method | arrayLength | Mean - Original | Mean - New | % Faster | Comments |
|---|---|---|---|---|---|
| AddScalarU | 512 | 25.30 ns | 20.32 ns | 25% | |
| Scale | 512 | 19.91 ns | 19.29 ns | 3% | |
| ScaleSrcU | 512 | 27.58 ns | 20.74 ns | 33% | |
| ScaleAddU | 512 | 28.46 ns | 29.05 ns | | Method unchanged; composite function, so it is slower with the new code |
| AddScaleU | 512 | 29.74 ns | 28.59 ns | 4% | |
| AddScaleSU | 512 | 345.92 ns | 327.68 ns | 6% | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| AddScaleCopyU | 512 | 34.01 ns | 27.03 ns | 26% | |
| AddU | 512 | 29.80 ns | 26.71 ns | 12% | |
| AddSU | 512 | 325.32 ns | 349.46 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| MulElementWiseU | 512 | 33.92 ns | 27.29 ns | 24% | |
| Sum | 512 | 36.57 ns | 34.34 ns | 6% | |
| SumSqU | 512 | 37.50 ns | 39.34 ns | -5% | |
| SumSqDiffU | 512 | 41.23 ns | 43.38 ns | | Method unchanged; composite function, so it is slower with the new code |
| SumAbsU | 512 | 43.74 ns | 39.27 ns | 11% | |
| SumAbsDiffU | 512 | 47.23 ns | 37.48 ns | 26% | |
| MaxAbsU | 512 | 42.30 ns | 43.26 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release |
| MaxAbsDiffU | 512 | 46.94 ns | 47.73 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release. Composite function. |
| DotU | 512 | 50.34 ns | 43.20 ns | 17% | |
| DotSU | 512 | 212.19 ns | 213.18 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| Dist2 | 512 | 55.48 ns | 47.43 ns | 17% | |
**.NET Framework**

| Method | arrayLength | Mean - Original | Mean - New | % Faster | Comments |
|---|---|---|---|---|---|
| AddScalarU | 256 | 48.48 ns | 29.88 ns | 62% | |
| Scale | 256 | 43.45 ns | 28.55 ns | 52% | |
| ScaleSrcU | 256 | 49.87 ns | 38.13 ns | 31% | |
| ScaleAddU | 256 | 47.87 ns | 45.76 ns | | Method unchanged; composite function, so it is slower with the new code |
| AddScaleU | 256 | 52.63 ns | 62.58 ns | -16% | Slower in the new code; do we want to keep it? |
| AddScaleSU | 256 | 151.00 ns | 152.77 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| AddScaleCopyU | 256 | 48.35 ns | 63.94 ns | -24% | Slower in the new code; do we want to keep it? |
| AddU | 256 | 49.68 ns | 59.32 ns | -16% | Slower in the new code; do we want to keep it? |
| AddSU | 256 | 150.34 ns | 153.89 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| MulElementWiseU | 256 | 48.26 ns | 69.89 ns | -31% | |
| Sum | 256 | 68.05 ns | 59.74 ns | 14% | |
| SumSqU | 256 | 68.21 ns | 62.08 ns | 10% | |
| SumSqDiffU | 256 | 57.52 ns | 57.64 ns | | Method unchanged; composite function, so it is slower with the new code |
| SumAbsU | 256 | 72.88 ns | 65.01 ns | 12% | |
| SumAbsDiffU | 256 | 59.51 ns | 68.23 ns | -13% | Slower in the new code; do we want to keep it? |
| MaxAbsU | 256 | 72.26 ns | 71.48 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release |
| MaxAbsDiffU | 256 | 59.30 ns | 58.87 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release. Composite function. |
| DotU | 256 | 58.93 ns | 68.42 ns | -14% | Slower in the new code; do we want to keep it? |
| DotSU | 256 | 109.76 ns | 113.78 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| Dist2 | 256 | 59.49 ns | 86.97 ns | -32% | Slower in the new code; do we want to keep it? |
I think that even if we don't want to keep the TensorPrimitives code in the cases where it's slower, at least on .NET Framework we should add a check: if the native code isn't available to run these methods accelerated, fall back to the TensorPrimitives implementation rather than the scalar path. That would have to be added in, though.
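As a rough sketch of that fallback idea (the `NativeCpuMathAvailable` probe and helper names here are hypothetical, not the actual CPUMath internals):

```csharp
using System;
using System.Numerics.Tensors;

static class CpuMathFallbackSketch
{
    // Hypothetical probe: in practice this would check whether the
    // native CPU math library loaded successfully on this platform.
    static readonly bool NativeCpuMathAvailable = false;

    // AddU-style helper: dst[i] += src[i]
    public static void Add(ReadOnlySpan<float> src, Span<float> dst)
    {
        if (NativeCpuMathAvailable)
        {
            AddNative(src, dst); // existing accelerated native path
        }
        else
        {
            // Fall back to TensorPrimitives, which is vectorized in
            // managed code and much faster than a scalar software loop.
            TensorPrimitives.Add(dst, src, dst);
        }
    }

    // Stand-in for the existing P/Invoke entry point (illustrative only).
    static void AddNative(ReadOnlySpan<float> src, Span<float> dst)
        => throw new PlatformNotSupportedException();
}
```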
All of this was done with 256-bit AVX.