This changes some of the CPU Math implementation to use the new TensorPrimitives package.
We are currently pointing at the rc2 version, but the benchmarks below were run against a local copy of the GA version.
This also changes CPUMath to target .NET 8 instead of .NET 6. Do we want that for this version, or should I change it back to .NET 6 for this release? @ericstj @jeffhandley
The following is a summary of the methods in CPUMath, the old vs. new benchmarks, and whether each was updated to use the new TensorPrimitives package. @tannergooding @stephentoub @jeffhandley @ericstj @luisquintanilla This is where we need to discuss: is any performance hit worth taking, or should anything that is slower stay on the existing code?
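For context, the rewritten methods replace the hand-written vectorized loops with one-line TensorPrimitives calls. A minimal sketch of what the GA `System.Numerics.Tensors` surface looks like for a few of the methods benchmarked below (the mapping to CPUMath method names in the comments is mine, for illustration only):

```csharp
using System;
using System.Numerics.Tensors;

class TensorPrimitivesSketch
{
    static void Main()
    {
        float[] src = { 1f, -2f, 3f, -4f };
        float[] dst = new float[src.Length];

        // AddScalarU: dst[i] = src[i] + scalar
        TensorPrimitives.Add(src, 10f, dst);

        // Scale: dst[i] = src[i] * scalar
        TensorPrimitives.Multiply(src, 2f, dst);

        // SumSqU: sum of src[i]^2
        float sumSq = TensorPrimitives.SumOfSquares(src);

        // MaxAbsU: the MaxMagnitude call noted as slow in the GA release
        float maxAbs = TensorPrimitives.MaxMagnitude(src);

        // DotU: dot product of two vectors
        float dot = TensorPrimitives.Dot(src, dst);

        Console.WriteLine($"{sumSq} {maxAbs} {dot}");
    }
}
```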
**.NET 8**

| Method | arrayLength | Mean - Original | Mean - New | % Faster | Comments |
|---|---|---|---|---|---|
| AddScalarU | 512 | 25.30 ns | 20.32 ns | 25% | |
| Scale | 512 | 19.91 ns | 19.29 ns | 3% | |
| ScaleSrcU | 512 | 27.58 ns | 20.74 ns | 33% | |
| ScaleAddU | 512 | 28.46 ns | 29.05 ns | | Method unchanged; composite function, so it is slower with the new code |
| AddScaleU | 512 | 29.74 ns | 28.59 ns | 4% | |
| AddScaleSU | 512 | 345.92 ns | 327.68 ns | 6% | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| AddScaleCopyU | 512 | 34.01 ns | 27.03 ns | 26% | |
| AddU | 512 | 29.80 ns | 26.71 ns | 12% | |
| AddSU | 512 | 325.32 ns | 349.46 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| MulElementWiseU | 512 | 33.92 ns | 27.29 ns | 24% | |
| Sum | 512 | 36.57 ns | 34.34 ns | 6% | |
| SumSqU | 512 | 37.50 ns | 39.34 ns | -5% | |
| SumSqDiffU | 512 | 41.23 ns | 43.38 ns | | Method unchanged; composite function, so it is slower with the new code |
| SumAbsU | 512 | 43.74 ns | 39.27 ns | 11% | |
| SumAbsDiffU | 512 | 47.23 ns | 37.48 ns | 26% | |
| MaxAbsU | 512 | 42.30 ns | 43.26 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release |
| MaxAbsDiffU | 512 | 46.94 ns | 47.73 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release. Composite function. |
| DotU | 512 | 50.34 ns | 43.20 ns | 17% | |
| DotSU | 512 | 212.19 ns | 213.18 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| Dist2 | 512 | 55.48 ns | 47.43 ns | 17% | |
**.NET Framework**

| Method | arrayLength | Mean - Original | Mean - New | % Faster | Comments |
|---|---|---|---|---|---|
| AddScalarU | 256 | 48.48 ns | 29.88 ns | 62% | |
| Scale | 256 | 43.45 ns | 28.55 ns | 52% | |
| ScaleSrcU | 256 | 49.87 ns | 38.13 ns | 31% | |
| ScaleAddU | 256 | 47.87 ns | 45.76 ns | | Method unchanged; composite function, so it is slower with the new code |
| AddScaleU | 256 | 52.63 ns | 62.58 ns | -16% | Slower in the new code; do we want to keep it? |
| AddScaleSU | 256 | 151.00 ns | 152.77 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| AddScaleCopyU | 256 | 48.35 ns | 63.94 ns | -24% | Slower in the new code; do we want to keep it? |
| AddU | 256 | 49.68 ns | 59.32 ns | -16% | Slower in the new code; do we want to keep it? |
| AddSU | 256 | 150.34 ns | 153.89 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| MulElementWiseU | 256 | 48.26 ns | 69.89 ns | -31% | |
| Sum | 256 | 68.05 ns | 59.74 ns | 14% | |
| SumSqU | 256 | 68.21 ns | 62.08 ns | 10% | |
| SumSqDiffU | 256 | 57.52 ns | 57.64 ns | | Method unchanged; composite function, so it is slower with the new code |
| SumAbsU | 256 | 72.88 ns | 65.01 ns | 12% | |
| SumAbsDiffU | 256 | 59.51 ns | 68.23 ns | -13% | Slower in the new code; do we want to keep it? |
| MaxAbsU | 256 | 72.26 ns | 71.48 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release |
| MaxAbsDiffU | 256 | 59.30 ns | 58.87 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release. Composite function. |
| DotU | 256 | 58.93 ns | 68.42 ns | -14% | Slower in the new code; do we want to keep it? |
| DotSU | 256 | 109.76 ns | 113.78 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. They can be simulated, but that is slower. |
| Dist2 | 256 | 59.49 ns | 86.97 ns | -32% | Slower in the new code; do we want to keep it? |
I think that even if we don't want to keep the TensorPrimitives code in the cases where it's slower, at least on .NET Framework we should add a check: if the native code isn't available to run these methods accelerated, fall back to the TensorPrimitives implementation rather than the scalar path. That would have to be added in, though.
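As a rough sketch of that fallback idea (the `NativeCpuMathAvailable` probe and helper names here are hypothetical, not the actual CPUMath internals):

```csharp
using System;
using System.Numerics.Tensors;

static class CpuMathFallbackSketch
{
    // Hypothetical probe: in practice this would check whether the
    // native CPU math library loaded successfully on this platform.
    static readonly bool NativeCpuMathAvailable = false;

    // AddU-style helper: dst[i] += src[i]
    public static void Add(ReadOnlySpan<float> src, Span<float> dst)
    {
        if (NativeCpuMathAvailable)
        {
            AddNative(src, dst); // existing accelerated native path
        }
        else
        {
            // Fall back to TensorPrimitives, which is vectorized in
            // managed code and much faster than a scalar software loop.
            TensorPrimitives.Add(dst, src, dst);
        }
    }

    // Stand-in for the existing P/Invoke entry point (illustrative only).
    static void AddNative(ReadOnlySpan<float> src, Span<float> dst)
        => throw new PlatformNotSupportedException();
}
```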
All of this was done with 256-bit AVX.