dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Changes some of the CPU Math implementation from our current version to use the new TensorPrimitives package. #6875

Closed michaelgsharp closed 10 months ago

michaelgsharp commented 11 months ago

This changes some of the CPU Math implementation from our current version to use the new TensorPrimitives package.

Currently we are pointing to the rc2 version, but the following benchmarks were run with a local copy of the GA version.

This also changes CpuMath to target .NET 8 instead of .NET 6. Did we want that for this version, or should I change it back to .NET 6 for this release? @ericstj @jeffhandley

The following is a summary of the methods in CpuMath, the old vs. new benchmarks, and whether I updated each method to use the new TensorPrimitives package. @tannergooding @stephentoub @jeffhandley @ericstj @luisquintanilla This is where we need to discuss: is any performance hit worth taking, or should anything that is slower stay on the existing code?
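
For context on what "updated to use TensorPrimitives" means, the swap is mostly mechanical. Below is a minimal sketch (assumed names, not the actual ML.NET sources) of moving a routine like `AddScalarU` from hand-written intrinsics onto a single `TensorPrimitives` call:

```csharp
using System;
using System.Numerics.Tensors; // from the System.Numerics.Tensors NuGet package

class AddScalarSketch
{
    // Hypothetical stand-in for CpuMath's AddScalarU: dst[i] += scalar for all i.
    // The old path hand-rolls AVX/SSE intrinsics; the new path is one call that
    // vectorizes internally.
    public static void AddScalarU(float scalar, Span<float> dst)
        => TensorPrimitives.Add(dst, scalar, dst); // exact in-place overlap is allowed

    static void Main()
    {
        float[] data = { 1f, 2f, 3f };
        AddScalarU(10f, data);
        Console.WriteLine(string.Join(",", data)); // prints 11,12,13
    }
}
```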

.NET 8

| Method | arrayLength | Mean (Original) | Mean (New) | % Faster | Comments |
|---|---|---|---|---|---|
| AddScalarU | 512 | 25.30 ns | 20.32 ns | 25% | |
| Scale | 512 | 19.91 ns | 19.29 ns | 3% | |
| ScaleSrcU | 512 | 27.58 ns | 20.74 ns | 33% | |
| ScaleAddU | 512 | 28.46 ns | 29.05 ns | | Method unchanged; composite function, so slower with new code |
| AddScaleU | 512 | 29.74 ns | 28.59 ns | 4% | |
| AddScaleSU | 512 | 345.92 ns | 327.68 ns | 6% | Method unchanged; no sparse support in TensorPrimitives. Can simulate, but it is slower. |
| AddScaleCopyU | 512 | 34.01 ns | 27.03 ns | 26% | |
| AddU | 512 | 29.80 ns | 26.71 ns | 12% | |
| AddSU | 512 | 325.32 ns | 349.46 ns | | Method unchanged; no sparse support in TensorPrimitives. Can simulate, but it is slower. |
| MulElementWiseU | 512 | 33.92 ns | 27.29 ns | 24% | |
| Sum | 512 | 36.57 ns | 34.34 ns | 6% | |
| SumSqU | 512 | 37.50 ns | 39.34 ns | -5% | |
| SumSqDiffU | 512 | 41.23 ns | 43.38 ns | | Method unchanged; composite function, so slower with new code |
| SumAbsU | 512 | 43.74 ns | 39.27 ns | 11% | |
| SumAbsDiffU | 512 | 47.23 ns | 37.48 ns | 26% | |
| MaxAbsU | 512 | 42.30 ns | 43.26 ns | | Method unchanged; in the GA release MaxMagnitude is slow; has been fixed for the next release |
| MaxAbsDiffU | 512 | 46.94 ns | 47.73 ns | | Method unchanged; in the GA release MaxMagnitude is slow; has been fixed for the next release. Composite function. |
| DotU | 512 | 50.34 ns | 43.20 ns | 17% | |
| DotSU | 512 | 212.19 ns | 213.18 ns | | Method unchanged; no sparse support in TensorPrimitives. Can simulate, but it is slower. |
| Dist2 | 512 | 55.48 ns | 47.43 ns | 17% | |
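
On the "no sparse support" rows: the `*SU` variants operate on scattered indices (conceptually `dst[idx[i]] += scale * src[i]`), and TensorPrimitives only exposes dense overloads. A sketch of what "can simulate but it is slower" means, with hypothetical names rather than the actual ML.NET code: gather the touched elements into a dense scratch buffer, run the dense op, and scatter back.

```csharp
using System;
using System.Numerics.Tensors;

class SparseSketch
{
    // Dense simulation of a sparse add-scale (hypothetical names, not ML.NET's).
    // Semantically: dst[idx[i]] += scale * src[i], with src.Length == idx.Length.
    public static void AddScaleSU(float scale, ReadOnlySpan<float> src,
                                  ReadOnlySpan<int> idx, Span<float> dst)
    {
        // 1. Gather the touched dst elements into a dense scratch buffer.
        float[] scratch = new float[idx.Length];
        for (int i = 0; i < idx.Length; i++) scratch[i] = dst[idx[i]];

        // 2. One dense multiply-add: scratch[i] = src[i] * scale + scratch[i].
        TensorPrimitives.MultiplyAdd(src, scale, scratch, scratch);

        // 3. Scatter back. The two scalar gather/scatter loops are why this
        //    loses to the existing hand-written sparse loop.
        for (int i = 0; i < idx.Length; i++) dst[idx[i]] = scratch[i];
    }
}
```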

.NET Framework

| Method | arrayLength | Mean (Original) | Mean (New) | % Faster | Comments |
|---|---|---|---|---|---|
| AddScalarU | 256 | 48.48 ns | 29.88 ns | 62% | |
| Scale | 256 | 43.45 ns | 28.55 ns | 52% | |
| ScaleSrcU | 256 | 49.87 ns | 38.13 ns | 31% | |
| ScaleAddU | 256 | 47.87 ns | 45.76 ns | | Method unchanged; composite function, so slower with new code |
| AddScaleU | 256 | 52.63 ns | 62.58 ns | -16% | Slightly slower in new code. Do we want to keep it? |
| AddScaleSU | 256 | 151.00 ns | 152.77 ns | | Method unchanged; no sparse support in TensorPrimitives. Can simulate, but it is slower. |
| AddScaleCopyU | 256 | 48.35 ns | 63.94 ns | -24% | Slightly slower in new code. Do we want to keep it? |
| AddU | 256 | 49.68 ns | 59.32 ns | -16% | Slightly slower in new code. Do we want to keep it? |
| AddSU | 256 | 150.34 ns | 153.89 ns | | Method unchanged; no sparse support in TensorPrimitives. Can simulate, but it is slower. |
| MulElementWiseU | 256 | 48.26 ns | 69.89 ns | -31% | |
| Sum | 256 | 68.05 ns | 59.74 ns | 14% | |
| SumSqU | 256 | 68.21 ns | 62.08 ns | 10% | |
| SumSqDiffU | 256 | 57.52 ns | 57.64 ns | | Method unchanged; composite function, so slower with new code |
| SumAbsU | 256 | 72.88 ns | 65.01 ns | 12% | |
| SumAbsDiffU | 256 | 59.51 ns | 68.23 ns | -13% | Slightly slower in new code. Do we want to keep it? |
| MaxAbsU | 256 | 72.26 ns | 71.48 ns | | Method unchanged; in the GA release MaxMagnitude is slow; has been fixed for the next release |
| MaxAbsDiffU | 256 | 59.30 ns | 58.87 ns | | Method unchanged; in the GA release MaxMagnitude is slow; has been fixed for the next release. Composite function. |
| DotU | 256 | 58.93 ns | 68.42 ns | -14% | Slightly slower in new code. Do we want to keep it? |
| DotSU | 256 | 109.76 ns | 113.78 ns | | Method unchanged; no sparse support in TensorPrimitives. Can simulate, but it is slower. |
| Dist2 | 256 | 59.49 ns | 86.97 ns | -32% | Slightly slower in new code. Do we want to keep it? |

I think that even if we don't keep the TensorPrimitives code in the cases where it's slower, at least for .NET Framework we should add a check: if the accelerated native code isn't available, fall back to the TensorPrimitives approach instead of the scalar managed code. That would have to be added in, though.
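
A rough sketch of that fallback idea, purely as an assumption about the shape it could take (`NativeAvailable` is a placeholder for whatever probe detects the native CpuMath binary, not an actual ML.NET member):

```csharp
using System;
using System.Numerics.Tensors;

static class CpuMathFallbackSketch
{
    // Placeholder probe - in ML.NET the real check would be whether the
    // native CpuMath library actually loaded, not a hard-coded flag.
    static readonly bool NativeAvailable = false;

    public static float Sum(ReadOnlySpan<float> values)
    {
        if (NativeAvailable)
        {
            // existing path: P/Invoke into the SSE/AVX native implementation
        }
        // Fallback: still SIMD-accelerated via TensorPrimitives,
        // rather than dropping to a scalar managed loop.
        return TensorPrimitives.Sum(values);
    }
}
```

The point is that on .NET Framework, where the managed intrinsics paths don't exist, TensorPrimitives gives a vectorized floor rather than the current scalar fallback.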

All this was done with AVX256.

codecov[bot] commented 10 months ago

Codecov Report

Merging #6875 (54e876a) into main (796cb35) will decrease coverage by 0.60%. Report is 1 commit behind head on main. The diff coverage is 100.00%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #6875      +/-   ##
==========================================
- Coverage   69.40%   68.80%    -0.60%
==========================================
  Files        1238     1240        +2
  Lines      249462   249392       -70
  Branches    25522    25493       -29
==========================================
- Hits       173139   171599     -1540
- Misses      69578    71196     +1618
+ Partials     6745     6597      -148
```

| [Flag](https://app.codecov.io/gh/dotnet/machinelearning/pull/6875/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | Coverage Δ | |
|---|---|---|
| [Debug](https://app.codecov.io/gh/dotnet/machinelearning/pull/6875/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `68.80% <100.00%> (-0.60%)` | :arrow_down: |
| [production](https://app.codecov.io/gh/dotnet/machinelearning/pull/6875/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `63.26% <100.00%> (-0.67%)` | :arrow_down: |
| [test](https://app.codecov.io/gh/dotnet/machinelearning/pull/6875/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `88.49% <ø> (-0.41%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Files](https://app.codecov.io/gh/dotnet/machinelearning/pull/6875?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | Coverage Δ | |
|---|---|---|
| [src/Microsoft.ML.CpuMath/AvxIntrinsics.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/6875?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5DcHVNYXRoL0F2eEludHJpbnNpY3MuY3M=) | `58.18% <ø> (-38.51%)` | :arrow_down: |
| [src/Microsoft.ML.CpuMath/CpuMathUtils.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/6875?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5DcHVNYXRoL0NwdU1hdGhVdGlscy5jcw==) | `100.00% <100.00%> (ø)` | |
| [...rc/Microsoft.ML.CpuMath/CpuMathUtils.netcoreapp.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/6875?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5DcHVNYXRoL0NwdU1hdGhVdGlscy5uZXRjb3JlYXBwLmNz) | `97.80% <100.00%> (-0.84%)` | :arrow_down: |
| [src/Microsoft.ML.CpuMath/SseIntrinsics.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/6875?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5DcHVNYXRoL1NzZUludHJpbnNpY3MuY3M=) | `54.80% <ø> (-41.55%)` | :arrow_down: |

... and [48 files with indirect coverage changes](https://app.codecov.io/gh/dotnet/machinelearning/pull/6875/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet)
michaelgsharp commented 10 months ago

/azp run

azure-pipelines[bot] commented 10 months ago
Azure Pipelines successfully started running 2 pipeline(s).