dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
8.93k stars 1.86k forks

Improve performance of DataFrame binary comparison operations #6869

Closed asmirnov82 closed 8 months ago

asmirnov82 commented 8 months ago

The goal of this PR is to improve the performance of comparison operations. It is the next step in aligning the DataFrame arithmetic API with the new TensorPrimitives API.

In DataFrame 0.20.1:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| ElementwiseEquals_Int32_Int32 | 38.00 ms | 0.145 ms | 0.121 ms |
| ElementwiseEquals_Int16_Int16 | 39.55 ms | 0.291 ms | 0.258 ms |
| ElementwiseEquals_Double_Double | 40.28 ms | 0.367 ms | 0.343 ms |
| ElementwiseEquals_Float_Float | 41.18 ms | 0.805 ms | 1.074 ms |

After this PR:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| ElementwiseEquals_Int32_Int32 | 1.171 ms | 0.0228 ms | 0.0263 ms |
| ElementwiseEquals_Int16_Int16 | 1.090 ms | 0.0569 ms | 0.0475 ms |
| ElementwiseEquals_Double_Double | 1.388 ms | 0.0264 ms | 0.0247 ms |
| ElementwiseEquals_Float_Float | 1.250 ms | 0.0215 ms | 0.0190 ms |

The other comparison operations show a similar performance boost.
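For readers unfamiliar with the operation being benchmarked, the following is a minimal, self-contained sketch of an elementwise equality comparison over two `int` buffers: a scalar baseline, and a variant that compares several elements per step using `System.Numerics.Vector<T>`. This is not the code from this PR, and the description above does not state how the PR achieves its speedup. The class and method names (`ElementwiseEqualsSketch`, `EqualsScalar`, `EqualsVectorized`) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration only; not the Microsoft.Data.Analysis implementation.
public static class ElementwiseEqualsSketch
{
    // Scalar baseline: one comparison per element.
    public static bool[] EqualsScalar(ReadOnlySpan<int> left, ReadOnlySpan<int> right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Inputs must have the same length.");

        var result = new bool[left.Length];
        for (int i = 0; i < left.Length; i++)
            result[i] = left[i] == right[i];
        return result;
    }

    // Vectorized variant: compares Vector<int>.Count elements per step,
    // then handles the remaining tail with the scalar comparison.
    public static bool[] EqualsVectorized(ReadOnlySpan<int> left, ReadOnlySpan<int> right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Inputs must have the same length.");

        var result = new bool[left.Length];
        int width = Vector<int>.Count;
        int i = 0;

        for (; i <= left.Length - width; i += width)
        {
            var a = new Vector<int>(left.Slice(i, width));
            var b = new Vector<int>(right.Slice(i, width));

            // Vector.Equals sets every bit in a lane where the elements match
            // and clears the lane where they differ.
            Vector<int> mask = Vector.Equals(a, b);

            // Unpack the mask lane by lane; a tuned implementation would
            // convert the mask to the output format in bulk instead.
            for (int lane = 0; lane < width; lane++)
                result[i + lane] = mask[lane] != 0;
        }

        // Scalar tail for lengths that are not a multiple of the vector width.
        for (; i < left.Length; i++)
            result[i] = left[i] == right[i];

        return result;
    }
}
```

Both methods should return the same `bool[]` for any pair of equal-length inputs, so the scalar version can serve as a reference when testing the vectorized one. Because of the lane-by-lane unpacking, this sketch should not be expected to reproduce the timings reported above.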

asmirnov82 commented 8 months ago

@JakeRadMSFT could you please review?

codecov[bot] commented 8 months ago

Codecov Report

Merging #6869 (12a296a) into main (766569b) will decrease coverage by 0.02%. Report is 2 commits behind head on main. The diff coverage is 62.00%.

@@            Coverage Diff             @@
##             main    #6869      +/-   ##
==========================================
- Coverage   69.40%   69.39%   -0.02%     
==========================================
  Files        1238     1238              
  Lines      249441   249462      +21     
  Branches    25522    25522              
==========================================
- Hits       173130   173113      -17     
+ Misses      69692    69599      -93     
- Partials     6619     6750     +131     

| Flag | Coverage Δ |
|---|---|
| Debug | 69.39% <62.00%> (-0.02%) :arrow_down: |
| production | 63.91% <62.00%> (-0.02%) :arrow_down: |
| test | 88.90% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|---|---|
| src/Microsoft.Data.Analysis/BitUtility.cs | 84.78% <100.00%> (ø) |
| ...rosoft.Data.Analysis/ArrowStringDataFrameColumn.cs | 63.54% <0.00%> (ø) |
| src/Microsoft.Data.Analysis/DataFrameBuffer.cs | 84.90% <50.00%> (ø) |
| ...icrosoft.Data.Analysis/PrimitiveColumnContainer.cs | 86.08% <92.30%> (ø) |
| ...c/Microsoft.Data.Analysis/StringDataFrameColumn.cs | 71.42% <0.00%> (ø) |
| ...icrosoft.Data.Analysis/PrimitiveDataFrameColumn.cs | 73.19% <66.66%> (ø) |
| src/Microsoft.Data.Analysis/Strings.Designer.cs | 42.36% <0.00%> (ø) |
| .../Microsoft.Data.Analysis/VBufferDataFrameColumn.cs | 45.77% <56.25%> (-1.41%) :arrow_down: |
| ...Microsoft.Data.Analysis/Computations/Arithmetic.cs | 62.06% <50.00%> (ø) |
| ...lysis/PrimitiveColumnContainer.BinaryOperations.cs | 89.83% <57.14%> (-10.17%) :arrow_down: |

... and 1 more

... and 8 files with indirect coverage changes