dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
9.02k stars 1.88k forks source link

Increase performance of arithmetic operations by enhancing calculations on nullable values #6846

Closed asmirnov82 closed 1 year ago

asmirnov82 commented 1 year ago

Fixes #6825

Additionaly reduces amount of autogenerated code and tt templates complexity

This PR also provides ability for future vectorization of Arithmetic operations - #5695

Notes:

1) Speed improvement is achived by eliminating checks for validity on every operation. It's faster to calculate raw values and them to calculate validity for all values, than to skip calculation in case of one of the value is null (both operations are easy to vectorised in future). Exception is division (row value for null can be zero - have to avoid this operation)

2) A lot of autogenerated lines of code are removed:

a) by moving cycle on buffers from PrimitiveDataFrameColumnArithmetic to PrimitiveColumnContainer level. This code is generic, so it's not necessary to duplicate it for every type alias. Internal cycle on each element inside span still stays on PrimitiveDataFrameColumnArithmetic to avoid using virtual method inside the cycle (in theory calling virtual method is slightly slower, perhaps the overall difference is negligibly small). Exceptions are Comparison and ComparisonScalar operators - need to be fixed later, reason for not doing it now: max size of Boolean Buffer and T Buffer are different, that's why currently PrimitiveColumnContainer result is used on PrimitiveDataFrameColumnArithmetic instead of Span, this can be solved by using Memory instead of Memory in DataFrameBuffer - will also increase performance of indexing (necessity of this change was already mentioned (https://github.com/dotnet/corefxlab/pull/2656/files#diff-d3a1807515a8785fb351fb67ecdae517d34c30f7d974b4631fcf65f0945510aaR21) - needs further investigation
b) by using one method to perform operation - HandleOperation with operation type enum as parameter instead of several methods. Still there are a lot of lines of code for each method in BinaryOperationAPIs.ExplodedColumns and PrimitiveDataFrameColumn.BinaryOperationImplementations.Exploded - should be remediated with implementation of #5665 (improving the binary operations behavior when types mismatch). TODO; BinaryOperation and BinaryScalarOperation enums should be combined into one enum, the same way as ComparisonOperation and ComparisonScalarOperation. TT templates and configuration additionaly simplified

codecov[bot] commented 1 year ago

Codecov Report

Merging #6846 (5440b74) into main (7fe293d) will increase coverage by 0.60%. The diff coverage is 72.98%.

@@            Coverage Diff             @@
##             main    #6846      +/-   ##
==========================================
+ Coverage   69.03%   69.64%   +0.60%     
==========================================
  Files        1237     1237              
  Lines      253612   247580    -6032     
  Branches    26571    25435    -1136     
==========================================
- Hits       175092   172416    -2676     
+ Misses      71567    68555    -3012     
+ Partials     6953     6609     -344     
Flag Coverage Δ
Debug 69.64% <72.98%> (+0.60%) :arrow_up:
production 64.19% <72.29%> (+0.60%) :arrow_up:
test 88.88% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
src/Microsoft.Data.Analysis/DataFrameBuffer.cs 84.90% <ø> (-0.28%) :arrow_down:
src/Microsoft.Data.Analysis/DateTimeComputation.cs 39.16% <100.00%> (ø)
...lysis/PrimitiveColumnContainer.BinaryOperations.cs 100.00% <100.00%> (+17.14%) :arrow_up:
...icrosoft.Data.Analysis/PrimitiveColumnContainer.cs 86.08% <100.00%> (+1.65%) :arrow_up:
...FrameColumn.BinaryOperationAPIs.ExplodedColumns.cs 7.69% <ø> (+0.09%) :arrow_up:
...eColumn.BinaryOperationImplementations.Exploded.cs 76.47% <ø> (+24.19%) :arrow_up:
...lysis/PrimitiveDataFrameColumn.BinaryOperations.cs 57.06% <ø> (+14.56%) :arrow_up:
...oft.Data.Analysis/PrimitiveDataFrameColumn.Sort.cs 87.34% <100.00%> (ø)
...ata.Analysis/PrimitiveDataFrameColumnArithmetic.cs 53.13% <ø> (+2.06%) :arrow_up:
...a.Analysis/PrimitiveDataFrameColumnComputations.cs 50.37% <100.00%> (ø)
... and 4 more

... and 12 files with indirect coverage changes

asmirnov82 commented 1 year ago

@JakeRadMSFT