Closed asmirnov82 closed 1 year ago
Merging #6846 (5440b74) into main (7fe293d) will increase coverage by
0.60%
. The diff coverage is72.98%
.
@@ Coverage Diff @@
## main #6846 +/- ##
==========================================
+ Coverage 69.03% 69.64% +0.60%
==========================================
Files 1237 1237
Lines 253612 247580 -6032
Branches 26571 25435 -1136
==========================================
- Hits 175092 172416 -2676
+ Misses 71567 68555 -3012
+ Partials 6953 6609 -344
Flag | Coverage Δ | |
---|---|---|
Debug | 69.64% <72.98%> (+0.60%) |
:arrow_up: |
production | 64.19% <72.29%> (+0.60%) |
:arrow_up: |
test | 88.88% <100.00%> (ø) |
Flags with carried forward coverage won't be shown. Click here to find out more.
Files | Coverage Δ | |
---|---|---|
src/Microsoft.Data.Analysis/DataFrameBuffer.cs | 84.90% <ø> (-0.28%) |
:arrow_down: |
src/Microsoft.Data.Analysis/DateTimeComputation.cs | 39.16% <100.00%> (ø) |
|
...lysis/PrimitiveColumnContainer.BinaryOperations.cs | 100.00% <100.00%> (+17.14%) |
:arrow_up: |
...icrosoft.Data.Analysis/PrimitiveColumnContainer.cs | 86.08% <100.00%> (+1.65%) |
:arrow_up: |
...FrameColumn.BinaryOperationAPIs.ExplodedColumns.cs | 7.69% <ø> (+0.09%) |
:arrow_up: |
...eColumn.BinaryOperationImplementations.Exploded.cs | 76.47% <ø> (+24.19%) |
:arrow_up: |
...lysis/PrimitiveDataFrameColumn.BinaryOperations.cs | 57.06% <ø> (+14.56%) |
:arrow_up: |
...oft.Data.Analysis/PrimitiveDataFrameColumn.Sort.cs | 87.34% <100.00%> (ø) |
|
...ata.Analysis/PrimitiveDataFrameColumnArithmetic.cs | 53.13% <ø> (+2.06%) |
:arrow_up: |
...a.Analysis/PrimitiveDataFrameColumnComputations.cs | 50.37% <100.00%> (ø) |
|
... and 4 more |
@JakeRadMSFT
Fixes #6825
Additionaly reduces amount of autogenerated code and tt templates complexity
This PR also provides ability for future vectorization of Arithmetic operations - #5695
Notes:
1) Speed improvement is achived by eliminating checks for validity on every operation. It's faster to calculate raw values and them to calculate validity for all values, than to skip calculation in case of one of the value is null (both operations are easy to vectorised in future). Exception is division (row value for null can be zero - have to avoid this operation)
2) A lot of autogenerated lines of code are removed:
a) by moving cycle on buffers from PrimitiveDataFrameColumnArithmetic to PrimitiveColumnContainer level. This code is generic, so it's not necessary to duplicate it for every type alias. Internal cycle on each element inside span still stays on PrimitiveDataFrameColumnArithmetic to avoid using virtual method inside the cycle (in theory calling virtual method is slightly slower, perhaps the overall difference is negligibly small). Exceptions are Comparison and ComparisonScalar operators - need to be fixed later, reason for not doing it now: max size of Boolean Buffer and T Buffer are different, that's why currently PrimitiveColumnContainer result is used on PrimitiveDataFrameColumnArithmetic instead of Span, this can be solved by using Memory instead of Memory in DataFrameBuffer - will also increase performance of indexing (necessity of this change was already mentioned (https://github.com/dotnet/corefxlab/pull/2656/files#diff-d3a1807515a8785fb351fb67ecdae517d34c30f7d974b4631fcf65f0945510aaR21) - needs further investigation
b) by using one method to perform operation - HandleOperation with operation type enum as parameter instead of several methods. Still there are a lot of lines of code for each method in BinaryOperationAPIs.ExplodedColumns and PrimitiveDataFrameColumn.BinaryOperationImplementations.Exploded - should be remediated with implementation of #5665 (improving the binary operations behavior when types mismatch).
TODO; BinaryOperation and BinaryScalarOperation enums should be combined into one enum, the same way as ComparisonOperation and ComparisonScalarOperation. TT templates and configuration additionaly simplified