Closed asmirnov82 closed 9 months ago
Merging #6814 (1eb8a35) into main (d692751) will increase coverage by 0.01%. Report is 2 commits behind head on main. The diff coverage is 57.29%.
@@ Coverage Diff @@
## main #6814 +/- ##
==========================================
+ Coverage 68.99% 69.01% +0.01%
==========================================
Files 1237 1237
Lines 253558 253556 -2
Branches 26542 26540 -2
==========================================
+ Hits 174944 174984 +40
+ Misses 71663 71616 -47
- Partials 6951 6956 +5
Flag | Coverage Δ | |
---|---|---|
Debug | 69.01% <57.29%> (+0.01%) | :arrow_up: |
production | 63.57% <48.75%> (+0.01%) | :arrow_up: |
test | 88.86% <100.00%> (+<0.01%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
Files | Coverage Δ | |
---|---|---|
...rosoft.Data.Analysis/ArrowStringDataFrameColumn.cs | 63.54% <100.00%> (ø) | |
...lysis/PrimitiveDataFrameColumn.BinaryOperations.cs | 42.50% <100.00%> (+0.50%) | :arrow_up: |
test/Microsoft.Data.Analysis.Tests/BufferTests.cs | 100.00% <100.00%> (ø) | |
...st/Microsoft.Data.Analysis.Tests/DataFrameTests.cs | 99.31% <100.00%> (+<0.01%) | :arrow_up: |
...est/Microsoft.ML.Fairlearn.Tests/GridSearchTest.cs | 100.00% <100.00%> (ø) | |
...icrosoft.Data.Analysis/PrimitiveColumnContainer.cs | 86.13% <94.73%> (-0.41%) | :arrow_down: |
...Microsoft.Data.Analysis/ReadOnlyDataFrameBuffer.cs | 48.71% <66.66%> (+1.34%) | :arrow_up: |
src/Microsoft.Data.Analysis/DataFrameBuffer.cs | 85.18% <80.00%> (+2.20%) | :arrow_up: |
...eColumn.BinaryOperationImplementations.Exploded.cs | 52.27% <0.00%> (ø) | |
Using Benchmarks from #6826
@JakeRadMSFT could you please take a look?
The goal of this PR is to make arithmetic operations on columns with the same underlying data type approximately 3 times faster.
Details of changes:
1) Fix the PrimitiveColumnContainer Clone() method to copy the internal buffer as a single memory block instead of appending values one by one (which reallocates memory on every buffer resize). Make a similar change to the CloneNullBitMapBuffers() method.
2) Improve the BinaryOperation.Implementation methods for all arithmetic operations that are not performed in place (the default behavior). Before the change, the autogenerated code looked like this:
After PR https://github.com/dotnet/machinelearning/pull/6677, CloneAsSingleColumn can be replaced with just this.Clone(). This avoids the unnecessary type conversion that happens inside the CloneAs... method and uses the fast Clone() method, which bulk-copies the internal buffers. For example, for Single:
3) Fix the DataFrameBuffer constructor. DataFrameBuffer overrides the ReadOnlyBuffer property of its parent, ReadOnlyDataFrameBuffer, to return its own new field _memory instead of the parent's _readOnlyBuffer (so the parent's _readOnlyBuffer is ignored and never used). However, the constructor does not create _memory; instead it calls the base constructor, which allocates _readOnlyBuffer (the ignored field). As a result, the Capacity of a newly created buffer is always 0, regardless of the value passed to the constructor, and the memory allocated by the base constructor is wasted.
4) With 3 fixed, change the code to use the DataFrameBuffer constructor that takes a capacity, instead of creating an empty DataFrameBuffer and then reallocating memory by calling EnsureCapacity.
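The constructor problem in item 3 and the bulk copy in item 1 can be illustrated with a minimal, self-contained sketch. All type and member names below (ReadOnlyBufferSketch, BuggyBufferSketch, FixedBufferSketch, Bytes) are hypothetical stand-ins invented for illustration. This is not the actual Microsoft.Data.Analysis code, only the same pattern under simplified assumptions (a byte buffer, no null bitmap, no resizing).

```csharp
using System;

// Hypothetical, simplified stand-ins; NOT the real Microsoft.Data.Analysis types.
public class ReadOnlyBufferSketch
{
    protected readonly ReadOnlyMemory<byte> _readOnlyBuffer;

    public ReadOnlyBufferSketch(int capacity)
    {
        _readOnlyBuffer = new byte[capacity];
    }

    public virtual ReadOnlyMemory<byte> ReadOnlyBuffer => _readOnlyBuffer;

    // Capacity is derived from whichever buffer the (possibly overridden) property returns.
    public int Capacity => ReadOnlyBuffer.Length;
}

// Bug pattern: the override exposes _memory, but the constructor only sizes
// the base field, so _memory stays empty and the base allocation is wasted.
public class BuggyBufferSketch : ReadOnlyBufferSketch
{
    private readonly Memory<byte> _memory = default;

    public BuggyBufferSketch(int capacity) : base(capacity) { }

    public override ReadOnlyMemory<byte> ReadOnlyBuffer => _memory;
}

// Fix pattern: allocate the field that the override actually returns.
public class FixedBufferSketch : ReadOnlyBufferSketch
{
    private readonly Memory<byte> _memory;

    public FixedBufferSketch(int capacity) : base(0)
    {
        _memory = new byte[capacity];
    }

    public override ReadOnlyMemory<byte> ReadOnlyBuffer => _memory;

    public Span<byte> Bytes => _memory.Span;

    // Clone with one block copy instead of appending element by element.
    public FixedBufferSketch Clone()
    {
        var copy = new FixedBufferSketch(Capacity);
        _memory.Span.CopyTo(copy._memory.Span);
        return copy;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(new BuggyBufferSketch(16).Capacity); // prints 0

        var buffer = new FixedBufferSketch(16);
        buffer.Bytes[3] = 42;
        var clone = buffer.Clone();
        Console.WriteLine(buffer.Capacity); // prints 16
        Console.WriteLine(clone.Bytes[3]);  // prints 42
    }
}
```

In the sketch, allocating the destination at its final size up front is what makes the single CopyTo call possible, which mirrors why item 4 depends on item 3.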
Result:
Simple tests for 1 million rows:
Part of issue #6824