dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
8.94k stars 1.86k forks source link

PrimitiveDataFrameColumn.Clone method crashes when is used with IEnumerable mapIndices argument #6822

Closed asmirnov82 closed 9 months ago

asmirnov82 commented 10 months ago

In frame of this PR:

1) Cover PrimitiveDataFrameColumn.Clone method with additional unit tests 2) Use Span.Fill method to init NullValidityBuffer instead of setting individual bits in a cycle inside AppendMany (value, count) method

`lastNullBitMapBuffer.RawSpan.Slice(lastNullBitMapBuffer.Length - nullBufferAllocatable, BitmapHelper.SetBits(lastNullBitMapBuffer.RawSpan, originalBufferLength, allocatable, true);

3) Fixes #6821

Root cause: using of ret[curRow] = value; for empty column. changed to ret.Append(value);

Additionaly fix returning columns with parent column type for inheritors of PrimitiveDataFrame in Clone method

4) Refactoring: Fixing 3 and 2 allowed to get rid of _internal bool modifyNullCountWhileIndexing field.

codecov[bot] commented 10 months ago

Codecov Report

Merging #6822 (9654516) into main (85ee6e5) will increase coverage by 0.00%. The diff coverage is 90.82%.

@@           Coverage Diff            @@
##             main    #6822    +/-   ##
========================================
  Coverage   69.05%   69.05%            
========================================
  Files        1237     1237            
  Lines      253422   253600   +178     
  Branches    26546    26568    +22     
========================================
+ Hits       175002   175132   +130     
- Misses      71467    71521    +54     
+ Partials     6953     6947     -6     
Flag Coverage Δ
Debug 69.05% <90.82%> (+<0.01%) :arrow_up:
production 63.61% <73.97%> (-0.02%) :arrow_down:
test 88.88% <99.31%> (+0.02%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...icrosoft.Data.Analysis/PrimitiveDataFrameColumn.cs 84.27% <100.00%> (+2.87%) :arrow_up:
test/Microsoft.Data.Analysis.Tests/BufferTests.cs 99.64% <99.31%> (-0.36%) :arrow_down:
...icrosoft.Data.Analysis/PrimitiveColumnContainer.cs 84.43% <71.21%> (-1.70%) :arrow_down:

... and 7 files with indirect coverage changes

asmirnov82 commented 10 months ago

@JakeRadMSFT could you please take a look?