dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License
8.93k stars 1.86k forks source link

Fix assert by only accessing idx #6924

Closed ericstj closed 6 months ago

ericstj commented 6 months ago

Fixes #6921

Asserting on _rowCount < Utils.Size(_valueBoundaries) was catching a case where _rowCount's update was reordered before _valueBoundaries

This was unnecessary, since this method doesn't need to use _rowCount.

Instead, make the asserts use only idx which will be maintained consistent with the waiter logic in this cache. This will lock and synchronize with the main thread to ensure idx is always within bounds.

Ensure we only ever use _rowCount from the caching thread, so write reordering won't matter.

ericstj commented 6 months ago

I tested this fix in https://github.com/dotnet/machinelearning/pull/6922 as well

codecov[bot] commented 6 months ago

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (efab011) 68.79% compared to head (3ca267a) 68.79%. Report is 3 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #6924 +/- ## ======================================= Coverage 68.79% 68.79% ======================================= Files 1249 1249 Lines 249431 249433 +2 Branches 25510 25509 -1 ======================================= + Hits 171604 171607 +3 - Misses 71214 71216 +2 + Partials 6613 6610 -3 ``` | [Flag](https://app.codecov.io/gh/dotnet/machinelearning/pull/6924/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | Coverage Δ | | |---|---|---| | [Debug](https://app.codecov.io/gh/dotnet/machinelearning/pull/6924/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `68.79% <75.00%> (+<0.01%)` | :arrow_up: | | [production](https://app.codecov.io/gh/dotnet/machinelearning/pull/6924/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `63.25% <75.00%> (+<0.01%)` | :arrow_up: | | [test](https://app.codecov.io/gh/dotnet/machinelearning/pull/6924/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `88.49% <ø> (+<0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#carryforward-flags-in-the-pull-request-comment) to find out more. | [Files](https://app.codecov.io/gh/dotnet/machinelearning/pull/6924?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | Coverage Δ | | |---|---|---| | [src/Microsoft.ML.Data/DataView/CacheDataView.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/6924?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5EYXRhL0RhdGFWaWV3L0NhY2hlRGF0YVZpZXcuY3M=) | `84.78% <75.00%> (+0.13%)` | :arrow_up: | ... and [3 files with indirect coverage changes](https://app.codecov.io/gh/dotnet/machinelearning/pull/6924/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet)
ericstj commented 6 months ago

This seems to have fixed things, but now our failures are all timeouts. Hitting here and in https://github.com/dotnet/machinelearning/pull/6923.

@michaelgsharp didn't you mention wanting to address the timeouts by splitting up the test assemblies?

michaelgsharp commented 6 months ago

@ericstj Yeah, as soon as I finish the NER issue I'll test splitting the assemblies, so I'll have a test out for that later today.