dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Address the feedback regarding Bert tokenizer #7280

Closed by tarekgh 3 weeks ago

tarekgh commented 4 weeks ago

@stephentoub the change here is addressing the feedback you had in the other PR https://github.com/dotnet/machinelearning/pull/7275. Thanks!

codecov[bot] commented 4 weeks ago

Codecov Report

Attention: Patch coverage is 44.73684% with 63 lines in your changes missing coverage. Please review.

Project coverage is 68.87%. Comparing base (a7a6d88) to head (b7492da). Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/Microsoft.ML.Tokenizers/Model/BertTokenizer.cs | 43.68% | 40 Missing and 18 partials :warning: |
| ...crosoft.ML.Tokenizers/Normalizer/BertNormalizer.cs | 66.66% | 2 Missing and 1 partial :warning: |
| ...icrosoft.ML.Tokenizers/Model/WordPieceTokenizer.cs | 0.00% | 2 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #7280      +/-   ##
==========================================
- Coverage   68.89%   68.87%   -0.02%
==========================================
  Files        1467     1467
  Lines      273875   273955      +80
  Branches    28363    28380      +17
==========================================
+ Hits       188686   188697      +11
- Misses      77901    77947      +46
- Partials     7288     7311      +23
```

| [Flag](https://app.codecov.io/gh/dotnet/machinelearning/pull/7280/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | Coverage Δ | |
|---|---|---|
| [Debug](https://app.codecov.io/gh/dotnet/machinelearning/pull/7280/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `68.87% <44.73%> (-0.02%)` | :arrow_down: |
| [production](https://app.codecov.io/gh/dotnet/machinelearning/pull/7280/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `63.33% <44.73%> (-0.02%)` | :arrow_down: |
| [test](https://app.codecov.io/gh/dotnet/machinelearning/pull/7280/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `89.18% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Files with missing lines](https://app.codecov.io/gh/dotnet/machinelearning/pull/7280?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | Coverage Δ | |
|---|---|---|
| [...icrosoft.ML.Tokenizers/Model/WordPieceTokenizer.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/7280?src=pr&el=tree&filepath=src%2FMicrosoft.ML.Tokenizers%2FModel%2FWordPieceTokenizer.cs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5Ub2tlbml6ZXJzL01vZGVsL1dvcmRQaWVjZVRva2VuaXplci5jcw==) | `75.05% <0.00%> (ø)` | |
| [...crosoft.ML.Tokenizers/Normalizer/BertNormalizer.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/7280?src=pr&el=tree&filepath=src%2FMicrosoft.ML.Tokenizers%2FNormalizer%2FBertNormalizer.cs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5Ub2tlbml6ZXJzL05vcm1hbGl6ZXIvQmVydE5vcm1hbGl6ZXIuY3M=) | `62.85% <66.66%> (-3.81%)` | :arrow_down: |
| [src/Microsoft.ML.Tokenizers/Model/BertTokenizer.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/7280?src=pr&el=tree&filepath=src%2FMicrosoft.ML.Tokenizers%2FModel%2FBertTokenizer.cs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5Ub2tlbml6ZXJzL01vZGVsL0JlcnRUb2tlbml6ZXIuY3M=) | `63.23% <43.68%> (-10.97%)` | :arrow_down: |

... and [7 files with indirect coverage changes](https://app.codecov.io/gh/dotnet/machinelearning/pull/7280/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet)