dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Add more required Tokenizer APIs #7114

Closed tarekgh closed 2 months ago

tarekgh commented 2 months ago

This change adds two new Tokenizer APIs:

tarekgh commented 2 months ago

CC @ericstj @michaelgsharp

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 89.65517%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 68.47%. Comparing base (c96aac7) to head (221d1fe).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #7114      +/-   ##
==========================================
+ Coverage   68.44%   68.47%   +0.02%
==========================================
  Files        1263     1263
  Lines      254838   254921      +83
  Branches    26334    26347      +13
==========================================
+ Hits       174427   174548     +121
+ Misses      73702    73670      -32
+ Partials     6709     6703       -6
```

| [Flag](https://app.codecov.io/gh/dotnet/machinelearning/pull/7114/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | Coverage Δ | |
|---|---|---|
| [Debug](https://app.codecov.io/gh/dotnet/machinelearning/pull/7114/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `68.47% <89.65%> (+0.02%)` | :arrow_up: |
| [production](https://app.codecov.io/gh/dotnet/machinelearning/pull/7114/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `62.86% <87.03%> (+0.02%)` | :arrow_up: |
| [test](https://app.codecov.io/gh/dotnet/machinelearning/pull/7114/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `88.58% <93.93%> (+<0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Files](https://app.codecov.io/gh/dotnet/machinelearning/pull/7114?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | Coverage Δ | |
|---|---|---|
| [...st/Microsoft.ML.Tokenizers.Tests/TokenizerTests.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/7114?src=pr&el=tree&filepath=test%2FMicrosoft.ML.Tokenizers.Tests%2FTokenizerTests.cs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-dGVzdC9NaWNyb3NvZnQuTUwuVG9rZW5pemVycy5UZXN0cy9Ub2tlbml6ZXJUZXN0cy5jcw==) | `98.64% <100.00%> (+0.07%)` | :arrow_up: |
| [src/Microsoft.ML.Tokenizers/Model/Tiktoken.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/7114?src=pr&el=tree&filepath=src%2FMicrosoft.ML.Tokenizers%2FModel%2FTiktoken.cs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5Ub2tlbml6ZXJzL01vZGVsL1Rpa3Rva2VuLmNz) | `70.37% <85.71%> (+0.82%)` | :arrow_up: |
| [test/Microsoft.ML.Tokenizers.Tests/TitokenTests.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/7114?src=pr&el=tree&filepath=test%2FMicrosoft.ML.Tokenizers.Tests%2FTitokenTests.cs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-dGVzdC9NaWNyb3NvZnQuTUwuVG9rZW5pemVycy5UZXN0cy9UaXRva2VuVGVzdHMuY3M=) | `99.24% <93.10%> (-0.76%)` | :arrow_down: |
| [src/Microsoft.ML.Tokenizers/Tokenizer.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/7114?src=pr&el=tree&filepath=src%2FMicrosoft.ML.Tokenizers%2FTokenizer.cs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5Ub2tlbml6ZXJzL1Rva2VuaXplci5jcw==) | `80.30% <87.23%> (+2.15%)` | :arrow_up: |

... and [8 files with indirect coverage changes](https://app.codecov.io/gh/dotnet/machinelearning/pull/7114/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet)