dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Update tiktoken regexes #7255

Closed: stephentoub closed this 1 month ago

stephentoub commented 1 month ago

This updates two of the regexes to match the changes made in https://github.com/openai/tiktoken/commit/9f7f69d62d6052dcc2fd54357df6ae9ae2590518.

On .NET Core, these changes are mostly no-ops: the main thing they do is make some loops atomic, and the auto-atomicity logic in the regex optimizer was already detecting that this was possible and applying it automatically. On .NET Framework, it's a bigger deal, as those loops will now be atomic where they weren't previously.

If nothing else, it keeps the regexes in sync with the reference implementation.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 68.78%. Comparing base (be1e428) to head (14fff61). Report is 5 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             main    #7255     +/-   ##
=======================================
  Coverage   68.77%   68.78%
=======================================
  Files        1462     1463      +1
  Lines      272261   272288     +27
  Branches    28176    28177      +1
=======================================
+ Hits       187254   187297     +43
+ Misses      77764    77748     -16
  Partials     7243     7243
```

| [Flag](https://app.codecov.io/gh/dotnet/machinelearning/pull/7255/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | Coverage Δ | |
|---|---|---|
| [Debug](https://app.codecov.io/gh/dotnet/machinelearning/pull/7255/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `68.78% <ø> (+<0.01%)` | :arrow_up: |
| [production](https://app.codecov.io/gh/dotnet/machinelearning/pull/7255/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `63.28% <ø> (+<0.01%)` | :arrow_up: |
| [test](https://app.codecov.io/gh/dotnet/machinelearning/pull/7255/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | `89.04% <ø> (+<0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Files with missing lines](https://app.codecov.io/gh/dotnet/machinelearning/pull/7255?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet) | Coverage Δ | |
|---|---|---|
| [...Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs](https://app.codecov.io/gh/dotnet/machinelearning/pull/7255?src=pr&el=tree&filepath=src%2FMicrosoft.ML.Tokenizers%2FModel%2FTiktokenTokenizer.cs&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet#diff-c3JjL01pY3Jvc29mdC5NTC5Ub2tlbml6ZXJzL01vZGVsL1Rpa3Rva2VuVG9rZW5pemVyLmNz) | `77.91% <ø> (+0.08%)` | :arrow_up: |

... and [10 files with indirect coverage changes](https://app.codecov.io/gh/dotnet/machinelearning/pull/7255/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=dotnet)

stephentoub commented 1 month ago

/ba-g unrelated torchsharp crash