stephentoub closed this 4 months ago.
Attention: 69 lines in your changes are missing coverage. Please review. Comparison is base (f976424) 68.81% compared to head (e78ab0f) 68.81%. Report is 6 commits behind head on main.
Closing this in favor of the following: https://github.com/dotnet/machinelearning/pull/7035
@tarekgh, this isn't for merging, but it shows approximately what I had in mind for incorporating spans into Model (I know you're currently revising the surface area, so take this with a grain of salt). It eliminates most of the remaining allocations when using Tokenizer.CountTokens/EncodeToIds, because it avoids allocating a string for each token that's already in the cache.
Feel free to crib liberally from the second commit and close this PR. Ignore the first commit, which I submitted separately.
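The core idea is to probe the token cache with a borrowed slice of the input instead of a freshly allocated string, so that only a cache miss pays for an owned key. The PR itself works with .NET spans and its diff is not reproduced here. The following is only a minimal sketch of the same idea in Rust, where `HashMap<String, _>::get` accepts a `&str` without allocating. Every name in it (`TokenCache`, `encode_uncached`, `count_tokens`) is hypothetical, and the one-ID-per-byte "encoding" is a placeholder for a real model.

```rust
use std::collections::HashMap;

/// Hypothetical cache mapping a token's text to its already-computed IDs.
struct TokenCache {
    entries: HashMap<String, Vec<u32>>,
}

impl TokenCache {
    fn new() -> Self {
        TokenCache { entries: HashMap::new() }
    }

    /// Looks up `word` by borrowed `&str`. This works because
    /// `String: Borrow<str>`, so a cache hit allocates nothing.
    fn get(&self, word: &str) -> Option<&[u32]> {
        self.entries.get(word).map(|ids| ids.as_slice())
    }

    /// Only a cache miss allocates an owned `String` key.
    fn insert(&mut self, word: &str, ids: Vec<u32>) {
        self.entries.insert(word.to_owned(), ids);
    }
}

/// Hypothetical placeholder for real model encoding: one ID per byte.
fn encode_uncached(word: &str) -> Vec<u32> {
    word.bytes().map(u32::from).collect()
}

/// Counts tokens in `text`, splitting on whitespace. Each `word` is a
/// `&str` borrowed from `text`, so no per-word string is allocated.
fn count_tokens(cache: &mut TokenCache, text: &str) -> usize {
    let mut count = 0;
    for word in text.split_whitespace() {
        // Probe the cache first; the immutable borrow ends after this line.
        let cached = cache.get(word).map(|ids| ids.len());
        count += match cached {
            Some(n) => n,
            None => {
                let ids = encode_uncached(word);
                let n = ids.len();
                cache.insert(word, ids);
                n
            }
        };
    }
    count
}

fn main() {
    let mut cache = TokenCache::new();
    // The second "hello" is served from the cache without allocating a key.
    let n = count_tokens(&mut cache, "hello world hello");
    println!("{n} tokens, {} cached words", cache.entries.len());
}
```

The design choice mirrors the one described in the comment above. The cache still stores owned strings, but lookups on the hot path take a borrowed view of the input, so repeated tokens cost no allocation.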