
dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.

https://dot.net/ml

MIT License


Ensure tiktoken implementation up-to-date with OpenAI reference implementation #7019

Open: stephentoub opened this issue 4 months ago

stephentoub commented 4 months ago

The implementation at https://github.com/openai/tiktoken/commits/main/src/lib.rs has seen several improvements in the last year (e.g. https://github.com/openai/tiktoken/pull/255), including a couple that claim performance wins from better algorithmic complexity on long inputs. Comments in the source also describe ways to avoid needing an LRU cache. We should ensure the C# implementation picks up all of the corresponding improvements.
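For context on where long-input complexity comes from: tiktoken's core step is a rank-based byte-pair merge over each pre-tokenized piece. Below is a minimal Rust sketch (Rust only to match the referenced lib.rs) of the straightforward version of that loop, assuming a `ranks` map from byte sequences to merge ranks. `byte_pair_split` and `pair_rank` are hypothetical names. This is neither the upstream code nor the ML.NET C# code, and it does not reproduce the upstream optimizations mentioned above. It rescans every boundary after each merge and removes from a `Vec`, so it is worst-case quadratic in the piece length, which is the kind of cost that matters for long inputs.

```rust
use std::collections::HashMap;

/// Hypothetical helper: rank of merging part `i` with part `i + 1`,
/// or u32::MAX when that pair is not in `ranks` (or there is no next part).
fn pair_rank(piece: &[u8], bounds: &[usize], i: usize, ranks: &HashMap<Vec<u8>, u32>) -> u32 {
    if i + 2 < bounds.len() {
        ranks.get(&piece[bounds[i]..bounds[i + 2]]).copied().unwrap_or(u32::MAX)
    } else {
        u32::MAX
    }
}

/// Hypothetical helper: repeatedly merges the adjacent pair with the lowest
/// rank until no adjacent pair appears in `ranks`, then returns the parts.
fn byte_pair_split(piece: &[u8], ranks: &HashMap<Vec<u8>, u32>) -> Vec<Vec<u8>> {
    // Start with one part per byte; the final entry is an end-of-piece sentinel.
    let mut bounds: Vec<usize> = (0..=piece.len()).collect();
    // cached[i] holds the rank of pair i so unchanged pairs are not looked up again.
    let mut cached: Vec<u32> = (0..bounds.len())
        .map(|i| pair_rank(piece, &bounds, i, ranks))
        .collect();

    loop {
        // Linear scan for the lowest-ranked pair; ties go to the leftmost pair.
        let mut min_i = 0;
        let mut min_rank = u32::MAX;
        for (i, &r) in cached.iter().enumerate() {
            if r < min_rank {
                min_rank = r;
                min_i = i;
            }
        }
        if min_rank == u32::MAX {
            break; // nothing left to merge
        }

        // Merge parts min_i and min_i + 1 by dropping the boundary between them.
        bounds.remove(min_i + 1);
        cached.remove(min_i + 1);

        // Only the pairs touching the merged part can have changed rank.
        cached[min_i] = pair_rank(piece, &bounds, min_i, ranks);
        if min_i > 0 {
            cached[min_i - 1] = pair_rank(piece, &bounds, min_i - 1, ranks);
        }
    }

    bounds.windows(2).map(|w| piece[w[0]..w[1]].to_vec()).collect()
}
```

For example, with ranks `ab` → 0, `abc` → 1, `cd` → 2, splitting `abcd` first merges `ab`, then `abc`, and stops with `["abc", "d"]`; `cd` never merges because `abc` has the lower rank.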

stephentoub commented 4 months ago

cc: @tarekgh