Open ZakFahey opened 6 months ago
Hi @ZakFahey, did you try the Tiktoken tokenizer with "gpt-4"?
Something like this: install the NuGet package Microsoft.ML.Tokenizers,
version 0.22.0-preview.24179.1,
and then:
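using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Not named Tokenizer: inside a class of that name, the bare name Tokenizer
// would resolve to the class itself, not to Microsoft.ML.Tokenizers.Tokenizer.
public sealed class TokenCounter
{
    // Tiktoken tokenizer for "gpt-4" with the two chat special tokens registered.
    private static readonly Tokenizer s_tokenizer = Tokenizer.CreateTiktokenForModel(
        "gpt-4", new Dictionary<string, int> { { "<|im_start|>", 100264 }, { "<|im_end|>", 100265 } });

    public int CountTokens(string text)
    {
        return s_tokenizer.CountTokens(text);
    }
}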
GPT-4o apparently uses a different tokenizer than GPT-4, so counting with the "gpt-4" tokenizer will give me inaccurate values.
It seems that, at the time of writing, there is no up-to-date C# token-counting library that supports GPT-4o.
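To make the mismatch concrete, here is a minimal sketch of a guard that checks whether two models share an encoding before one model's tokenizer is reused for the other. EncodingLookup and SharesEncoding are hypothetical names for illustration, not part of Microsoft.ML.Tokenizers, and the table only lists the few model-to-encoding pairs relevant here (cl100k_base for GPT-4 and GPT-3.5 Turbo, o200k_base for GPT-4o).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library: a hand-maintained table of
// which Tiktoken encoding each model uses.
public static class EncodingLookup
{
    private static readonly Dictionary<string, string> s_encodingByModel =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "gpt-3.5-turbo", "cl100k_base" },
            { "gpt-4", "cl100k_base" },
            { "gpt-4o", "o200k_base" },
        };

    // True only when both models are listed and map to the same encoding,
    // i.e. when token counts from one tokenizer are valid for the other model.
    public static bool SharesEncoding(string modelA, string modelB)
    {
        return s_encodingByModel.TryGetValue(modelA, out var encodingA)
            && s_encodingByModel.TryGetValue(modelB, out var encodingB)
            && encodingA == encodingB;
    }
}
```

With this table, EncodingLookup.SharesEncoding("gpt-4", "gpt-4o") is false, which is why a "gpt-4" tokenizer miscounts GPT-4o text, while SharesEncoding("gpt-4", "gpt-3.5-turbo") is true. Unknown models also return false, so the guard fails closed rather than silently reusing the wrong encoding.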