dluc / openai-tools

A collection of tools for working with OpenAI
Creative Commons Zero v1.0 Universal
95 stars 13 forks

Add support for GPT-4o #6

Open ZakFahey opened 1 month ago

ZakFahey commented 1 month ago

GPT-4o apparently uses a different tokenizer, so this library now gives me inaccurate token counts for it.

At the time of writing, there seems to be no up-to-date C# token-counting library that supports GPT-4o.

dluc commented 1 month ago

Hi @ZakFahey, did you try TikToken with "gpt-4"?

Something like this: install the NuGet package Microsoft.ML.Tokenizers, version 0.22.0-preview.24179.1, and

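using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Renamed from "Tokenizer" (the new name is illustrative): a class called Tokenizer
// would hide Microsoft.ML.Tokenizers.Tokenizer inside its own body.
public sealed class Gpt4TokenCounter
{
    // Shared tokenizer for the "gpt-4" encoding, with the chat special tokens registered.
    private static readonly Tokenizer s_tokenizer = Tokenizer.CreateTiktokenForModel(
        "gpt-4", new Dictionary<string, int> { { "<|im_start|>", 100264 }, { "<|im_end|>", 100265 } });

    // Returns the number of tokens the tokenizer produces for the given text.
    public int CountTokens(string text)
    {
        return s_tokenizer.CountTokens(text);
    }
}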