aiqinxuancai / TiktokenSharp

Token calculation for OpenAI models, using the `o200k_base`, `cl100k_base`, and `p50k_base` encodings.
MIT License

Token count is not accurate #2

Closed hk-exec closed 1 year ago

hk-exec commented 1 year ago

I used the text file below to test: SampleText.txt

I used TiktokenSharp on .NET Core 3.1, and tiktoken 0.3.3 on Python 3.11.

Logs for tiktoken:
Model: gpt-4; TokenCount: 2303; TimeElapse: 8.36 ms
Model: gpt-3.5-turbo; TokenCount: 2303; TimeElapse: 1.6 ms
Model: text-davinci-003; TokenCount: 2564; TimeElapse: 3.49 ms

Logs for TiktokenSharp:
Model: gpt-4; TokenCount: 2313; TimeElapsed: 17 ms.
Model: gpt-3.5-turbo; TokenCount: 2313; TimeElapsed: 4 ms.
Model: text-davinci-003; TokenCount: 2614; TimeElapsed: 5 ms.

As you can see in the logs, TiktokenSharp reports a higher token count than tiktoken for all three models. TiktokenSharp is also slower, taking roughly twice as long as tiktoken.

Python 3.11 code:

```python
import tiktoken
import time

gpt4_model = tiktoken.encoding_for_model('gpt-4')
gtp3_model = tiktoken.encoding_for_model('text-davinci-003')
gtp35_model = tiktoken.encoding_for_model('gpt-3.5-turbo')

def time_diff(start: float, end: float) -> str:
    time_elapsed = end - start
    if time_elapsed > 1.01:
        return f"{round(time_elapsed, 2)} sec"
    else:
        return f"{round(time_elapsed * 1000, 2)} ms"

def embedding():
    f = open(r"C:\D\SampleText.txt", "r")
    text = f.read()
    start = time.perf_counter()
    token_count = len(gpt4_model.encode(text))
    print(f'\nModel: gpt-4; TokenCount: {token_count}; TimeElapse: {time_diff(start, time.perf_counter())}\n')
    start = time.perf_counter()
    token_count = len(gtp35_model.encode(text))
    print(f'\nModel: gpt-3.5-turbo; TokenCount: {token_count}; TimeElapse: {time_diff(start, time.perf_counter())}\n')
    start = time.perf_counter()
    token_count = len(gtp3_model.encode(text))
    print(f'\nModel: text-davinci-003; TokenCount: {token_count}; TimeElapse: {time_diff(start, time.perf_counter())}\n')

if __name__ == '__main__':
    embedding()
```

netcoreapp3.1 code:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using TiktokenSharp;

namespace ConsoleApp.net3._1
{
    internal class Program
    {
        static void Main(string[] args)
        {
            Encoding();
        }

        static void Encoding()
        {
            string text = File.ReadAllText(@"C:\D\SampleText.txt");
            TikToken gpt35Model = TikToken.EncodingForModel("gpt-3.5-turbo");
            TikToken gpt3Model = TikToken.EncodingForModel("text-davinci-003");
            TikToken gpt4Model = TikToken.EncodingForModel("gpt-4");

            Stopwatch sw = Stopwatch.StartNew();
            int tokenCount = gpt4Model.Encode(text).Count;
            sw.Stop();
            Console.WriteLine($"Model: gpt-4; TokenCount: {tokenCount}; TimeElapsed: {sw.ElapsedMilliseconds} ms.");

            sw.Restart();
            tokenCount = gpt35Model.Encode(text).Count;
            sw.Stop();
            Console.WriteLine($"Model: gpt-3.5-turbo; TokenCount: {tokenCount}; TimeElapsed: {sw.ElapsedMilliseconds} ms.");

            sw.Restart();
            tokenCount = gpt3Model.Encode(text).Count;
            sw.Stop();
            Console.WriteLine($"Model: text-davinci-003; TokenCount: {tokenCount}; TimeElapsed: {sw.ElapsedMilliseconds} ms.");
        }
    }
}
```

aiqinxuancai commented 1 year ago

Hello, thank you for using this library.

After inspection, there are no issues with the algorithm. You may have overlooked the fact that when Python opens a text file with mode "r", it applies universal newline translation, converting "\r\n" into "\n". C#'s File.ReadAllText does not do this. To keep the two languages consistent, either normalize the newlines in C# with `text = text.Replace("\r\n", "\n");`, or open the file in Python with mode "rb".
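The newline behavior described above can be observed without tiktoken at all. This is a minimal sketch (the file path comes from `tempfile` and is not from the original report) showing how "r" and "rb" read the same bytes differently, which is exactly why the two programs can tokenize different strings:

```python
# Python's text mode ("r") applies universal newline translation,
# converting "\r\n" to "\n"; binary mode ("rb") preserves the bytes
# exactly. Each "\r\n" that survives can change how the BPE splits
# the text, so the two readers can yield different token counts.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"line one\r\nline two\r\n")   # Windows-style line endings

with open(path, "r") as f:
    text_mode = f.read()      # newlines normalized to "\n"
with open(path, "rb") as f:
    binary_mode = f.read()    # raw bytes, "\r\n" preserved

print(repr(text_mode))    # 'line one\nline two\n'
print(repr(binary_mode))  # b'line one\r\nline two\r\n'
```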

Regarding efficiency, I have inspected the code and found no specific optimizations to make. Note that the Python tiktoken package calls into compiled Rust code, which can give it higher execution efficiency. I have not directly compared the performance of the two languages, but you could try again after compiling with AOT under .NET 7.0. Also consider the scale of the test: a single run may not accurately reflect efficiency because of CPU thread scheduling. In fact, when I first ran the code you provided it took around 18 ms, but subsequent runs consistently took around 2 ms, which suggests a warm-up/scheduling effect rather than an issue with the code itself. Try looping the encode call 100,000 times under .NET 7.0's AOT and then compare the efficiency.
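As a rough illustration of that advice, here is a warm-up-then-loop timing sketch in Python; `encode_stub` is a hypothetical stand-in for the real encode call (from either library), not an API of TiktokenSharp or tiktoken:

```python
# Warm up first so cold-start costs (JIT, caches, scheduling) are
# excluded, then time many iterations and report a per-call average.
import time

def encode_stub(text):
    # Hypothetical stand-in for an encode call; replace with the
    # real tokenizer when benchmarking for real.
    return text.split()

def benchmark(fn, arg, warmup=10, iterations=1000):
    for _ in range(warmup):          # discard cold-start runs
        fn(arg)
    start = time.perf_counter()
    for _ in range(iterations):
        fn(arg)
    elapsed = time.perf_counter() - start
    return elapsed / iterations      # mean seconds per call

per_call = benchmark(encode_stub, "some sample text " * 100)
print(f"{per_call * 1e6:.2f} microseconds per call")
```

The same pattern applies in C#: warm up, then time a large loop with a single `Stopwatch` and divide by the iteration count.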

Thank you again for using this library.