Cainier / gpt-tokens

Calculate the token consumption and cost of OpenAI GPT messages
MIT License

Counts are slightly off on completion for chat models #20

Closed: arcticfly closed this issue 1 year ago

arcticfly commented 1 year ago

Awesome project, thank you for adding this to the ecosystem! My brother and I are currently working on https://github.com/openpipe/openpipe, and this package is incredibly useful to us.

I do notice that completion token counts are slightly off on some models. Specifically, it appears that GPTTokens always believes that the completion includes more tokens than it actually does. I created an experiment that compares the number of tokens OpenAI reports were used for a certain response (returned from non-streamed responses) against tokens calculated using GPTTokens (calculated on streamed responses). Here's the experiment: https://openpipe.ai/experiments/e2d5d255-5731-4dbc-9f83-7f642745404d.
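For concreteness, the comparison looks roughly like the sketch below (the constructor options and the usedTokens field are as I read them from this package's README, and the response shape follows the openai v3 SDK; this is an illustration, not our exact openpipe code):

import { GPTTokens } from 'gpt-tokens'

// Compare OpenAI's reported count against GPTTokens' estimate for the same
// assistant message. `chatCompletion` is a non-streamed openai v3 response.
function compareCompletionTokens (chatCompletion) {
    const reported = chatCompletion.data.usage.completion_tokens

    // Streamed responses carry no usage field, so this is the count we
    // would have to reconstruct from the assembled assistant message:
    const estimated = new GPTTokens({
        model   : 'gpt-3.5-turbo',
        messages: [chatCompletion.data.choices[0].message],
    }).usedTokens

    return { reported, estimated } // estimated comes back consistently higher
}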

I think we're using the latest version (1.0.10): https://github.com/OpenPipe/openpipe/blob/main/package.json#L39

And here are the relevant screenshots:

Non-streamed token counts (read from response):

[Screenshot 2023-07-08 at 9:31:26 PM]

Streamed token counts (calculated using GPTTokens):

[Screenshot 2023-07-08 at 9:31:15 PM]

Again, amazing project! Starring now!

Yzwcp commented 1 year ago

There is indeed some discrepancy; I'm not sure what causes it.

Cainier commented 1 year ago


Thanks for the question. After careful inspection and comparison, I found the problem in the code, and I will fix it in tomorrow's v1.1.0 release.

The cause of the problem (taking the prompt as an example):

const messages = [
    { role: 'user', content: 'Hello' },
    { role: 'assistant', content: 'Hi, how are you?' },
    { role: 'user', content: 'tell a joke' },
]

When OpenAI processes the prompt, the entire messages array participates in the token calculation, and gpt-tokens does the same, so the prompt count is correct.
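Both sides effectively apply the chat-format overhead to every message. Roughly, the formula looks like the sketch below (the overhead values follow the OpenAI cookbook numbers for gpt-3.5-turbo, and the import assumes the tiktoken npm package):

import { encoding_for_model } from 'tiktoken'

const encoding = encoding_for_model('gpt-3.5-turbo')

function promptTokens (messages) {
    let tokens = 3 // every reply is primed with <|start|>assistant<|message|>
    for (const message of messages) {
        tokens += 3 // per-message wrapper: <|start|>{role}\n{content}<|end|>\n
        tokens += encoding.encode(message.role).length
        tokens += encoding.encode(message.content).length
    }
    return tokens
}

console.log(promptTokens(messages)) // matches usage.prompt_tokens for this request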

But when OpenAI returns the completion, for example:

console.log(chatCompletion.data.choices[0].message)
// { role: 'assistant', content: 'this is a joke demo' }

OpenAI counts completion tokens only for the content string, but gpt-tokens was counting tokens for the entire message object, so its completion estimate came out slightly high.
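Reusing the encoding from the sketch above, the difference looks roughly like this (an illustration of the mismatch, not the exact v1.1.0 patch):

const completionMessage = { role: 'assistant', content: 'this is a joke demo' }

// gpt-tokens before the fix: the full per-message formula, wrapper and role
// tokens included, so the estimate runs high:
const overcounted = 3
    + encoding.encode(completionMessage.role).length
    + encoding.encode(completionMessage.content).length

// What OpenAI actually reports as usage.completion_tokens: the content only.
const actual = encoding.encode(completionMessage.content).length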

arcticfly commented 1 year ago

Thanks, I appreciate the fast response