Closed Talented-Business closed 1 month ago
Hi, I’m using the tiktoken library to count tokens for the gpt-4o-mini model. However, I’ve noticed a discrepancy between my token counts and the counts returned by the OpenAI API. It seems that tiktoken doesn’t fully support this new model yet, and the tokenization may differ slightly. Is there a plan to officially support gpt-4o-mini in tiktoken?
Thanks in advance!
Hi openai devs,
how can I count tokens for o1-preview and o1-mini?
Thanks in advance!
Here’s my example code:
import { encoding_for_model, TiktokenModel } from "tiktoken";

const countTokens = (messages: { role: string; content: string }[], model: TiktokenModel): number => {
  const enc = encoding_for_model(model); // Tokenizer for the model
  let tokenCount = 0;

  // Iterate over each message and count tokens for 'role' and 'content'
  messages.forEach((message) => {
    tokenCount += enc.encode(message.role).length; // Count role tokens
    tokenCount += enc.encode(message.content).length; // Count content tokens
  });

  enc.free(); // Release the WASM-backed encoder
  return tokenCount;
};

const messages = [
  { role: 'system', content: instructions },
  { role: 'user', content: userContent },
];
const model: TiktokenModel = "gpt-4o-mini";
const tokenCountInput = countTokens(messages, model);
Hello! Will keep monitoring https://github.com/openai/tiktoken/issues/337 to see if there are any changes w.r.t. the underlying token map.
@tmlxrd Just counting role and content is not necessarily enough. You need to also include the tokens which are used to separate the messages: see dqbd/tiktokenizer
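To illustrate, here is a sketch of counting that includes per-message overhead, following the pattern from OpenAI's cookbook for the gpt-3.5/gpt-4 family (roughly 3 wrapper tokens per message plus 3 tokens priming the assistant's reply). Whether the same constants hold for gpt-4o-mini is an assumption; verify against the usage field in real API responses. The tokenizer is injected so the sketch stays library-agnostic; in practice pass `(s) => enc.encode(s).length` from tiktoken's `encoding_for_model`.

```typescript
type ChatMessage = { role: string; content: string };

const TOKENS_PER_MESSAGE = 3;   // wrapper tokens around each message (assumed, per cookbook)
const REPLY_PRIMING_TOKENS = 3; // tokens priming the assistant's reply (assumed, per cookbook)

function countChatTokens(
  messages: ChatMessage[],
  encodeLen: (s: string) => number // e.g. (s) => enc.encode(s).length with tiktoken
): number {
  let total = REPLY_PRIMING_TOKENS;
  for (const m of messages) {
    total += TOKENS_PER_MESSAGE;   // message separators/wrappers
    total += encodeLen(m.role);    // role tokens
    total += encodeLen(m.content); // content tokens
  }
  return total;
}

// Demo with a dummy whitespace tokenizer as a stand-in for tiktoken:
const dummyLen = (s: string) => s.split(/\s+/).filter(Boolean).length;
const demo = countChatTokens(
  [
    { role: "system", content: "You are helpful." },
    { role: "user", content: "Hello there" },
  ],
  dummyLen
);
console.log(demo); // -> 16 with this dummy tokenizer
```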
Thank you for your answer! I'm doing this because I get a smaller token count than OpenAI returns in the API response.
For a large text I counted 1708 input tokens, but OpenAI's response reported 1717. It's a small difference, but I didn't understand where it came from, which is why I added the two roles.
UPD: Thank you for the link to that feature. With the separator tokens included it works better, but there are still discrepancies with OpenAI's answer.
Do 'o1-mini' and 'o1-preview' still use the cl100k_base vocabulary?
Hi. Unfortunately, I don't know. Please share the answer here if you find out.
Got clarification with the latest tiktoken@0.8.0 release, updating here as well.
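Until an installed tiktoken version knows a new model, a fallback can bridge the gap. The sketch below assumes the o1 and gpt-4o families use the o200k_base vocabulary (consistent with the token-map discussion above, but verify for your version); `encodingForModel` and `getEncoding` are injected stand-ins for tiktoken's `encoding_for_model` and `get_encoding`.

```typescript
type Encoder = { encode: (s: string) => Uint32Array };

function resolveEncoding(
  model: string,
  encodingForModel: (m: string) => Encoder, // e.g. tiktoken's encoding_for_model
  getEncoding: (name: string) => Encoder    // e.g. tiktoken's get_encoding
): Encoder {
  try {
    // Preferred path: the library already maps the model to an encoding.
    return encodingForModel(model);
  } catch {
    // Fallback: newer models (gpt-4o family, o1 family) are assumed to use
    // o200k_base when the installed tiktoken version predates the model.
    return getEncoding("o200k_base");
  }
}
```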