jasonsu123 opened this issue 3 months ago
Thank you very much for your response. This feature is already great.
Since you mentioned that GPT-4o does not yet have a dedicated tokenizer, could we switch the model to cl100k as follows? Thank you.
import tiktoken
encoding = tiktoken.encoding_for_model("cl100k")
token_contents = len(encoding.encode(contents))
Can you explain what you mean? We currently use cl100k as a fallback: https://github.com/AgentOps-AI/tokencost/blob/e1d52dbaa3ada232aa68dabf5b58662da4bc2363/tokencost/costs.py#L101
Yes, I later changed the model name in that code segment to cl100k_base, but I encountered an error when running the program, and I'm not sure where the issue in my code is. Thank you.
import tiktoken
encoding = tiktoken.encoding_for_model("cl100k_base")
token_contents = len(encoding.encode(content))
print(f"The prompt contains {token_contents} tokens.")
The error message is: in encoding_for_model raise KeyError( KeyError: 'Could not automatically map cl100k_base to a tokenizer. Please use `tiktoken.get_encoding` to explicitly get the tokenizer you expect.'
Hello, I noticed that the package you wrote is very impressive. However, is it only capable of counting tokens for regular, simple chats?
I saw your code requires the input prompt to include "role", "user", and "content" strings...
message_prompt = [{ "role": "user", "content": "Hello world"}]
If using the assistant mode with instructions, file search, and uploading files to vector stores for RAG, the calculation might be more complex.
Are the token-counting methods for gpt-4-1106-preview and gpt-4o the same? I checked the tokenizer on the official website, but the tokenizer for gpt-4o is not yet available: https://platform.openai.com/tokenizer
Currently, my code for calculating tokens is as follows. Is this correct? Thank you.
import tiktoken
encoding = tiktoken.encoding_for_model("gpt-4-1106-preview")
token_contents = len(encoding.encode(contents))