Closed Coppelian closed 3 months ago
Here is the OpenAI_Usage_Info I retrieved from the log:
The total cost should be $0.068045. Here is the Software Info provided by ChatDev:
💰cost=$0.034023
🔨version_updates=5.0
📃num_code_files=4
🏞num_png_files=0
📚num_doc_files=7
📃code_lines=112
📋env_lines=1
📒manual_lines=55
🗣num_utterances=26
🤔num_self_reflections=1
❓num_prompt_tokens=15443
❗num_completion_tokens=5429
🌟num_total_tokens=20872
🕑duration=94.00s
ChatDev Starts (20240212234754)
ChatDev Ends (20240212234928)
I took a look in the code and found the definition for the price in `ChatDev/chatdev/statistics.py`:

```python
def prompt_cost(model_type: str, num_prompt_tokens: float, num_completion_tokens: float):
    input_cost_map = {
        "gpt-3.5-turbo": 0.0015,
        "gpt-3.5-turbo-16k": 0.003,
        "gpt-3.5-turbo-0613": 0.0015,
        "gpt-3.5-turbo-16k-0613": 0.003,
        "gpt-4": 0.03,
        "gpt-4-0613": 0.03,
        "gpt-4-32k": 0.06,
        "gpt-4-1106-preview": 0.01,
        "gpt-4-1106-vision-preview": 0.01,
    }

    output_cost_map = {
        "gpt-3.5-turbo": 0.002,
        "gpt-3.5-turbo-16k": 0.004,
        "gpt-3.5-turbo-0613": 0.002,
        "gpt-3.5-turbo-16k-0613": 0.004,
        "gpt-4": 0.06,
        "gpt-4-0613": 0.06,
        "gpt-4-32k": 0.12,
        "gpt-4-1106-preview": 0.03,
        "gpt-4-1106-vision-preview": 0.03,
    }
```
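Assuming the prices in these maps are per 1K tokens (a hypothetical reading, since the rest of the function is not quoted here), the token counts from the Software Info block reproduce ChatDev's reported 💰cost almost exactly:

```python
# Per-1K-token prices taken from the quoted gpt-3.5-turbo entries.
INPUT_PRICE_PER_1K = 0.0015   # prompt price
OUTPUT_PRICE_PER_1K = 0.002   # completion price

# Token counts from the Software Info block above.
num_prompt_tokens = 15443
num_completion_tokens = 5429

cost = (num_prompt_tokens * INPUT_PRICE_PER_1K
        + num_completion_tokens * OUTPUT_PRICE_PER_1K) / 1000.0
print(cost)  # ≈ 0.034023, matching ChatDev's reported 💰cost
```

Note that this is roughly half of the $0.068045 reported by OpenAI_Usage_Info, which is exactly the discrepancy in question.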
Is there a possibility that ChatDev calculates the price in each Phase itself, without using the OpenAI_Usage_Info?
That's exactly the case. ChatDev calculates its usage locally.
Hi Tsopic,
Thank you for your response!
I noticed this problem. Model prices can change, and there may be slight differences between the real usage and ChatDev's output; updating this information in the project each time can be tedious.

OpenAI sends usage information for each conversation, and its pricing logic may differ from the local calculation (hypothetically, even if you spent fewer tokens, OpenAI might still bill at the maximum price). The local calculation, on the other hand, sums the tokens spent at each Phase and computes the price from the total token count. This could create slight differences between the real usage and ChatDev's log.

To mitigate this issue, would it be possible to change the local calculation to use the OpenAI response instead? For example, by creating a new cost_manager that collects the OpenAI call information, with a configuration option that lets the user activate it.
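A minimal sketch of what such a cost_manager might look like (the class name and fields are hypothetical; the `usage` dict mirrors the `usage` object returned in an OpenAI chat-completion response):

```python
class CostManager:
    """Hypothetical accumulator for usage reported by the API itself,
    instead of recomputing token counts locally at each Phase."""

    def __init__(self, input_price_per_1k: float, output_price_per_1k: float):
        self.input_price_per_1k = input_price_per_1k
        self.output_price_per_1k = output_price_per_1k
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def record(self, usage: dict) -> None:
        # `usage` mirrors response["usage"] from an OpenAI chat completion.
        self.prompt_tokens += usage["prompt_tokens"]
        self.completion_tokens += usage["completion_tokens"]

    def total_cost(self) -> float:
        return (self.prompt_tokens * self.input_price_per_1k
                + self.completion_tokens * self.output_price_per_1k) / 1000.0


# Example: two Phases' worth of (made-up) usage payloads.
manager = CostManager(input_price_per_1k=0.0015, output_price_per_1k=0.002)
manager.record({"prompt_tokens": 1000, "completion_tokens": 500})
manager.record({"prompt_tokens": 2000, "completion_tokens": 1500})
print(manager.total_cost())  # ≈ 0.0085
```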
I'm trying to produce some log analysis for ChatDev, to compare it with MetaGPT and gpt-engineer; there are similar issues across these projects. This could help build better evaluation standards for each project and support better evaluations, and based on that, perhaps better ways to improve the architecture, for example using Agile development.

These are just some ideas; they might be incorrect or insufficient, and no offense is intended. Thank you again for your help!
Thank you for your suggestions! I see the concern with fluctuating model prices and potential output discrepancies. Your suggestion to use OpenAI's response for local calculation sounds promising. It will indeed offer more convenience for further evaluation. We'll incorporate this into our future development roadmap. Thanks again for your valuable suggestions. If you have any other advice, please don't hesitate to share and discuss with us.
Hi ChatDev Development Team,
I'm trying to understand the log output by ChatDev. The command I used to create the test case is: `time python3 run.py --task "We are writing snake in Python. MVC components split in separate files. Keyboard control." --name "cli-snake-game" --model "GPT_3_5_TURBO"`
Here is the log I have. log_chatDev.txt
The problem is that I cannot match the OpenAI_Usage_Info with the ChatDev Software Info: the prompt tokens, completion tokens, and money spent reported by OpenAI_Usage_Info do not match the ChatDev Software Info. I hoped to see how many GPT calls ChatDev made in each Phase and match them with ChatDev's architecture, but the log confuses me.
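To illustrate the kind of analysis I have in mind, here is a small sketch of extracting the token counters from the log (the key names follow the Software Info lines shown above; assuming the relevant log lines have been read into `log_text`):

```python
import re

# Sample lines in the format of ChatDev's Software Info block.
log_text = """\
num_prompt_tokens=15443
num_completion_tokens=5429
num_total_tokens=20872
"""

stats = {key: int(value)
         for key, value in re.findall(r"num_(\w+_tokens)=(\d+)", log_text)}
print(stats)  # {'prompt_tokens': 15443, 'completion_tokens': 5429, 'total_tokens': 20872}

# Sanity check: the two counters should add up to the reported total.
assert stats["prompt_tokens"] + stats["completion_tokens"] == stats["total_tokens"]
```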
Are there any suggestions for better understanding ChatDev's log?
Thank you.