OpenBMB / ChatDev

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
https://arxiv.org/abs/2307.07924
Apache License 2.0

Issues related with OpenAI_Usage_Info and ChatDev Software Info #351

Closed · Coppelian closed this 3 months ago

Coppelian commented 4 months ago

Hi ChatDev Development Team,

I'm trying to understand the log output by ChatDev. The command I used to create the test case is: `time python3 run.py --task "We are writing snake in Python. MVC components split in separate files. Keyboard control." --name "cli-snake-game" --model "GPT_3_5_TURBO"`

Here is the log I have: log_chatDev.txt

The problem is that I cannot match the OpenAI_Usage_Info with the ChatDev Software Info: the prompt tokens, completion tokens, and money spent reported by OpenAI_Usage_Info do not match the ChatDev Software Info. I was hoping to see how many GPT calls ChatDev made in each Phase and match them to ChatDev's architecture, but the log leaves me confused.

Are there any suggestions for better understanding ChatDev's log?

Thank you.

Coppelian commented 4 months ago

Here is the OpenAI_Usage_Info I retrieved from the log:

  1. [OpenAI_Usage_Info Receive] prompt_tokens: 440 completion_tokens: 17 total_tokens: 457 cost: $0.001388
  2. [OpenAI_Usage_Info Receive] prompt_tokens: 498 completion_tokens: 49 total_tokens: 547 cost: $0.001690
  3. [OpenAI_Usage_Info Receive] prompt_tokens: 514 completion_tokens: 4 total_tokens: 518 cost: $0.001558
  4. [OpenAI_Usage_Info Receive] prompt_tokens: 394 completion_tokens: 4 total_tokens: 398 cost: $0.001198
  5. [OpenAI_Usage_Info Receive] prompt_tokens: 573 completion_tokens: 1001 total_tokens: 1574 cost: $0.005723
  6. [OpenAI_Usage_Info Receive] prompt_tokens: 1343 completion_tokens: 222 total_tokens: 1565 cost: $0.004917
  7. [OpenAI_Usage_Info Receive] prompt_tokens: 1577 completion_tokens: 928 total_tokens: 2505 cost: $0.008443
  8. [OpenAI_Usage_Info Receive] prompt_tokens: 1344 completion_tokens: 122 total_tokens: 1466 cost: $0.004520
  9. [OpenAI_Usage_Info Receive] prompt_tokens: 1477 completion_tokens: 983 total_tokens: 2460 cost: $0.008363
  10. [OpenAI_Usage_Info Receive] prompt_tokens: 1344 completion_tokens: 117 total_tokens: 1461 cost: $0.004500
  11. [OpenAI_Usage_Info Receive] prompt_tokens: 1472 completion_tokens: 950 total_tokens: 2422 cost: $0.008216
  12. [OpenAI_Usage_Info Receive] prompt_tokens: 1291 completion_tokens: 100 total_tokens: 1391 cost: $0.004273
  13. [OpenAI_Usage_Info Receive] prompt_tokens: 1640 completion_tokens: 128 total_tokens: 1768 cost: $0.005432
  14. [OpenAI_Usage_Info Receive] prompt_tokens: 1536 completion_tokens: 804 total_tokens: 2340 cost: $0.007824

Summing these, the total cost should be $0.068045. This is the Software Info provided by ChatDev:

💰cost=$0.034023

🔨version_updates=5.0

📃num_code_files=4

🏞num_png_files=0

📚num_doc_files=7

📃code_lines=112

📋env_lines=1

📒manual_lines=55

🗣num_utterances=26

🤔num_self_reflections=1

num_prompt_tokens=15443

num_completion_tokens=5429

🌟num_total_tokens=20872

🕑duration=94.00s

ChatDev Starts (20240212234754)

ChatDev Ends (20240212234928)
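
As a side check (my own standalone script, not ChatDev output), summing the per-call numbers above shows that the token counts match the Software Info exactly, while the per-call costs sum to exactly twice ChatDev's figure:

```python
# Standalone cross-check of the two reports above (not ChatDev code).
prompt = [440, 498, 514, 394, 573, 1343, 1577, 1344, 1477, 1344, 1472, 1291, 1640, 1536]
completion = [17, 49, 4, 4, 1001, 222, 928, 122, 983, 117, 950, 100, 128, 804]
costs = [0.001388, 0.001690, 0.001558, 0.001198, 0.005723, 0.004917, 0.008443,
         0.004520, 0.008363, 0.004500, 0.008216, 0.004273, 0.005432, 0.007824]

print(sum(prompt))           # 15443     -> matches num_prompt_tokens=15443
print(sum(completion))       # 5429      -> matches num_completion_tokens=5429
print(round(sum(costs), 6))  # 0.068045  -> exactly 2x ChatDev's cost=$0.034023
# ChatDev's figure is what the base gpt-3.5-turbo rates give for the same totals:
print(15443 * 0.0015 / 1000 + 5429 * 0.002 / 1000)  # 0.0340225 -> $0.034023
```

The factor of two would be explained if the per-call OpenAI_Usage_Info lines are priced at the 16k rates ($0.003/$0.004 per 1K tokens; e.g., call 1: 440 × 0.003/1000 + 17 × 0.004/1000 = $0.001388) while the Software Info uses the base gpt-3.5-turbo rates ($0.0015/$0.002). That is my inference, though, not something the log states.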

I took a look at the code and saw the definition for the price in `ChatDev/chatdev/statistics.py`:

```python
def prompt_cost(model_type: str, num_prompt_tokens: float, num_completion_tokens: float):
    input_cost_map = {
        "gpt-3.5-turbo": 0.0015,
        "gpt-3.5-turbo-16k": 0.003,
        "gpt-3.5-turbo-0613": 0.0015,
        "gpt-3.5-turbo-16k-0613": 0.003,
        "gpt-4": 0.03,
        "gpt-4-0613": 0.03,
        "gpt-4-32k": 0.06,
        "gpt-4-1106-preview": 0.01,
        "gpt-4-1106-vision-preview": 0.01,
    }

    output_cost_map = {
        "gpt-3.5-turbo": 0.002,
        "gpt-3.5-turbo-16k": 0.004,
        "gpt-3.5-turbo-0613": 0.002,
        "gpt-3.5-turbo-16k-0613": 0.004,
        "gpt-4": 0.06,
        "gpt-4-0613": 0.06,
        "gpt-4-32k": 0.12,
        "gpt-4-1106-preview": 0.03,
        "gpt-4-1106-vision-preview": 0.03,
    }
```
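
Only the price maps are shown above; presumably the function finishes by looking up the model and multiplying the per-1K rates by the token counts (my reconstruction of the remainder, not copied from the repo):

```python
    # Presumed remainder of prompt_cost (my reconstruction):
    if model_type not in input_cost_map or model_type not in output_cost_map:
        return -1  # unknown model

    # Prices in the maps are per 1K tokens.
    return (num_prompt_tokens / 1000.0 * input_cost_map[model_type]
            + num_completion_tokens / 1000.0 * output_cost_map[model_type])
```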

Is there a possibility that ChatDev calculates the price in each Phase itself, without using the OpenAI_Usage_Info?

Tsopic commented 4 months ago

> Is there a possibility that ChatDev calculates the price in each Phase itself, without using the OpenAI_Usage_Info?

That's exactly the case. ChatDev calculates its usage locally.

Coppelian commented 4 months ago

> Is there a possibility that ChatDev calculates the price in each Phase itself, without using the OpenAI_Usage_Info?

> That's exactly the case. ChatDev calculates its usage locally.

Hi Tsopic,

Thank you for your response!

I noticed this problem: model prices may change over time, so there can be slight differences between the real usage and ChatDev's output, and updating this pricing information in the project every time is tedious.

OpenAI returns usage information for each conversation, and its pricing logic may differ from a local calculation (hypothetically, OpenAI might charge the maximum price even if you spent fewer tokens). The local calculation, meanwhile, sums all the tokens spent across each Phase and computes the price from that total. This could create slight differences between the real usage and ChatDev's log.

To mitigate this, would it be possible to change the local calculation to use the OpenAI response instead? For example, create a new cost_manager that collects the OpenAI call information, plus a configuration option that lets the user activate this cost_manager. A sketch of what I mean follows below.
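
A minimal sketch, assuming a hypothetical `CostManager` class (none of these names exist in ChatDev today). Note that the OpenAI API returns token counts rather than dollar amounts, so a price table is still needed, but it is applied per call as the usage data arrives:

```python
# Hypothetical sketch: accumulate usage from the `usage` object that the
# OpenAI API returns with each chat completion, instead of recomputing
# totals once per phase from a hard-coded table.
from collections import namedtuple, defaultdict

class CostManager:
    def __init__(self, input_price_per_1k: float, output_price_per_1k: float):
        self.input_price = input_price_per_1k
        self.output_price = output_price_per_1k
        self.per_phase = defaultdict(lambda: {"calls": 0, "prompt_tokens": 0,
                                              "completion_tokens": 0, "cost": 0.0})

    def record(self, phase: str, usage) -> None:
        """`usage` is the usage object of one OpenAI chat completion response."""
        cost = (usage.prompt_tokens / 1000.0 * self.input_price
                + usage.completion_tokens / 1000.0 * self.output_price)
        stats = self.per_phase[phase]
        stats["calls"] += 1
        stats["prompt_tokens"] += usage.prompt_tokens
        stats["completion_tokens"] += usage.completion_tokens
        stats["cost"] += cost

    def total_cost(self) -> float:
        return sum(stats["cost"] for stats in self.per_phase.values())

# Example with a stand-in usage object (the real one comes from the API response):
Usage = namedtuple("Usage", ["prompt_tokens", "completion_tokens"])
cm = CostManager(input_price_per_1k=0.0015, output_price_per_1k=0.002)
cm.record("Coding", Usage(573, 1001))
print(cm.total_cost())  # 0.0028615
```

With something like this, the Software Info totals would agree with the per-call OpenAI_Usage_Info lines by construction, and the per-phase counters would directly answer how many GPT calls were made in each Phase.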

I'm trying to produce some log analysis for ChatDev, to compare it with MetaGPT and gpt-engineer; these projects share some similar issues. This could help build better evaluation standards for each project and propose better evaluations. Based on that, one might then look for better ways to improve the architecture, for example through Agile development.

These are just some ideas and may be incorrect or insufficient; no offense intended. Thank you again for your help!

NA-Wen commented 3 months ago

Thank you for your suggestions! I see the concern about fluctuating model prices and potential output discrepancies. Your suggestion to use OpenAI's response for the local cost calculation sounds promising; it would indeed make further evaluation more convenient. We'll incorporate this into our future development roadmap. Thanks again for your valuable suggestions. If you have any other advice, please don't hesitate to share and discuss it with us.