BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvment, and skill curation, in a standardized general environment with minimal requirements.
https://baai-agents.github.io/Cradle/
MIT License
1.71k stars 150 forks source link

Slow program execution #79

Open it666-png opened 1 month ago

it666-png commented 1 month ago

When running automated program operation testing, the entire operation process is very slow, and sometimes it takes 20 minutes to complete one step. Is there any parameter that can be adjusted?

WeihaoTan commented 1 month ago

The main bottleneck is the time to wait for GPT-4o's response. It seems that you need to wait for a long time to get GPT-4o's response. Maybe you could run a minimal GPT-4o call and see whether you could get the reply within several seconds (https://platform.openai.com/docs/api-reference/chat/create?lang=python).

it666-png commented 1 month ago

Sorry, I didn't describe it clearly. The slow execution of automatic programs is not related to the response time of GPT. I can see that the execution steps generated by GPT are repeated many times in the execution window, but the program either has no response or has to wait for a long time before moving. I didn't make any changes, I just followed the official documentation to perform the relevant operations.

it666-png commented 1 month ago

Is there a place to upload videos? I recorded a video, is it convenient to upload

WeihaoTan commented 1 month ago

Can you check the actions outputted? Are they correct and executable?

it666-png commented 1 month ago

I can't confirm this, I just looked at the content and found that the operation steps generated by GPT are correct, but the program did not synchronize the operation.

it666-png commented 1 month ago

Is there a place to upload videos? I recorded a video, is it convenient to upload

WeihaoTan commented 1 month ago

You can upload the video to any platform you prefer and share the link with us. But I do not think a video is enough for us to debug. Could you provide the corresponding log file in the run folder after you run the code, which should contain all the input and output from GPT-4o?

it666-png commented 1 month ago

链接: https://pan.baidu.com/s/1vTNzaUNQWYhHAlQZBqGv7Q 提取码: 6666

The video and log files are in the cloud storage. Could you please analyze the reason? Thank you very much.

WeihaoTan commented 1 month ago

Could you explain more about your issue? This video looks good to me. The agent executes several actions successfully in the 20-minute-long video.

it666-png commented 1 month ago

There are mainly the following issues:

  1. The execution time is too long, with 3 tasks taking 20 minutes;
  2. And none of the three tasks were successfully executed: a, Create a schedule b. Send a Hi message to Gradle c. Send PDF file None of the three tasks were successful
WeihaoTan commented 1 month ago

For the time issue, it is quite common. Due to the delay in the GPT-4o API's response, we need to wait for about 15-60 seconds for each action.

For the performance issue, I noticed that you did not open Feishu as full screen. Could you check your settings according to our readme? Thanks

it666-png commented 1 month ago

Feishu opens in full screen, but when the program runs, Feishu automatically shrinks the window Additionally, I operated remotely from the server, and the server settings were configured according to the requirements outlined in the cradle project documentation