Open James4Ever0 opened 1 month ago
I agree with you that terminals are more lightweight than GUIs. Therefore, in the latest code for online benchmarking, we can turn on headless mode when the task does not require screenshot inputs. Your environment looks great! In my opinion it can serve as an environment with additional tasks to benchmark agent performance. I haven't figured out the simplest and most elegant way to integrate this into our codebase. Can you provide some guidance?
First, install required binaries:
sudo apt install -y tmux tmuxp aha
Next, install the following dependencies:
pip install parse playwright beautifulsoup4
# set up Playwright if you want to take terminal screenshots
playwright install chromium
Finally, copy lib.py, then run test_lib.py next to it.
The SESSION_COMMAND in test_lib.py is the initial command executed in the terminal environment. Change it to suit your needs.
To view the environment:
preview = session.preview_html(show_cursor=True, wrap_html=True, dark_mode=True, grayscale=True)
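For readers wondering how such an HTML preview could be produced from the binaries installed above, here is a minimal sketch, not the actual lib.py implementation. It assumes the pane is dumped with `tmux capture-pane` and converted with `aha`; the helper names `capture_pane_cmd` and `ansi_to_html` and the session name `demo` are hypothetical.

```python
import shutil
import subprocess


def capture_pane_cmd(target: str) -> list[str]:
    """Build the tmux command that dumps a pane's visible contents.

    -p prints the capture to stdout; -e keeps the escape sequences
    for colors and text attributes so they can be converted to HTML.
    """
    return ["tmux", "capture-pane", "-p", "-e", "-t", target]


def ansi_to_html(ansi_text: str, dark: bool = True) -> str:
    """Convert ANSI-colored text to an HTML fragment with aha.

    --no-header omits the surrounding <html>/<body> boilerplate;
    --black selects a dark background.
    """
    cmd = ["aha", "--no-header"]
    if dark:
        cmd.append("--black")
    result = subprocess.run(
        cmd, input=ansi_text, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Only attempt a real capture when both binaries are available.
    # "demo" is a hypothetical session name that must already exist.
    if shutil.which("tmux") and shutil.which("aha"):
        dump = subprocess.run(
            capture_pane_cmd("demo"), capture_output=True, text=True
        )
        if dump.returncode == 0:
            print(ansi_to_html(dump.stdout))
```

The resulting HTML fragment could then be rendered in a headless browser such as the Chromium installed by Playwright if a screenshot is needed.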
To interact with the environment:
# note that both special keys and literal strings can be sent.
env.send_key("date")
env.send_key("Enter") # special key
A full mapping from conventional special keys to standard tmux special keys can be generated by running generate_funckey_alias.py.
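To illustrate the idea of key aliasing, the sketch below shows one way a conventional key name might be translated before being passed to `tmux send-keys`. The `KEY_ALIASES` table and the `send_keys_cmd` helper are hypothetical and hand-written for this example; they are not the output of generate_funckey_alias.py, and lib.py may work differently.

```python
# Hypothetical aliases from conventional key names to tmux key names.
# The real mapping produced by generate_funckey_alias.py may differ.
KEY_ALIASES = {
    "Return": "Enter",
    "Esc": "Escape",
    "Backspace": "BSpace",
    "Delete": "DC",
    "Insert": "IC",
    "ArrowUp": "Up",
    "ArrowDown": "Down",
    "ArrowLeft": "Left",
    "ArrowRight": "Right",
}


def send_keys_cmd(target: str, key: str, literal: bool = False) -> list[str]:
    """Build a `tmux send-keys` command for one key or string.

    With literal=True the -l flag is added, so tmux sends the text
    as-is instead of looking it up as a key name.
    """
    cmd = ["tmux", "send-keys", "-t", target]
    if literal:
        cmd += ["-l", key]
    else:
        # Fall back to the given name when no alias is defined.
        cmd.append(KEY_ALIASES.get(key, key))
    return cmd
```

Separating literal text from key names avoids ambiguity: a literal string such as "Enter" can be typed into the terminal without being interpreted as the Enter key.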
You can read the test file for further integration.
Operating systems initially provided only text-based terminal interfaces, before GUIs appeared. Terminals are less resource-intensive and more lightweight, so they scale more easily than GUIs. Besides, most LLMs are text-only.
I have developed a general-purpose terminal interaction environment for AI agents such as OpenDevin and OpenInterpreter. Here are a few things it can do.
You can see the position of the cursor and the range of the selected text.
You can also capture a screenshot of the terminal with cursor denoted in red.
Rendering the terminal in grayscale gives high contrast to the red cursor, making it easier for the agent to locate.
Would you like to add it to Agent-Studio? It would further enhance the agent's capabilities and empower it to interact more with the OS.
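As a rough illustration of the cursor highlighting described above, the following sketch marks one cell of a plain-text screen dump in red. It is an assumption-laden example, not the project's implementation: the function name `mark_cursor` is hypothetical, and the cursor coordinates are assumed to come from tmux (for example via the `cursor_x` and `cursor_y` format variables).

```python
import html


def mark_cursor(lines: list[str], x: int, y: int) -> str:
    """Return HTML-escaped screen text with the cell at column x,
    row y wrapped in a red-background span.

    Rows shorter than the cursor column are padded with spaces so
    that a cursor sitting past the end of a line is still visible.
    """
    out = []
    for row, line in enumerate(lines):
        if row == y:
            padded = line.ljust(x + 1)
            before, ch, after = padded[:x], padded[x], padded[x + 1:]
            out.append(
                html.escape(before)
                + '<span style="background:red">'
                + html.escape(ch)
                + "</span>"
                + html.escape(after)
            )
        else:
            out.append(html.escape(line))
    return "\n".join(out)


if __name__ == "__main__":
    screen = ["$ date", "Mon Jan  1 00:00:00 UTC 2024", "$ "]
    # Light gray text on black keeps everything except the cursor colorless.
    print('<pre style="color:#ccc;background:#000">'
          + mark_cursor(screen, 2, 2) + "</pre>")
```

Because the rest of the screen is rendered without color, the red cell is the only saturated region, which is the contrast effect the grayscale option aims for.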
You can learn more here.