SkyworkAI / agent-studio

Benchmarks, environments, and toolkits for general computer agents
https://skyworkai.github.io/agent-studio/
GNU Affero General Public License v3.0
153 stars, 11 forks

Real terminal interface support #52

Open James4Ever0 opened 1 month ago

James4Ever0 commented 1 month ago

Operating systems initially provided only text-based terminal interfaces, before GUIs appeared. A terminal is less resource intensive and more lightweight, so it can scale more easily than a GUI. Besides, most LLMs are text-only.

I have developed a general-purpose terminal interaction environment for AI agents such as OpenDevin and OpenInterpreter. Here are a few things it can do.

You can see the position of the cursor and the range of the selected text.

tmux_show_1
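For readers who want to see how a cursor position can be read from tmux directly, here is a minimal sketch. It is not taken from lib.py; it assumes only the standard tmux `display-message -p` command and the `cursor_x` / `cursor_y` format variables. The function names and the `target` pane argument are hypothetical.

```python
import subprocess
from typing import List, Tuple

# tmux format string that expands to "<column>,<row>" of the pane's cursor.
CURSOR_FORMAT = "#{cursor_x},#{cursor_y}"


def cursor_query_command(target: str) -> List[str]:
    """Build the tmux command that prints the cursor position of a pane."""
    return ["tmux", "display-message", "-p", "-t", target, CURSOR_FORMAT]


def parse_cursor(output: str) -> Tuple[int, int]:
    """Parse "<column>,<row>" as printed by tmux into a pair of integers."""
    x, y = output.strip().split(",")
    return int(x), int(y)


def get_cursor(target: str) -> Tuple[int, int]:
    """Query a running tmux pane (requires tmux and an existing session)."""
    result = subprocess.run(
        cursor_query_command(target),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cursor(result.stdout)
```

Calling `get_cursor("my_session")` needs a live tmux session with that name; the two helper functions can be used and tested without one.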

You can also capture a screenshot of the terminal with cursor denoted in red.

vim_edit_tmux_screenshot

A grayscale rendering of the terminal gives high contrast to the red cursor, making it easier for the agent to locate.

grayscale_dark_tmux

Would you like to add it to Agent-Studio? It would further enhance the agent's capabilities and let it interact more with the OS.

You can learn more here.

ltzheng commented 1 month ago

I agree with you that terminals are more lightweight than GUIs. For that reason, the latest code for online benchmarking lets us turn on headless mode when the task does not require screenshot inputs. Your environment looks great! In my opinion, it can serve both as an environment and as a source of additional tasks to benchmark agent performance. I haven't figured out the simplest and most elegant way to integrate this into our codebase. Can you provide some guidance?

James4Ever0 commented 1 month ago

First, install required binaries:

sudo apt install -y tmux tmuxp aha

Next, install the following dependencies:

pip install parse playwright beautifulsoup4

# setup playwright if you want to take terminal screenshots
playwright install chromium
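Before running anything, it can help to confirm that the binaries and packages above are actually available. The following is a small sketch using only the Python standard library; the function name `missing_prerequisites` is hypothetical and not part of lib.py. Note that the `beautifulsoup4` package is imported as `bs4`.

```python
import importlib.util
import shutil

# Binaries installed via apt and modules installed via pip in the steps above.
REQUIRED_BINARIES = ("tmux", "tmuxp", "aha")
REQUIRED_MODULES = ("parse", "playwright", "bs4")  # beautifulsoup4 imports as bs4


def missing_prerequisites(binaries=REQUIRED_BINARIES, modules=REQUIRED_MODULES):
    """Return (missing_binaries, missing_modules) as two lists of names."""
    missing_bins = [b for b in binaries if shutil.which(b) is None]
    missing_mods = [m for m in modules if importlib.util.find_spec(m) is None]
    return missing_bins, missing_mods


if __name__ == "__main__":
    bins, mods = missing_prerequisites()
    if bins:
        print("Missing binaries:", ", ".join(bins))
    if mods:
        print("Missing Python modules:", ", ".join(mods))
    if not bins and not mods:
        print("All prerequisites found.")
```

This only checks that the names resolve; it does not verify versions or that `playwright install chromium` has been run.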

Finally, copy lib.py, then run test_lib.py from the same directory as lib.py.

SESSION_COMMAND in test_lib.py is the initial command executed in the terminal environment. Change it according to your needs.

To view the environment:

preview = session.preview_html(show_cursor=True, wrap_html=True, dark_mode=True, grayscale=True)
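To give a rough idea of how an HTML preview can be produced from the tools installed above, here is a sketch that pipes the output of `tmux capture-pane` through `aha`. This is an assumption about the general approach, not the actual implementation of `preview_html`; it does not draw the cursor or apply grayscale, and the function names are hypothetical.

```python
import subprocess
from typing import List


def capture_command(target: str) -> List[str]:
    """tmux command that prints a pane's contents (-p) with ANSI escapes (-e)."""
    return ["tmux", "capture-pane", "-p", "-e", "-t", target]


def aha_command(dark_mode: bool = True) -> List[str]:
    """aha command that converts ANSI text on stdin to HTML on stdout."""
    return ["aha", "--black"] if dark_mode else ["aha"]


def pane_to_html(target: str, dark_mode: bool = True) -> str:
    """Capture a live tmux pane and convert it to HTML (needs tmux and aha)."""
    ansi = subprocess.run(
        capture_command(target), capture_output=True, check=True
    ).stdout
    html = subprocess.run(
        aha_command(dark_mode), input=ansi, capture_output=True, check=True
    ).stdout
    return html.decode("utf-8", errors="replace")
```

The resulting HTML could then be rendered in a headless browser to take a screenshot, which is presumably why Playwright is among the dependencies.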

To interact with the environment:

# Note that both special keys and literal strings can be sent.
env.send_key("date")
env.send_key("Enter") # special key
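For context on how literal strings and special keys differ at the tmux level, here is a sketch built on the standard `tmux send-keys` command. It is not the code behind `send_key` in lib.py; the function names and the `literal` parameter are hypothetical. Without `-l`, tmux looks each argument up as a key name (such as `Enter`) and falls back to typing the characters; with `-l`, the argument is always typed verbatim.

```python
import subprocess
from typing import List


def send_keys_command(target: str, key: str, literal: bool = False) -> List[str]:
    """Build a tmux send-keys command for one key name or one literal string."""
    cmd = ["tmux", "send-keys"]
    if literal:
        cmd.append("-l")  # disable key-name lookup, type the text as-is
    cmd += ["-t", target, key]
    return cmd


def send_key(target: str, key: str, literal: bool = False) -> None:
    """Send the key to a live tmux pane (requires tmux and an existing session)."""
    subprocess.run(send_keys_command(target, key, literal), check=True)
```

With this sketch, `send_key("demo", "date", literal=True)` followed by `send_key("demo", "Enter")` would type `date` and press Enter in the pane named `demo`.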

A full mapping from conventional special keys to standard tmux special keys can be generated by running generate_funckey_alias.py.
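As an illustration of what such a mapping does, here is a small hand-written sketch. It is not the output of generate_funckey_alias.py, and the alias table is deliberately partial; the names `ALIASES`, `MODIFIERS`, and `to_tmux_key` are hypothetical. The tmux key names on the right-hand side (`BSpace`, `DC`, `IC`, `PPage`, `NPage`, and the `C-` / `M-` prefixes) are the ones tmux documents.

```python
# Conventional key names mapped to the names tmux expects (partial list).
ALIASES = {
    "Return": "Enter",
    "Esc": "Escape",
    "Backspace": "BSpace",
    "Delete": "DC",
    "Insert": "IC",
    "PageUp": "PPage",
    "PageDown": "NPage",
}

# Modifier names mapped to tmux key prefixes.
MODIFIERS = {"ctrl": "C-", "control": "C-", "alt": "M-", "meta": "M-"}


def to_tmux_key(name: str) -> str:
    """Translate names like "Ctrl+c" or "Backspace" into tmux key names.

    Names without a known alias are passed through unchanged. A bare "+"
    key is not handled by this sketch.
    """
    *mods, key = name.split("+")
    prefix = "".join(MODIFIERS[m.lower()] for m in mods)
    return prefix + ALIASES.get(key, key)
```

For example, `to_tmux_key("Ctrl+c")` gives `C-c`, which can be passed to `tmux send-keys` without the `-l` flag.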

You can read the test file for further integration.