All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More
https://all-hands.dev
MIT License
35.26k stars 3.98k forks source link

[Runtime] Mega-issue to track all issues related to bash Interactive terminal #3031

Open xingyaoww opened 3 months ago

xingyaoww commented 3 months ago

This is a mega-issue tracker for the Interactive terminal issue peoples run into.

Feel free to expand this list if i missed any relevant issue!


Cause

These are typically caused by the same reason: OpenDevin uses pexcept to interact with Bash shells, however, the current parsing logic only looks for the next PS1 prompt (e.g., something like root@hostname:/folderABC $).

This will keep looking for such a pattern until it timeout, causing the following things to break, as listed in the PR above:

Fixes

We plan to resolve them as much as I can once arch refactor https://github.com/OpenDevin/OpenDevin/issues/2404 is completed. But these are a non-exhaustive list of patterns we are trying to pexcept and we cannot list everything here:

  1. [ ] Try to cover common use cases of these prompts (e.g., [yes/no] pattern, conda environment pattern)
  2. [ ] Figure out a more general way (rather than writing rules) for agents to interact with these (e.g., we don't write every rules explicitly, but for example, if we've been waiting for more than 5s and there's no new output from the terminal, it probably means it is waiting for user input and we should and it over to the agent - subsequently, we may need to allow agent to issue special keyboard actions like ctrl+D ctrl+C, etc).
  3. [ ] Add something in the prompt that forbids agent goes into interactive programs (e.g., interactive Python, vim, nano, etc)
  4. [ ] We need a way to detect if the agent accidentally goes into such an interactive program, and we need a way to force it out (we currently send ctrl+C, which might not work for a large variety of programs like vim).

If you want to help!

Try to take a look at our existing bash parsing logic for the new architecture (under development!): https://github.com/OpenDevin/OpenDevin/blob/8bfa61f3e4beceb690562b4d105aa01dc50d58d7/opendevin/runtime/client/client.py#L62-L111

You can help to:

  1. Write test cases into https://github.com/OpenDevin/OpenDevin/blob/main/tests/unit/test_runtime.py to expose these interactive bash issues
  2. Try to fix them inside the client/client.py (and/or the ssh_box.py - but we plan to deprecate them soon, so only supporting these on EventStreamRuntime should be sufficient!)
tobitege commented 3 months ago

First, great collection of issues to look into!

Fixes ...if we've been waiting for more than 5s...

As to that 2nd point: if the model runs any installations (like with pip), these 5 seconds could become too short?

Add something in the prompt...

We may have to track down the "usual suspects" IF these have parameters especially to suppress interactive mode: Several package installers have parameters to suppress interactive mode. Here are the options for some of the most common ones (thx to Sonnet):

  1. pip:

    • Use the -q or --quiet flag to suppress output.
    • Use the -y or --yes flag to automatically answer yes to prompts (though pip itself doesn't typically prompt for yes/no).
    pip install -q package_name
  2. Poetry:

    • Poetry does not have interactive prompts during installation, so no specific flag is needed to suppress them.
  3. npm:

    • Use the --yes or -y flag to automatically answer yes to prompts.
    npm install package_name --yes
  4. yarn:

    • Yarn does not typically have interactive prompts during installation, so no specific flag is needed to suppress them.
  5. conda:

    • Use the -y or --yes flag to automatically answer yes to prompts.
    conda install package_name -y
  6. apt-get (for system packages on Debian-based systems):

    • Use the -y or --yes flag to automatically answer yes to prompts.
    sudo apt-get install package_name -y
xingyaoww commented 3 months ago

if the model runs any installations (like with pip), these 5 seconds could become too short?

Yeah.. that's a good point and that's primarily why we did implement the timeout but not this.

I guess the T second timeout should be enforced for the stream of output: e.g., if the terminal keeps printing outputs in the last T second duration (e.g., you see the screen rolling when running installation command), then it is probably working fine without interactions. But if it hangs for T second (no output is printing), then it probably needs someone to look to see if we need to enter/do anything.

I mean, we humans do the same thing! If a command hangs unreasonably long, we might just issue ctrl+C to kill it 😅

Maybe we could (1) switch everything to streaming outputs, (2) just wait for T=10 seconds, (3) if no additional outputs are added during that last T seconds, we are probably stuck - we can let the LLM decide whether it wants to (a) interrupt the process, (b) enter something if this is actually a legit interactive prompt, (c) keep waiting. IMO this might be the next level of agents that is execution time aware 😆

tobitege commented 3 months ago

The streaming back works in my async PR, but still the prompt regex would then be the issue.

SmartManoj commented 3 months ago

@xingyaoww I think this is the ~right issue~ to discuss this.

Will we move to Paramiko for windows support? https://github.com/OpenDevin/OpenDevin/pull/2059#discussion_r1632184135

Why are we using pxssh over Paramiko?

image Source

Relevant comments: @iFurySt https://github.com/OpenDevin/OpenDevin/issues/1739#issuecomment-2106366000

xingyaoww commented 3 months ago

We will no longer use SSH protocol in the new architecture, though - otherwise we need to maintain two separate connections which could be overly complicated. Instead, we will just interact with a local bash shell (for ease of dependency management), so ssh-related libraries might not be relevant anymore.

zenitogr commented 3 months ago

3176 sorry for creating a new issue but I dont see here about long running commands like npm run dev

zenitogr commented 3 months ago

for anyone who want to create a nextjs app here is the noninteractive way: npx create-next-app my-app-name --ts --app --eslint --import-alias "@/*" --use-npm --tailwind --no-src-dir you basicly pass all the options you want BUT you must pass all of them with --no-flag if you dont want them or --flag if you want them. I would just create-next-app in my wsl if you are using wsl and call it a day

here are all the flags: https://nextjs.org/docs/pages/api-reference/create-next-app

zenitogr commented 3 months ago

I think thats the number one priority, cause in react land you have a npm run dev running constantly while you dev the project

also running tasks that need interaction is the defacto even if everybody hates it

its just simpler for packages and core components that are so complicated to ask for user options and prefernces and on the fly choices

Yeah, totally agree! Unfortunately it's not super-trivial to fix (otherwise we would have fixed it already), but I agree that this is high priority

you could use two llms

one llm for coordination in a loop of checking things and the other for coding

gpt-pilot kind of does that with agents, like there are different agents that cooperate in level, there is the top agent that gives commands to lower agents and coordinates what the next step and next agent is gonna be

they use different LLMs for each agent

like gpt-4 for top agent with temp 0.8 and gpt-4o for coding with 0 temp

James4Ever0 commented 3 months ago

https://github.com/OpenDevin/OpenDevin/pull/3040#issuecomment-2260018219

@tobitege @xingyaoww @zenitogr @SmartManoj

(PS: This is one of the major features planned on Cybergod)


Update: 8/16/24

Now I have made some progress over the terminal agent. Check out the demo video below:

tmux_show_1

Terminal environment can be captured as image with cursor denoted in red:

vim_edit_tmux_screenshot

James4Ever0 commented 2 months ago

Now one can obtain terminal input/output statistics in Cybergod like down below:

Tmux statistics

With terminal stats, one can build a more efficient event-driven terminal agent, for example, listen for event TerminalIdle just like NetworkIdle in Playwright, and interval-driven terminal agent can be more intelligent by adding statistics to the prompt, and conditional prompts based on different stats.

Tmux event

More can be learned at here and here.