Open xingyaoww opened 3 months ago
First, great collection of issues to look into!
Fixes ...if we've been waiting for more than 5s...
As to that 2nd point: if the model runs any installations (like with pip), these 5 seconds could become too short?
Add something in the prompt...
We may have to track down the "usual suspects" and check whether they have parameters to suppress interactive mode. Several package installers do; here are the options for some of the most common ones (thx to Sonnet):
pip: -q or --quiet flag to suppress output. -y or --yes flag to automatically answer yes to prompts (only pip uninstall accepts it; pip install doesn't typically prompt for yes/no). Example: pip install -q package_name
Poetry:
npm: --yes or -y flag to automatically answer yes to prompts. Example: npm install package_name --yes
yarn:
conda: -y or --yes flag to automatically answer yes to prompts. Example: conda install package_name -y
apt-get (for system packages on Debian-based systems): -y or --yes flag to automatically answer yes to prompts. Example: sudo apt-get install package_name -y
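One way the flags listed above could be applied automatically is to rewrite a command before it reaches the shell. This is only a rough sketch: the NONINTERACTIVE_FLAGS table and the add_noninteractive_flags helper are hypothetical names made up for illustration (not part of OpenDevin), and the table only covers the pip, conda and apt-get entries from the list.

```python
import shlex

# Hypothetical lookup table: (tool, subcommand) -> flags that suppress prompts
# or noisy output. Only the pip, conda and apt-get entries from the list above.
NONINTERACTIVE_FLAGS = {
    ("pip", "install"): ["-q"],
    ("conda", "install"): ["-y"],
    ("apt-get", "install"): ["-y"],
}


def add_noninteractive_flags(command: str) -> str:
    """Append the known non-interactive flags to a command if they are missing."""
    tokens = shlex.split(command)
    # Skip a leading "sudo" when looking for the tool name.
    start = 1 if tokens and tokens[0] == "sudo" else 0
    key = tuple(tokens[start:start + 2])
    flags = NONINTERACTIVE_FLAGS.get(key, [])
    missing = [flag for flag in flags if flag not in tokens]
    return shlex.join(tokens + missing)
```

Commands that are not in the table (or that already carry the flag) pass through unchanged, so a helper like this could sit in front of whatever executes the command without affecting anything else.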
if the model runs any installations (like with pip), these 5 seconds could become too short?
Yeah.. that's a good point and that's primarily why we did implement the timeout but not this.
I guess the T-second timeout should be enforced for the stream of output: e.g., if the terminal keeps printing output during the last T seconds (e.g., you see the screen rolling when running an installation command), then it is probably working fine without interaction. But if it hangs for T seconds (no output is printed), then it probably needs someone to look and see if we need to enter/do anything.
I mean, we humans do the same thing! If a command hangs unreasonably long, we might just issue ctrl+C to kill it 😅
Maybe we could (1) switch everything to streaming outputs, (2) just wait for T=10 seconds, (3) if no additional outputs are added during that last T seconds, we are probably stuck - we can let the LLM decide whether it wants to (a) interrupt the process, (b) enter something if this is actually a legit interactive prompt, (c) keep waiting. IMO this might be the next level of agents that is execution time aware 😆
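Steps (1)-(3) could look roughly like the sketch below. It uses only the Python standard library and a plain pipe rather than pexpect or a real terminal, so treat it as an illustration of the idle-output idea, not as the OpenDevin implementation; run_until_idle and _pump are hypothetical names.

```python
import os
import queue
import subprocess
import threading


def _pump(fd, out_q):
    """Forward raw chunks of child output to the queue; None marks EOF."""
    while True:
        try:
            chunk = os.read(fd, 4096)
        except OSError:
            chunk = b""
        if not chunk:
            out_q.put(None)
            return
        out_q.put(chunk)


def run_until_idle(argv, idle_seconds=10.0):
    """Run argv and return (status, output, proc).

    status is "exited" if the process finished, or "idle" if no new output
    arrived for idle_seconds. In the idle case the process is left running so
    the caller (e.g. the LLM) can interrupt it, send input, or keep waiting.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    out_q = queue.Queue()
    threading.Thread(
        target=_pump, args=(proc.stdout.fileno(), out_q), daemon=True
    ).start()

    chunks = []
    while True:
        try:
            # The timer restarts on every chunk, so a busy install never trips it.
            chunk = out_q.get(timeout=idle_seconds)
        except queue.Empty:
            return "idle", b"".join(chunks).decode(errors="replace"), proc
        if chunk is None:
            proc.wait()
            return "exited", b"".join(chunks).decode(errors="replace"), proc
        chunks.append(chunk)
```

Reading raw chunks instead of lines matters here: a prompt like Password: usually has no trailing newline, so line-based reading would never surface it. On "idle", the collected output can be handed to the LLM to pick between (a), (b) and (c).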
The streaming back works in my async PR, but still the prompt regex would then be the issue.
@xingyaoww I think this is the ~right issue~ to discuss this.
Will we move to Paramiko for windows support? https://github.com/OpenDevin/OpenDevin/pull/2059#discussion_r1632184135
Why are we using pxssh over Paramiko?
Relevant comments: @iFurySt https://github.com/OpenDevin/OpenDevin/issues/1739#issuecomment-2106366000
We will no longer use SSH protocol in the new architecture, though - otherwise we need to maintain two separate connections which could be overly complicated. Instead, we will just interact with a local bash shell (for ease of dependency management), so ssh-related libraries might not be relevant anymore.
For anyone who wants to create a Next.js app, here is the noninteractive way:
npx create-next-app my-app-name --ts --app --eslint --import-alias "@/*" --use-npm --tailwind --no-src-dir
You basically pass all the options you want, BUT you must pass all of them: with --no-flag if you don't want them or --flag if you want them. I would just run create-next-app in my WSL if you are using WSL and call it a day
here are all the flags: https://nextjs.org/docs/pages/api-reference/create-next-app
I think that's the number one priority, because in React land you have an npm run dev running constantly while you develop the project
Also, running tasks that need interaction is the de facto standard even if everybody hates it
It's just simpler for packages and core components that are so complicated to ask for user options, preferences, and on-the-fly choices
Yeah, totally agree! Unfortunately it's not super-trivial to fix (otherwise we would have fixed it already), but I agree that this is high priority
you could use two llms
one llm for coordination in a loop of checking things and the other for coding
gpt-pilot kind of does that with agents, like there are different agents that cooperate in level, there is the top agent that gives commands to lower agents and coordinates what the next step and next agent is gonna be
they use different LLMs for each agent
like gpt-4 for top agent with temp 0.8 and gpt-4o for coding with 0 temp
https://github.com/OpenDevin/OpenDevin/pull/3040#issuecomment-2260018219
@tobitege @xingyaoww @zenitogr @SmartManoj
(PS: This is one of the major features planned on Cybergod)
Update: 8/16/24
I have now made some progress on the terminal agent. Check out the demo video below:
Terminal environment can be captured as image with cursor denoted in red:
Now one can obtain terminal input/output statistics in Cybergod like down below:
With terminal stats, one can build a more efficient event-driven terminal agent, for example one that listens for a TerminalIdle event, just like NetworkIdle in Playwright. An interval-driven terminal agent can also be made more intelligent by adding statistics to the prompt and by using conditional prompts based on different stats.
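To make the idea of terminal stats a bit more concrete, here is a small sketch of a stats tracker that can answer "is the terminal idle?" and produce a line for the prompt. This is not Cybergod's actual API; TerminalStats and its methods are hypothetical names, and the timestamps are passed in explicitly only to keep the example deterministic.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TerminalStats:
    """Hypothetical tracker for terminal output volume and idle time."""

    bytes_out: int = 0
    last_output_at: float = field(default_factory=time.monotonic)

    def record_output(self, data: bytes, now: Optional[float] = None) -> None:
        """Call this whenever the terminal produces output."""
        self.bytes_out += len(data)
        self.last_output_at = time.monotonic() if now is None else now

    def idle_for(self, now: Optional[float] = None) -> float:
        """Seconds since the last output was recorded."""
        current = time.monotonic() if now is None else now
        return current - self.last_output_at

    def is_idle(self, threshold: float, now: Optional[float] = None) -> bool:
        """True once no output has been seen for at least threshold seconds."""
        return self.idle_for(now) >= threshold

    def to_prompt(self, now: Optional[float] = None) -> str:
        """One-line summary that could be appended to the agent's prompt."""
        return (
            f"{self.bytes_out} bytes of output so far; "
            f"no new output for {self.idle_for(now):.1f}s"
        )
```

An event-driven agent could fire its TerminalIdle handler the first time is_idle flips to true, while an interval-driven agent could simply include to_prompt() in each periodic prompt.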
This is a mega-issue tracker for the interactive terminal issues people run into.
Feel free to expand this list if I missed any relevant issue!
Cause
These are typically caused by the same reason: OpenDevin uses pexpect to interact with Bash shells; however, the current parsing logic only looks for the next PS1 prompt (e.g., something like root@hostname:/folderABC $). It will keep looking for such a pattern until it times out, causing the following things to break, as listed in the PR above:
- Interactive interpreters (e.g., python3), where the new prompt changes to >>>
- Text editors (e.g., nano, vim), where the display could be completely broken? (I'm not familiar with the protocol here, though)
- Environment prefixes (e.g., (base)) before the PS1 prompt, causing the current pexpect parsing to break
- Password prompts (e.g., Password:)
- Prompts such as (yes/no/[fingerprint]) requesting user confirmation.
Fixes
We plan to resolve as many of them as we can once the arch refactor https://github.com/OpenDevin/OpenDevin/issues/2404 is completed. This is a non-exhaustive list of patterns we are trying to pexpect, and we cannot list everything here:
- More prompt patterns (e.g., [yes/no] pattern, conda environment pattern)
- Special keys (e.g., ctrl+D, ctrl+C, etc.).
- If we've been waiting for more than 5s, ... ctrl+C (which might not work for a large variety of programs like vim).
If you want to help!
Try to take a look at our existing bash parsing logic for the new architecture (under development!): https://github.com/OpenDevin/OpenDevin/blob/8bfa61f3e4beceb690562b4d105aa01dc50d58d7/opendevin/runtime/client/client.py#L62-L111
You can help to:
- Add tests to https://github.com/OpenDevin/OpenDevin/blob/main/tests/unit/test_runtime.py to expose these interactive bash issues
- Fix them in client/client.py (and/or the ssh_box.py - but we plan to deprecate them soon, so only supporting these on EventStreamRuntime should be sufficient!)
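As a starting point for anyone picking this up, the prompt kinds described under Cause could be recognised with something like the sketch below. It uses plain re on the tail of the captured output instead of pexpect, and both the regexes and the names (PROMPT_PATTERNS, classify_tail) are illustrative assumptions, not the logic in client.py.

```python
import re

# Hypothetical patterns, checked in order against the end of the output.
PROMPT_PATTERNS = [
    # e.g. "(yes/no/[fingerprint])? " or "[yes/no] "
    ("confirmation",
     re.compile(r"[\[(]yes/no(/\[fingerprint\])?[\])]\??\s*$", re.IGNORECASE)),
    # e.g. "Password:" or "[sudo] password for user: "
    ("password", re.compile(r"[Pp]assword[^\n]*:\s*$")),
    # Python REPL primary and continuation prompts
    ("python_repl", re.compile(r"(^|\n)(>>>|\.\.\.) $")),
    # PS1-style prompt, optionally prefixed by an environment name like "(base) "
    ("shell",
     re.compile(r"(^|\n)(\([^()\n]+\) )?[^\s@]+@[^\s:]+:[^\n]* ?[$#] ?$")),
]


def classify_tail(output: str) -> str:
    """Return the kind of prompt the output ends with, or "unknown"."""
    for name, pattern in PROMPT_PATTERNS:
        if pattern.search(output):
            return name
    return "unknown"
```

A result of "unknown" after the output has gone quiet is exactly the case where falling back to the LLM (interrupt, type something, or keep waiting) seems most useful.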