All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More
https://all-hands.dev
MIT License

MACS: Multi-Agent Cooperative Structure #3151

Status: Open. rezzie-rich opened this issue 1 month ago.

rezzie-rich commented 1 month ago

Summary

I propose a comprehensive agent workflow that uses a multi-agent framework to support the entire software development lifecycle. It emphasizes a deeper understanding of the context and of user requests, improving output quality by feeding the agents richer inputs.

Motivation

Generating high-quality code snippets is essential, but a software developer agent must also handle other kinds of context: existing projects and their dependencies, a thorough understanding of the user's request, and the relevant source code. Handling all of these tasks with a single agent can quickly become overwhelming, so coordinating multiple agents, each responsible for a different task, can be highly beneficial. The effectiveness of an LLM depends significantly on its understanding of the task. The same LLM can produce lower-quality output when its comprehension of the prompt and context is poor. By separating comprehension (of the user prompt and the relevant context) from task execution, we can develop each aspect more effectively and improve overall quality.

Technical Design

[Image: ODdiagram]

The primary role of the 'comprehender-agents' is to understand the context of the user's input. They gather all relevant context, such as existing projects and their dependencies (via RAG), to better understand the task.
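To make the context-gathering step concrete, here is a minimal sketch of how a comprehender-agent might rank project files against a request. It assumes plain keyword overlap as a stand-in for the vector-based retrieval (RAG) the proposal has in mind. `gather_context` and its parameters are hypothetical and are not part of OpenHands.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lower-case identifier-like tokens with their counts."""
    return Counter(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def gather_context(request: str, files: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k file paths whose contents overlap most with the request.

    Hypothetical stand-in for RAG: a real comprehender would score by
    embedding similarity rather than by shared tokens.
    """
    query = tokenize(request)
    scored = []
    for path, content in files.items():
        # Counter & Counter keeps the minimum count of each shared token.
        overlap = sum((query & tokenize(content)).values())
        scored.append((overlap, path))
    # Highest overlap first; ties broken by path for deterministic output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for score, path in scored[:k] if score > 0]


if __name__ == "__main__":
    project = {
        "auth/login.py": "def login(user, password): ...",
        "db/models.py": "class User: ...",
        "README.md": "project overview",
    }
    print(gather_context("fix the login password check", project))
```

Files with no overlap are dropped rather than padded in. The idea is that the comprehender should pass on only context that is plausibly relevant.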

The chat interface should have a toggle switch that lets the user choose between 'tutor' and 'execute' in real time. Depending on the selected option, the 'comprehender-agents' will call either the 'tutor-agent' or the 'prompt engineer agent.' If the user's input is incomplete, the 'comprehender-agents' will call the 'inquirer-agent,' whose job is to extract more information from the user to improve context understanding. Before the 'prompt engineer agent' submits its prompt to 'codeactagent', the prompt can be shown in the chat interface with a button that lets the user edit it. There should also be a 'submit without edit' option so the user can skip this step.
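The routing rules above can be sketched as a small decision function, followed by the optional edit step before the prompt reaches 'codeactagent'. This is only an illustration under stated assumptions: `Mode`, `Comprehension`, `next_agent`, and `finalize_prompt` are hypothetical names. The assumption that an incomplete request always goes to the inquirer first, whichever toggle position is selected, is our reading of the proposal.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Position of the (hypothetical) tutor/execute toggle in the chat UI."""
    TUTOR = "tutor"
    EXECUTE = "execute"


@dataclass
class Comprehension:
    """What the comprehender-agents extracted from the user's input (hypothetical)."""
    request: str
    mode: Mode
    # Open questions the comprehender could not answer from the available context.
    missing_info: list[str] = field(default_factory=list)


def next_agent(c: Comprehension) -> str:
    """Name of the agent the comprehender-agents hand off to."""
    if c.missing_info:
        # Incomplete input: ask the user for more detail before anything else.
        return "inquirer-agent"
    if c.mode is Mode.TUTOR:
        return "tutor-agent"
    return "prompt-engineer-agent"


def finalize_prompt(draft: str, user_edit: Optional[str] = None) -> str:
    """Prompt that is actually sent to codeactagent.

    user_edit=None models the 'submit without edit' button.
    """
    return draft if user_edit is None else user_edit


if __name__ == "__main__":
    print(next_agent(Comprehension("add a /health endpoint", Mode.EXECUTE)))
    print(next_agent(Comprehension("add an endpoint", Mode.EXECUTE, ["which framework?"])))
```

Keeping the routing as a pure function of the comprehender's output makes it easy to test and to change later, for example to let tutor mode tolerate missing information.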

Both the 'comprehender-agents' and 'codeactagent' can access 'tools & sub-agents.' This group includes the planner-agent, micro-agents, gptswarm (a potential addition), agentless components (once each component has been isolated), and the browsing agent. Tools can be accessed directly (this includes agentless and, because of its hive-mind behavior, gptswarm). The other agents should operate under the delegator agent in a hierarchical structure. 'codeactagent' can also access the 'comprehender-agent' directly.
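The split between directly accessible tools and sub-agents managed by the delegator can be sketched as a single shared access point. It assumes that every tool or sub-agent can be modeled as a function from a task string to a result string. `DelegatorAgent`, `ToolsAndSubAgents`, and the registered names and lambdas are hypothetical placeholders, not OpenHands APIs.

```python
from typing import Callable, Dict

# Hypothetical model: a tool or sub-agent maps a task description to a result.
Handler = Callable[[str], str]


class DelegatorAgent:
    """Owns the sub-agents that sit under it in the hierarchy."""

    def __init__(self) -> None:
        self._sub_agents: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._sub_agents[name] = handler

    def delegate(self, name: str, task: str) -> str:
        if name not in self._sub_agents:
            raise KeyError(f"no sub-agent named {name!r}")
        return self._sub_agents[name](task)


class ToolsAndSubAgents:
    """Shared entry point used by both comprehender-agents and codeactagent."""

    def __init__(self, direct_tools: Dict[str, Handler], delegator: DelegatorAgent) -> None:
        self._direct_tools = direct_tools
        self._delegator = delegator

    def call(self, name: str, task: str) -> str:
        # Direct tools (e.g. agentless components, gptswarm) bypass the delegator.
        if name in self._direct_tools:
            return self._direct_tools[name](task)
        # Every other agent is reached through the delegator hierarchy.
        return self._delegator.delegate(name, task)


delegator = DelegatorAgent()
delegator.register("planner-agent", lambda task: f"plan for: {task}")
delegator.register("browsing-agent", lambda task: f"search results for: {task}")

toolbox = ToolsAndSubAgents(
    direct_tools={"agentless-localize": lambda task: f"candidate files for: {task}"},
    delegator=delegator,
)

if __name__ == "__main__":
    print(toolbox.call("planner-agent", "add auth"))
    print(toolbox.call("agentless-localize", "bug in login"))
```

Routing through one entry point means the caller does not need to know whether a capability is a direct tool or a delegated agent. Moving a component between the two groups then only changes how it is registered.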

Based on the user's request, the 'comprehender-agent' may take the actions of 'codeactagent' and interact with the user through either the 'tutor' or the 'inquirer' agent. If the user's request is to execute, 'codeactagent' generates the output directly.

In eval and autonomous modes, the tutor and inquirer agents could be made unavailable, so the comprehender-agents would call only the prompt engineer agent and proceed with the task.
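A minimal sketch of this mode switch, assuming that the set of agents the comprehender may call is filtered by an `autonomous` flag. It also assumes that any request for a disabled agent falls back to the prompt engineer agent. The constant and function names are hypothetical.

```python
from typing import FrozenSet

# Agents the comprehender-agents can hand off to (hypothetical names).
COMPREHENDER_TARGETS: FrozenSet[str] = frozenset(
    {"tutor-agent", "inquirer-agent", "prompt-engineer-agent"}
)
# Agents that need a human in the loop; disabled in eval/autonomous runs.
INTERACTIVE_ONLY: FrozenSet[str] = frozenset({"tutor-agent", "inquirer-agent"})


def available_targets(autonomous: bool) -> FrozenSet[str]:
    """Agents the comprehender may call in the current mode."""
    return COMPREHENDER_TARGETS - INTERACTIVE_ONLY if autonomous else COMPREHENDER_TARGETS


def resolve_target(desired: str, autonomous: bool) -> str:
    """Fall back to the prompt engineer agent when the desired agent is disabled."""
    if desired in available_targets(autonomous):
        return desired
    return "prompt-engineer-agent"


if __name__ == "__main__":
    print(sorted(available_targets(autonomous=True)))
    print(resolve_target("inquirer-agent", autonomous=True))
```

Falling back instead of raising an error matches the intent that autonomous runs should simply proceed with the task.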

Alternatives to Consider

Limit the tools and sub-agents that 'codeactagent' can access.

Additional context

This proposal is inspired by issues #3077 and #2955, by a combination of #2821 and #2248 (a hybrid of vector-based auto context and repomap), and by discussion #2936.

The 'prompt-engineer-agent' is inspired by "Self-Instruct: Aligning Language Models with Self-Generated Instructions" https://arxiv.org/abs/2212.10560

rezzie-rich commented 1 month ago

Made slight changes to the original post.

This proposal builds on the existing framework and aims to complement it. Its primary focus is a deeper understanding of the user's requests and their context. Often, LLMs fail to produce appropriate output not because they cannot execute the task, but because they have not understood the user's intent and context.

I believe that considering this approach will greatly benefit the community.

github-actions[bot] commented 4 days ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.