All-Hands-AI / OpenHands

🙌 OpenHands: Code Less, Make More
https://all-hands.dev
MIT License

System Architecture #77

Closed rb125 closed 6 months ago

rb125 commented 6 months ago

System Overview

The AI-powered software engineering assistant employs a multi-agent swarm model to provide a comprehensive development experience. At its core is a delegator agent that manages user interactions, project contexts, and delegates tasks to specialized agents.

Components

Web Application (Frontend)

  • Chat Interface: Primary user interaction point. Driven by a robust NLP engine for natural language communication.
  • Embedded IDE: Full-featured web IDE (Theia-based) for code development and project review.
  • Shell Emulator: Secure shell environment for development tasks and project setup.
  • Settings: Manages user preferences and access to LLM credentials.

Delegator Agent

  • Conversation Management: Interprets user intent, routes requests, and manages interruptions across multiple projects.
  • Project Contextualization: Tracks active projects, their stage, and associated data.
  • Task Delegation: Delegates tasks to the appropriate agents, manages dependencies, and tracks progress.
  • State Management: Maintains a robust system for storing and retrieving project states to handle context switching fluidly.
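To make the task delegation and dependency handling above a bit more concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not a proposed implementation: the names `Task`, `Delegator`, and the callable `Agent` signature are hypothetical, and the agents are plain functions standing in for LLM-backed ones. Dependencies are resolved with the standard library's `graphlib.TopologicalSorter`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical agent signature: takes a task name, returns a result string.
Agent = Callable[[str], str]


@dataclass
class Task:
    name: str
    agent: str                          # key into the delegator's agent registry
    depends_on: tuple[str, ...] = ()    # names of tasks that must finish first


@dataclass
class Delegator:
    agents: dict[str, Agent]
    progress: dict[str, str] = field(default_factory=dict)  # task name -> result

    def run(self, tasks: list[Task]) -> list[str]:
        """Run tasks in dependency order and return the order that was used."""
        by_name = {t.name: t for t in tasks}
        for t in tasks:
            missing = [d for d in t.depends_on if d not in by_name]
            if missing:
                raise ValueError(f"task {t.name!r} depends on unknown tasks {missing}")
        graph = {t.name: set(t.depends_on) for t in tasks}
        # static_order() raises graphlib.CycleError on circular dependencies.
        order = list(TopologicalSorter(graph).static_order())
        for name in order:
            task = by_name[name]
            self.progress[name] = self.agents[task.agent](name)
        return order


# Usage: three stand-in agents and a linear chain of tasks given out of order.
delegator = Delegator(agents={
    "requirements": lambda t: f"spec for {t}",
    "development": lambda t: f"code for {t}",
    "qa": lambda t: f"tests for {t}",
})
order = delegator.run([
    Task("test", "qa", depends_on=("implement",)),
    Task("implement", "development", depends_on=("gather",)),
    Task("gather", "requirements"),
])
print(order)
```

The `progress` dict is the simplest possible form of progress tracking; per-project state for context switching would presumably need one such record per project, persisted somewhere.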

Specialized Agent Swarm

  • Requirements Engineering Agent: Handles requirements elicitation, design suggestions, and generation of architectural diagrams. May leverage specialized LLMs and knowledge bases.
  • Project Management Agent: Focuses on task breakdown, estimation, and timelines, and potentially integrates with external PM tools.
  • Software Development Agent: Code-centric. Responsible for code generation, stubbing, test cases, and PRs, leveraging LLMs trained on code.
  • Release Engineering Agent: Handles environment setup, CI/CD pipelines, deployment strategies, and build configurations.
  • QA/QC Agent: Generates test plans, understands different testing paradigms, and may suggest tools and extensive test suites.

Backend Server

  • Coordination Logic: Houses the delegator agent and potentially the specialized swarm, enabling communication and orchestration.
  • Secure Credential Storage: Encrypted system for storing and retrieving user LLM API keys.
  • Shared Knowledge Base (Optional): If appropriate, a centralized store of data, learnings, and code examples to improve the collective intelligence of the agents.

External Services

  • GitHub: Integration for repository creation, code management, and issue tracking.
  • User-Selected LLM Providers: System connects to external LLMs (GPT-3, etc.) via a flexible API abstraction layer.
  • CI Server: Executes test suites, build processes, and may connect with deployment pipelines.
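One way to picture the LLM API abstraction layer mentioned above is a small provider interface plus a registry keyed by the user's choice. This is a rough Python sketch under assumptions: `LLMProvider`, `EchoProvider`, and `ProviderRegistry` are hypothetical names, and `EchoProvider` is a stand-in that makes no network calls, so no real vendor API is represented here.

```python
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical interface that every provider adapter would satisfy."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider for local testing; it only echoes the prompt."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class ProviderRegistry:
    """Maps a user-selected provider name to an adapter object."""

    def __init__(self) -> None:
        self._providers: dict[str, LLMProvider] = {}

    def register(self, name: str, provider: LLMProvider) -> None:
        self._providers[name] = provider

    def complete(self, name: str, prompt: str) -> str:
        try:
            provider = self._providers[name]
        except KeyError:
            raise ValueError(f"no provider registered as {name!r}") from None
        return provider.complete(prompt)


# Usage: agents call the registry and never touch a vendor SDK directly.
registry = ProviderRegistry()
registry.register("echo", EchoProvider())
reply = registry.complete("echo", "write a unit test")
print(reply)
```

With this shape, adding a new LLM would mean writing one adapter class with a `complete` method and registering it, which fits the adaptability goal of leaving LLM choice to the user.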

System Strengths

  • Specialization: Agents become highly focused, increasing potential for high-quality outputs in their domains.
  • User-Focused: The delegator creates a seamless chat-based interface, simplifying the complexity for the user.
  • Adaptability: LLM choices reside with the user. New LLMs or specialized agents can be integrated over time.
  • Resilience: The swarm model allows for potential scaling and lessens the impact of single agent failures.

Jiaxin-Pei commented 6 months ago

I think a documentation management agent could also be helpful, which helps to:

rb125 commented 6 months ago

Yea makes sense.

Mystique-orca commented 6 months ago

I think a documentation management agent could also be helpful, which helps to:

  • Create/update docs for newly developed features
  • Retrieve relevant documentation when creating new features

I think RTD will do the job for this one. I love it!

rbren commented 6 months ago

Given that this is such a high-level architectural proposal, I suggest we move the discussion to the roadmap PR. It'll be helpful to make specific suggestions there so that we're all aligned.