fiatrete / OpenDAN-Personal-AI-OS

OpenDAN is an open source Personal AI OS , which consolidates various AI modules in one place for your personal use.
https://opendan.ai
MIT License
1.7k stars 142 forks source link

Update and Refocused Goals for the 0.5.2!!! Version Goals Established! #124

Open waterflier opened 7 months ago

waterflier commented 7 months ago

Overview

Since the release of version 0.5.1 in November 2023, the development cycle for 0.5.2 has extended to nearly double the planned duration, now approaching six months. I believe it is essential to update our community about our current status: the goals for 0.5.2 were set too high. Influenced by the updates from OpenAI, including GPTs, Tools API, and GPT-V, we admittedly aimed for OpenDAN to significantly surpass the capabilities of GPTs. Despite substantial efforts over the past months, much of our work has been experimental in nature. Here are some insights from our "failures":

  1. After an agent possesses a large number of functions, it not only consumes more tokens but also suffers from more severe hallucinations. It will attempt to call non-existent functions and even add imaginary parameters.

The issue mentioned above poses a serious challenge to the system expansion model we previously planned. To address this core issue, we have made numerous attempts but have yet to find a sufficiently good direction. Furthermore, expanding work along each direction requires long-term experimentation to find the best solution. Currently, there are two promising solutions:

  1. RAG Integration Challenges: The mainstream RAG solutions we planned to integrate are flawed; they work for demos but fail to deliver stable, predictable results in regular scenarios. Our architectural vision for OpenDAN includes replacing the traditional FileSystem with a Knowledge Base, a critical infrastructure for the AI era. However, we have two diverging paths based on our experiences:

    • A: Expand the Token Window in LLMs, ensuring accuracy despite higher costs.
    • B: Focus on LLM-curated knowledge graphs, supplemented by vector databases and full-text search for RAG implementation. This approach has shown promise in recent experiments, albeit at a high cost.
  2. Tasklist/Todolist Systems Based on LLM's Planning Capabilities: Our experiments with autonomous task completion by LLMs have been largely unsuccessful without human intervention. We remain confident in our overall framework of Plan -> Decompose -> Execute -> Check -> Merge for OpenDAN Agent/Workflow, but at this moment, we must await further advancements in LLM capabilities (e.g., the release of GPT-5).

Revised Goals

Given these insights, I propose the following revised objectives for 0.5.2:

  1. Better Than ChatGPT: Enhance usability within instant messaging platforms supported by Tunnel (primarily Telegram and Slack), especially for multimedia content handling.
  2. Better Than 0.5.1: Improve system usability, maintaining a CLI-based interaction framework but introducing a graphical installation interface to facilitate testing by non-developers.
  3. SDK Development: Decouple the LLM Process from the Agent system, enabling non-Agent functionalities to utilize LLM in a manner similar to LangChain.
  4. Stable Agent/Workflow Development Framework: Finalize a stable development framework for Agent/Workflow, facilitating different behaviors in agents using varied LLM kernels.
  5. Focus on Four Built-in Agents: Develop four built-in agents to demonstrate system capabilities balanced across different complexity levels and serve as tutorials for agent development.

Moving Forward

Unless there are significant concerns, I will proceed with the new plan:

OpenDAN will continue to focus on the evolution of AI components, while the "Personal Server OS" part of Personal AIOS can be derived from my other project focused on this area, CYFS. We might consider merging these projects in the upcoming version 0.5.3.


Feel free to use or modify this template for your GitHub issue. Let me know if there are specific aspects you'd like to adjust or emphasize differently!

thesocialdev commented 7 months ago

I really appreciate the transparency and the learnings! Keep it up!

swoopsus commented 6 months ago

Impressive project. I came to a similar conclusion on why personal AI is going to have to be structured this way. I plan to make some time in the coming weeks to download DAN and investigate further. You are doing great and important work. Thankyou!

waterflier commented 4 months ago

After 2 months of Knowlege GRPAH exploration, I found that this is not a simple matter, but it is fun that I have seen more peers: https: //www.microsoft.com/en/en-US/Research/Project/Graphrag/. We are not ALONE!!! So I came back and determined the final plan (I won't modify it anymore!)

Focus on Built-in Agents & Enhancements Beyond ChatGPT

New Agent Behavior Logic
In the upcoming 0.5.2 release, we are introducing advanced behavior logic for our Agents. They will possess multiple behaviors: passive behaviors triggered by external events and active behaviors initiated by timers. This will endow our Agents with a higher degree of autonomy and responsiveness, enhancing their ability to perform tasks independently and more efficiently.

Completion of Multi-Agent Workflow Framework
We are finalizing a robust framework that allows multiple Agents to share a common Workspace Environment. This collaborative setup enables Agents to work together seamlessly, tackling more complex and multi-faceted tasks. This collaborative capability is a key differentiator of OpenDAN, demonstrating why it offers significant advantages over using ChatGPT directly.

Enhanced Agent Memory
To improve the cognitive capabilities of our Agents, we are developing an enhanced memory system. This system will be based on the local file system and items, allowing Agents to autonomously store and recall important information. This feature will enable Agents to build upon their experiences and make more informed decisions. In future releases, this memory system will be integrated into a comprehensive Knowledge Graph, further enhancing its utility.

Agent Function & Action Framework
We are creating a unified framework to extend the external capabilities of an LLM Process. In this framework, Functions and Actions are distinct: a Function call always involves an LLM invocation, whereas an Action is typically an outcome of an LLM Process. This differentiation is crucial for developing sophisticated Agent behaviors. Our long-term vision includes enabling Agents to discover new abilities based on the Knowledge Graph and autonomously write code snippets as needed. While these goals are ambitious and complex, the 0.5.2 version will lay the groundwork for achieving them.

Support for Knowledge Graph
Our extensive experimentation has revealed significant limitations with Vector-DB based RAG solutions. We have found that a combination of text retrieval and knowledge graphs is far more effective. Although our current Vector-DB approach is inadequate and the Knowledge Graph solution is not fully mature, we are committed to integrating these technologies into a cohesive framework. The 0.5.2 version will primarily showcase the integration of private data RAG, providing early users with a glimpse of OpenDAN’s potential in this area.

Showcasing Superior Real-World Scenarios with Built-in Agents
We will design specific scenarios where OpenDAN users can experience the enhanced capabilities of our system. These scenarios will leverage the full spectrum of our developments, including Agent Workspace (TODO System), Agent Memory, and Workflow, to address complex real-world problems. These demonstrations will highlight how OpenDAN significantly surpasses the capabilities of ChatGPT.

System Infrastructure Enhancements

Removing Frame Service
After careful analysis, we have decided to remove the Frame Service, which includes traditional distributed system functionalities like DFS, D-RDB, and Name Service. Our goal is for OpenDAN to run on a privately deployed cluster, providing superior scalability, reliability, and availability. To achieve this, OpenDAN will be based on a mature CloudOS such as Kubernetes, rather than developing these services independently. This shift will allow us to focus on designing an OS tailored specifically for Agents.

Ease of Installation
We are committed to making OpenDAN accessible to a broader audience, including advanced non-developer users. To this end, we will provide a Linux-based installation script (essentially a Docker setup), recommending installation on a VPS or NAS. For other systems, we will offer a pre-configured container image as a temporary solution. This approach ensures that users can easily install and run OpenDAN on various platforms (Windows & MacOS).

Simple User Interface
Post-installation, users will be able to access a control page via web, which will support mobile devices. This interface will allow users to monitor system status (especially token consumption) and perform basic operations. Users will be able to configure existing Agents and access them through Telegram, with a simple product design facilitating this interaction.

Optimizing TG Tunnel for Multimodal Experience
We are enhancing the TG Tunnel to better support multimodal content, such as images and voice. These improvements will leverage updates in the GPT-4 API, providing a richer and more versatile user experience.

Important Deferred Requirements (Still Planned!)

Updating Agent State Management with Knowledge Graph Theory
We are planning to update the Agent state management framework based on Knowledge Graph theory. This new approach will be more effective in managing the state and behavior of Agents, providing a solid foundation for future developments.

Empowering Agents to Write and Run Code
Our practical experience has identified three core needs for this capability:

  1. Enabling Agents to write suitable glue code snippets rather than complete systems.
  2. Preparing safer containers for running code.
  3. Reusing existing code to avoid starting from scratch each time.

These requirements are complex but essential for enhancing the autonomy and functionality of our Agents.

Integrating AIGC
We are planning to integrate AI-generated content (AIGC) capabilities to further enhance the creative potential of OpenDAN.

Integrating IoT Environments
Future releases will focus on integrating IoT environments, enabling our Agents to interact with and control a wide range of IoT devices, further expanding their practical applications.

The following is the re-adjusted version 0.5.2 plan

I hope to complete the release of the 0.5.2 version as soon as possible according to the new plan. This version has really been dragged too long!!!

NicciOne commented 4 months ago

Really impressive that the goal has been divided into different stages of achievable functions. Really look forward to watching the community grow and achieve its ultimate vision in the short future.