Open waterflier opened 7 months ago
I really appreciate the transparency and the learnings! Keep it up!
Impressive project. I came to a similar conclusion on why personal AI is going to have to be structured this way. I plan to make some time in the coming weeks to download DAN and investigate further. You are doing great and important work. Thank you!
After 2 months of Knowledge Graph exploration, I found that this is not a simple matter, but it is encouraging to see more peers working on it: https://www.microsoft.com/en/en-US/Research/Project/Graphrag/. We are not ALONE!!! So I came back and settled on the final plan (I won't modify it anymore!)
New Agent Behavior Logic
In the upcoming 0.5.2 release, we are introducing advanced behavior logic for our Agents. They will possess multiple behaviors: passive behaviors triggered by external events and active behaviors initiated by timers. This will endow our Agents with a higher degree of autonomy and responsiveness, enhancing their ability to perform tasks independently and more efficiently.
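The split between passive and active behaviors can be illustrated with a minimal sketch. This is not OpenDAN's actual API: the class `SketchAgent`, its methods, and the simulated clock are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SketchAgent:
    """Hypothetical agent holding passive (event) and active (timer) behaviors."""
    name: str
    passive: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    active: List[Tuple[int, Callable[[], str]]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def on_event(self, event_type: str, behavior: Callable[[str], str]) -> None:
        # Register a passive behavior for one kind of external event.
        self.passive[event_type] = behavior

    def every(self, interval: int, behavior: Callable[[], str]) -> None:
        # Register an active behavior that fires every `interval` simulated seconds.
        self.active.append((interval, behavior))

    def handle_event(self, event_type: str, payload: str) -> None:
        # Passive path: runs only if a behavior is registered for this event.
        behavior = self.passive.get(event_type)
        if behavior is not None:
            self.log.append(behavior(payload))

    def tick(self, now: int) -> None:
        # Active path: fire each behavior whose interval divides the simulated time.
        for interval, behavior in self.active:
            if now % interval == 0:
                self.log.append(behavior())


agent = SketchAgent("demo")
agent.on_event("message", lambda text: f"reply to: {text}")
agent.every(2, lambda: "review todo list")

agent.handle_event("message", "hello")  # external event
for second in range(1, 5):              # simulated timer, seconds 1..4
    agent.tick(second)
```

A real implementation would likely drive `tick` from an event loop or scheduler rather than a simulated counter; the sketch only shows that the two trigger paths are independent.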
Completion of Multi-Agent Workflow Framework
We are finalizing a robust framework that allows multiple Agents to share a common Workspace Environment. This collaborative setup enables Agents to work together seamlessly, tackling more complex and multi-faceted tasks. This collaborative capability is a key differentiator of OpenDAN, demonstrating why it offers significant advantages over using ChatGPT directly.
Enhanced Agent Memory
To improve the cognitive capabilities of our Agents, we are developing an enhanced memory system. This system will be based on the local file system and items, allowing Agents to autonomously store and recall important information. This feature will enable Agents to build upon their experiences and make more informed decisions. In future releases, this memory system will be integrated into a comprehensive Knowledge Graph, further enhancing its utility.
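As a rough illustration of file-system-backed memory, the sketch below stores each remembered item as a JSON file and recalls items by a naive keyword scan. The class `FileMemory`, the file layout, and the recall strategy are assumptions made for this example, not the design described above.

```python
import json
import tempfile
from pathlib import Path
from typing import List


class FileMemory:
    """Hypothetical file-backed memory: one JSON file per remembered item."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def store(self, key: str, text: str) -> Path:
        # Persist the item so that it survives across agent sessions.
        path = self.root / f"{key}.json"
        path.write_text(json.dumps({"key": key, "text": text}), encoding="utf-8")
        return path

    def recall(self, keyword: str) -> List[str]:
        # Naive case-insensitive keyword scan over all stored items.
        hits = []
        for path in sorted(self.root.glob("*.json")):
            item = json.loads(path.read_text(encoding="utf-8"))
            if keyword.lower() in item["text"].lower():
                hits.append(item["text"])
        return hits


memory = FileMemory(Path(tempfile.mkdtemp()) / "agent_memory")
memory.store("note-1", "Alice prefers morning meetings.")
memory.store("note-2", "The server bill is due on Friday.")
recalled = memory.recall("alice")
```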
Agent Function & Action Framework
We are creating a unified framework to extend the external capabilities of an LLM Process. In this framework, Functions and Actions are distinct: a Function call always involves an LLM invocation, whereas an Action is typically an outcome of an LLM Process. This differentiation is crucial for developing sophisticated Agent behaviors. Our long-term vision includes enabling Agents to discover new abilities based on the Knowledge Graph and autonomously write code snippets as needed. While these goals are ambitious and complex, the 0.5.2 version will lay the groundwork for achieving them.
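One way to read the Function/Action distinction is sketched below, under a stated assumption: a Function's result is fed back to the model (so it costs another LLM invocation), while an Action is carried out from the finished output without calling the model again. The stub `stub_llm`, the registries, and the response format are hypothetical and stand in for a real model and OpenDAN's real interfaces.

```python
from typing import Any, Callable, Dict, List

calls = {"llm": 0}
outbox: List[str] = []


def stub_llm(prompt: str) -> Dict[str, Any]:
    """Stand-in for a real model call; returns canned responses."""
    calls["llm"] += 1
    if "[function result]" not in prompt:
        # First pass: the model asks for a function.
        return {"function_call": {"name": "get_time", "args": {}}}
    # Second pass: the model has the result and emits a reply plus an action.
    return {
        "reply": "Reminder saved.",
        "actions": [{"name": "send_message", "args": {"text": "Reminder saved."}}],
    }


# Hypothetical registries of callable capabilities.
FUNCTIONS: Dict[str, Callable[..., str]] = {"get_time": lambda: "09:00"}
ACTIONS: Dict[str, Callable[..., None]] = {
    "send_message": lambda text: outbox.append(text),
}


def llm_process(prompt: str) -> str:
    response = stub_llm(prompt)
    # Function: its result goes back into the prompt, costing another LLM invocation.
    while "function_call" in response:
        call = response["function_call"]
        result = FUNCTIONS[call["name"]](**call["args"])
        prompt += f"\n[function result] {call['name']} = {result}"
        response = stub_llm(prompt)
    # Action: an outcome of the finished process, executed without another LLM call.
    for action in response.get("actions", []):
        ACTIONS[action["name"]](**action["args"])
    return response["reply"]


reply = llm_process("Remind me about the meeting.")
```

In this sketch the single function call triggers a second model invocation, whereas the action adds none.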
Support for Knowledge Graph
Our extensive experimentation has revealed significant limitations with Vector-DB based RAG solutions. We have found that a combination of text retrieval and knowledge graphs is far more effective. Although our current Vector-DB approach is inadequate and the Knowledge Graph solution is not fully mature, we are committed to integrating these technologies into a cohesive framework. The 0.5.2 version will primarily showcase the integration of private data RAG, providing early users with a glimpse of OpenDAN’s potential in this area.
Showcasing Superior Real-World Scenarios with Built-in Agents
We will design specific scenarios where OpenDAN users can experience the enhanced capabilities of our system. These scenarios will leverage the full spectrum of our developments, including Agent Workspace (TODO System), Agent Memory, and Workflow, to address complex real-world problems. These demonstrations will highlight how OpenDAN significantly surpasses the capabilities of ChatGPT.
Removing Frame Service
After careful analysis, we have decided to remove the Frame Service, which includes traditional distributed system functionalities like DFS, D-RDB, and Name Service. Our goal is for OpenDAN to run on a privately deployed cluster, providing superior scalability, reliability, and availability. To achieve this, OpenDAN will be based on a mature CloudOS such as Kubernetes, rather than developing these services independently. This shift will allow us to focus on designing an OS tailored specifically for Agents.
Ease of Installation
We are committed to making OpenDAN accessible to a broader audience, including advanced non-developer users. To this end, we will provide a Linux-based installation script (essentially a Docker setup), recommending installation on a VPS or NAS. For other systems (Windows and macOS), we will offer a pre-configured container image as a temporary solution. This approach ensures that users can easily install and run OpenDAN on various platforms.
Simple User Interface
Post-installation, users will be able to access a control page via web, which will support mobile devices. This interface will allow users to monitor system status (especially token consumption) and perform basic operations. Users will be able to configure existing Agents and access them through Telegram, with a simple product design facilitating this interaction.
Optimizing TG Tunnel for Multimodal Experience
We are enhancing the TG Tunnel to better support multimodal content, such as images and voice. These improvements will leverage updates in the GPT-4 API, providing a richer and more versatile user experience.
Updating Agent State Management with Knowledge Graph Theory
We are planning to update the Agent state management framework based on Knowledge Graph theory. This new approach will be more effective in managing the state and behavior of Agents, providing a solid foundation for future developments.
Empowering Agents to Write and Run Code
Our practical experience has identified three core needs for this capability:
These requirements are complex but essential for enhancing the autonomy and functionality of our Agents.
Integrating AIGC
We are planning to integrate AI-generated content (AIGC) capabilities to further enhance the creative potential of OpenDAN.
Integrating IoT Environments
Future releases will focus on integrating IoT environments, enabling our Agents to interact with and control a wide range of IoT devices, further expanding their practical applications.
[x] AIOS Kernel
[x] AI Compute System, @waterflier, A2
[ ] Built-in Service
[ ] Built-in Agents/Apps
[ ] UI
[x] 0.5.2 Integration Test
[ ] SDK
I hope to release version 0.5.2 as soon as possible under the new plan. This version has dragged on for far too long!!!
It's really impressive that the goal has been divided into stages of achievable functions. I really look forward to watching the community grow and achieve its ultimate vision in the near future.
Overview
Since the release of version 0.5.1 in November 2023, the development cycle for 0.5.2 has extended to nearly double the planned duration, now approaching six months. I believe it is essential to update our community about our current status: the goals for 0.5.2 were set too high. Influenced by the updates from OpenAI, including GPTs, Tools API, and GPT-V, we admittedly aimed for OpenDAN to significantly surpass the capabilities of GPTs. Despite substantial efforts over the past months, much of our work has been experimental in nature. Here are some insights from our "failures":
The issue mentioned above poses a serious challenge to the system expansion model we previously planned. To address this core issue, we have made numerous attempts but have yet to find a sufficiently good direction. Furthermore, expanding work along each direction requires long-term experimentation to find the best solution. Currently, there are two promising solutions:
RAG Integration Challenges: The mainstream RAG solutions we planned to integrate are flawed; they work for demos but fail to deliver stable, predictable results in regular scenarios. Our architectural vision for OpenDAN includes replacing the traditional FileSystem with a Knowledge Base, a critical infrastructure for the AI era. However, we have two diverging paths based on our experiences:
Tasklist/Todolist Systems Based on LLM's Planning Capabilities: Our experiments with autonomous task completion by LLMs have been largely unsuccessful without human intervention. We remain confident in our overall framework of Plan -> Decompose -> Execute -> Check -> Merge for OpenDAN Agent/Workflow, but at this moment, we must await further advancements in LLM capabilities (e.g., the release of GPT-5).
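The Plan -> Decompose -> Execute -> Check -> Merge flow can be sketched as a plain pipeline. In this illustration every stage is a simple Python callable standing in for an LLM-driven step; the function `run_pipeline` and the toy stages are hypothetical, and a failed check simply raises, which is where human intervention would come in today.

```python
from typing import Callable, List


def run_pipeline(
    goal: str,
    plan: Callable[[str], str],
    decompose: Callable[[str], List[str]],
    execute: Callable[[str], str],
    check: Callable[[str, str], bool],
    merge: Callable[[List[str]], str],
) -> str:
    outline = plan(goal)            # Plan: turn the goal into an outline
    subtasks = decompose(outline)   # Decompose: split the outline into subtasks
    results: List[str] = []
    for task in subtasks:
        result = execute(task)      # Execute: carry out one subtask
        if not check(task, result):  # Check: verify the result before keeping it
            raise ValueError(f"check failed for subtask: {task}")
        results.append(result)
    return merge(results)           # Merge: combine verified results


# Toy stages standing in for LLM-driven steps.
final = run_pipeline(
    goal="draft report; send email",
    plan=lambda goal: goal.strip(),
    decompose=lambda outline: [part.strip() for part in outline.split(";")],
    execute=lambda task: f"done: {task}",
    check=lambda task, result: result.endswith(task),
    merge=lambda results: " | ".join(results),
)
```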
Revised Goals
Given these insights, I propose the following revised objectives for 0.5.2:
Moving Forward
Unless there are significant concerns, I will proceed with the new plan:
OpenDAN will continue to focus on the evolution of AI components, while the "Personal Server OS" part of Personal AIOS can be derived from my other project focused on this area, CYFS. We might consider merging these projects in the upcoming version 0.5.3.