Open waterflier opened 1 year ago
Voice2txt, based on OpenAI. I will implement this
Falcon2, MPT, Vicuna: these are already supported by llama.cpp. I can run compatibility experiments on them and resolve any related issues.
Claude2: from the information available, it is not open source and access is restricted in some regions. The fees and the API application process carry unknown risks.
Code Interpreter I will implement this
Web Search I will implement this
Query SQL DB I will implement this
Agent Message MIME Support (Image, Video, Audio), doc Parser, Image Parser (based on GPT-V), Video Parser (based on GPT-V)
I will implement this
Discord Tunnel I will implement this
Dear Team,
Given the recent launch of OpenAI's new version in early November 2023, many of us may have felt a profound shift in the industry. As the world changes, I believe we should adapt accordingly. Here are some of my thoughts:
Affirmation of Our Path: OpenAI's latest release, particularly the functionality of the so-called GPTs Agent platform, is largely similar to our 0.5.1 version released on September 28th. This strongly affirms that our direction is correct. OpenAI has done a great job educating the market about Agents, so we no longer need to use our version releases to demonstrate the right way to use LLMs through Agents. The user-education part of our product design can be simplified.
Innovation in Version 0.5.2: For our new version (0.5.2), besides maintaining the combinational advantages brought by private deployment of LLM, I believe we need to implement some of the innovative ideas we've discussed about the Agent. This is crucial to maintaining our leading position and avoiding the impression that OpenDAN is merely a follower of GPTs.
Integration of OpenAI's New Capabilities: We should fully integrate the new capabilities brought by OpenAI's latest release, especially the longer Token Windows, GPT-V, and Code-interpreter. I believe these new features can effectively solve some known issues.
Therefore, I propose to adjust the goals and plans for version 0.5.2. Here are the core objectives:
The detailed version plan is as follows:
MVP plan adjustment
To keep the list below from getting too long, it does not include the distributed-system version (0.5.3). I think we should open a separate issue to discuss and record that version.
Modules without a specific note are components to be completed under the 0.5.2 plan.
Some Explanation
Upgrade Agent Working Cycle
The goal is to transform the Agent from an Assistant that passively handles messages into a role-based Agent that acts on its own initiative. The relevant modules mainly cover the Agent's behavior patterns (4 types), the Agent's capabilities, and the Agent's memory management (learning and introspection).
For a detailed introduction, refer here: https://github.com/fiatrete/OpenDAN-Personal-AI-OS/issues/91
Workspace Environment
The Workspace supports the implementation of the Agent Working Cycle design. Its core abstraction is defined as: saving the shared state needed for Agent collaboration and providing the basic capabilities Agents need to complete their work. I used AutoGPT as a careful reference in the design. The Workspace differs from AutoGPT in its emphasis on collaboration (Agent with Agent, Agent with humans). After some thought, I concluded that the Workspace primarily consists of the following components:
Each Agent has its own private Workspace, not shared with others. I hope to achieve diversity through the combination of "Agent and Workflow Role". Each user "trains" different Agents through their usage habits, and then these Agents collaborate to complete complex tasks defined in the Workflow. The final results of these complex tasks can reflect the user's inherent personality and preferences.
This component design also reflects my thoughts on the key question, "What capabilities should we endow an Agent with, and how do we control the security boundaries when it transitions from a consultant to a steward?" It's not a simple question, so I anticipate this component will continue to iterate in the future.
Agent Message MIME Support
Agent Message MIME Support means that Agents can handle multiple types of messages, including images, videos, audio, files, etc. For most Agents, this requires adding a standard, customizable message-parsing step to the message handling process. The input of this step is the message together with its MIME type, and the output is the text content of the message. This step can be implemented by calling the text_parser module.
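To make the parsing step concrete, here is a minimal sketch. The `AgentMessage` class, the `parse_message_step` function, and the `PARSERS` table are hypothetical names chosen for illustration, not the actual OpenDAN API, and the image parser is only a placeholder for a real vision-model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical message type; the real OpenDAN message class may look different.
@dataclass
class AgentMessage:
    mime_type: str   # e.g. "text/plain", "image/png"
    payload: bytes   # raw message content

# A parser turns raw bytes into text that an LLM can read.
Parser = Callable[[bytes], str]

def parse_message_step(msg: AgentMessage, parsers: Dict[str, Parser]) -> str:
    """Return the text content of a message, chosen by its MIME type."""
    # Try an exact match first ("image/png"), then the major type ("image/*").
    major = msg.mime_type.split("/", 1)[0]
    parser = parsers.get(msg.mime_type) or parsers.get(major + "/*")
    if parser is None:
        # Unknown types degrade to a placeholder instead of failing the step.
        return f"[unsupported content: {msg.mime_type}]"
    return parser(msg.payload)

# Placeholder parsers (assumptions); a real image parser would call a vision model.
PARSERS: Dict[str, Parser] = {
    "text/plain": lambda data: data.decode("utf-8", errors="replace"),
    "image/*": lambda data: f"[image, {len(data)} bytes]",
}
```

Because the parser table is passed in as an argument, each Agent could supply its own table, which is one possible reading of "customizable" here.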
Another core requirement of MIME support is to use a unified method to save these non-text content data.
Text-based Knowledge Base
In 0.5.1, we mainly implemented RAG based on the popular Embedding + vector database solution. Through practice, we found that this solution did not fully utilize the potential of LLM, so I want to introduce two new modes to further enhance RAG:
Text Parser Support
Both MIME Support and Text-based Knowledge Base require the system to support converting various document formats into text that can express semantics as much as possible. This component, known as TextParser, should be implemented as an open and extensible framework, given the vast amount of digital content that exists in different formats.
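One way such an open framework could look is a registry that third-party code extends with a decorator. This is only a sketch under assumptions: `register_parser` and `to_text` are hypothetical names, and the HTML parser is a deliberately simple example built on the standard library.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List

# Hypothetical registry: maps a MIME type to a function that extracts text.
_PARSERS: Dict[str, Callable[[bytes], str]] = {}

def register_parser(mime_type: str):
    """Decorator that lets any module plug in a parser for a new format."""
    def decorator(func: Callable[[bytes], str]) -> Callable[[bytes], str]:
        _PARSERS[mime_type] = func
        return func
    return decorator

def to_text(mime_type: str, data: bytes) -> str:
    """Convert a document to text, or raise if no parser is registered."""
    try:
        parser = _PARSERS[mime_type]
    except KeyError:
        raise ValueError(f"no TextParser registered for {mime_type}") from None
    return parser(data)

@register_parser("text/plain")
def parse_plain(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")

class _TextCollector(HTMLParser):
    """Collects non-empty text nodes and drops the tags."""
    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())

@register_parser("text/html")
def parse_html(data: bytes) -> str:
    collector = _TextCollector()
    collector.feed(data.decode("utf-8", errors="replace"))
    collector.close()
    return " ".join(collector.chunks)
```

Supporting a new format then only requires writing one function and decorating it; the core `to_text` entry point does not change.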
Local Text Search
This component uses traditional inverted-index technology to store all document content locally and provide fast local search. Its implementation can take ElasticSearch as a reference.
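For readers unfamiliar with the technique, an inverted index maps each term to the set of documents that contain it. The sketch below is an in-memory toy with a hypothetical `InvertedIndex` class; it leaves out persistence, ranking, and stemming, which a real component would need.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

class InvertedIndex:
    """Maps each term to the set of document IDs that contain it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> List[str]:
        # Lowercased word tokens; real engines add stemming, stop words, etc.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id: str, text: str) -> None:
        for term in self._tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> List[str]:
        """Return IDs of documents containing every query term (AND semantics)."""
        terms = self._tokenize(query)
        if not terms:
            return []
        # Use .get so that unknown terms do not create empty postings entries.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)
```

A query costs one set intersection per term rather than a scan over every document, which is what makes this approach fast for local search.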
Text Summary
This component uses the capabilities of an LLM to learn from all the documents and then saves the learning results locally. This behavior can be considered "Self-Learn". Users can have the Agents responsible for organizing materials use different prompts, depending on the purpose of the organization, to obtain more targeted results.
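A rough sketch of this "Self-Learn" loop follows. Everything here is an assumption made for illustration: the `self_learn` function, the `LLM` callable signature, and the JSON output file are hypothetical, and the model call is injected so that any backend (or a stub) can be used.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Hypothetical signature: takes a full prompt string, returns the model's reply.
LLM = Callable[[str], str]

def self_learn(documents: Dict[str, str], purpose_prompt: str,
               llm: LLM, out_path: Path) -> Dict[str, str]:
    """Summarize each document with a purpose-specific prompt and save the results."""
    summaries: Dict[str, str] = {}
    for doc_id, text in documents.items():
        # The same documents can be re-learned with a different purpose_prompt.
        summaries[doc_id] = llm(f"{purpose_prompt}\n\n---\n{text}")
    # Persist locally so later Agent steps can reuse the learning results.
    out_path.write_text(json.dumps(summaries, ensure_ascii=False, indent=2),
                        encoding="utf-8")
    return summaries
```

Running the function twice with different values of `purpose_prompt` (for example, one asking for action items and one asking for a timeline) would yield the more targeted results described above.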
Stable Diffusion Controller Agent
This puts into practice the idea of "the Agent as the new era's way of using computing" by replacing the complex Stable Diffusion WebUI with an easy-to-use Agent. The Agent helps users complete complex AIGC tasks and establishes a paradigm that can cover the entire AIGC process: LoRA training and use, model downloading, plugin downloading, generation of prompt words, and selection of AIGC results.
Email Agent/CEO assistant
This is the integrated test product for 0.5.2. Aimed at private deployment in small and medium-sized enterprises, it is a CEO Assistant that can read all company emails and materials. I am writing a detailed product document, so I will not elaborate here.
I look forward to hearing your thoughts on these proposed adjustments.