Feature suggestions - Githubissues

pi-infected commented 1 year ago

Hi,

Great project. It is exactly what the Autonomous Agent space is lacking, to get rid of the dependency to OpenAI or other commercial AI providers. Based on my own research (I wanted to build something like this before knowing your project), I can suggest new features that I guess are aligned with the project objectives :

Open-source local text2speech (great quality) : tortoise-tts (https://github.com/neonbjb/tortoise-tts, and an inference-optimized version https://github.com/152334H/tortoise-tts-fast)
Open-source local speech2text (optimized) : https://github.com/sanchit-gandhi/whisper-jax
Quasi open-source local text2img, img2img (with a REST API in a docker) : Automatic1111 with https://github.com/AbdBarho/stable-diffusion-webui-docker
Open source local and optimized vector database for the Memory : Milvus standalone (docker). There will be probably 3 types of Memory, the short-term, long-term and factual memory. The short term is the last back and forth between the user and the Agent, the long-term is the consolidated and summarized conversations and the factual is any external source recorded in the DB as facts (wikipedia pages, ARXIV articles, etc.). For the short and long term memory there is the REMO project albeit not currently using a real vector DB, (https://github.com/daveshap/REMO_Framework) and a langchain(ed) version (https://github.com/hunter-meloche/REMO-langflow) and for the factual memory it is basically Open-domain Q&A (https://www.deepset.ai/blog/understanding-semantic-search) on stored documents.
Open source Haystack for memory semantic search, memory Q&A and sentence / documents embeddings.

Best regards,

Maysunfalls commented 1 year ago

For TTS, Bark is also a good open source alternative https://github.com/suno-ai/bark

Josh-XT commented 1 year ago

Hello, thanks for all of the suggestions!

We do already have a pretty flexible memory using a vector database called ChromaDB and we store the outputs in a yaml file as well. Previous iterations of Agent-LLM actually supported 5 different vector databases until I found ChromaDB and replaced all of them with it since it requires no additonal setup or containers. I spent a great deal of time in the beginning of this project trying to understand vector databases and adding a bunch of them as options until finding Chroma. Have you found Chroma to be insufficient for memory?

I'll certainly take a look into all of those others! I know we have some TTS and STT stuff in there that isn't super well implemented yet (and not really documented at all), just in way of commands for the AI to use. Are there specific features that you would like to see come of each of those? Take a look at the commands directory to see what we currently have available if you haven't had a chance.

Thanks again! It is nearly impossible to keep up with what the best things are to use out there each day, it is constantly changing. I'm always open to suggestions.

alexl83 commented 1 year ago

If I may piggyback, I'd suggest to switch to llama-cpp-python (faster inference) and add prompts for models other than Vicuna, specifically Open Asisstant and Koala, perhaps Alpaca Cleaned

Thank you :)

Josh-XT commented 1 year ago

I can try that one again soon. I did have it set on that one before but it had issues which is why I ended up switching to the current one. The current one seems to work well except for Macs and because of that, I do plan to find a way to make it work for everyone possible. I'll give that one another look, all of this open source stuff is improving daily. Something that didn't work 2 days ago can be amazing today. :)

alexl83 commented 1 year ago

I can try that one again soon. I did have it set on that one before but it had issues which is why I ended up switching to the current one. The current one seems to work well except for Macs and because of that, I do plan to find a way to make it work for everyone possible. I'll give that one another look, all of this open source stuff is improving daily. Something that didn't work 2 days ago can be amazing today. :)

Afaik, babyagi uses llama-cpp-python: it seems quite robust at a first glance

pi-infected commented 1 year ago

Hello, thanks for all of the suggestions!

We do already have a pretty flexible memory using a vector database called ChromaDB [...]

Great ! For now ChromaDB is perfect. I have about ~2.5To of local wikipedia and ARXIV articles that I wanted to use as fact database, so the vector db would need to scale well. but for conversations ChromaDB is fine. Maybe if you want to go that route one day, it could be as simple as letting the LLM suggests the fact-checking by it-self like :

" You have a wikipedia and a scientific database that you can use get facts to help you improve your answers for the following user request :

USER REQUEST |insert user request here|

You can search those databases, if needed, by outputting a list of the questions you would like to ask to those databases in the following format :

COMMAND FORMAT search_wikipedia("your-question") or search_scientific("your-question")

COMMAND EXAMPLES search_wikipedia("What is the height of the Eiffel Tower?") or search_scientific("What is a Latent Diffusion Model?")

Only return the command list. Each command has a cost so output "NO COMMAND" if nothing is really needed. "

And the result would be added to the context of for the LLM answer to the user's request. The search would be Q&A on the Fact Memory.

I'll certainly take a look into all of those others! I know we have some TTS and STT stuff in there that isn't super well implemented yet (and not really documented at all), just in way of commands for the AI to use.

Ok I'll test the commands ;)

Thanks again! It is nearly impossible to keep up with what the best things are to use out there each day, it is constantly changing. I'm always open to suggestions.

My pleasure, yeah there is so much information to ingest it makes me dizzy too. If you wanted to integrate interaction with Stable Diffusion models, the automatic1111 soft has a REST API so it should not be too difficult with the link I gave you.

@Maysunfalls I have looked at Bark-TTS but is not currently ready to use IMO. It is not possible to have audio clips longer than ~13sec and the audio quality is bad as they are sampled at 6 or 8khz. Upsampling seems really difficult (I've tested some models, the results are almost ok but not great). They would need to train a model with a better sampling rate.

Josh-XT / AGiXT

Feature suggestions #64