Closed: atljoseph closed this issue 1 week ago
To this same line of thought, another question:
hey @atljoseph, sorry for missing this for a few days.
we do have an agent that handles STT and TTS. You can find its documentation here.
It currently only supports OpenAI as a Speech Unit. We use this agent to power the 🎤 button on our chatbot, but since we build speech capabilities into the APU, the SimpleAgent will be able to parse and generate speech files, so depending on your use case it might be better to use the multi-media functionality directly. What is your goal?
We do not have an OpenAI-completions-compatible endpoint. Since our agents have state (similar to the Assistants API), they can't plug in directly with tools expecting a stateless API.
Most chat bot frameworks have different ways of integrating external tools, but the most common pattern we expect is that the other framework will allow you to define a custom tool, and then that tool can call into an Eidolon agent and format the response.
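As a concrete illustration of that pattern, here is a minimal, stdlib-only sketch of a "custom tool" that calls into an Eidolon agent over HTTP and returns the reply as text. The base URL, agent name, action name, and the exact request/response shapes are assumptions for illustration; check the OpenAPI docs generated by your own Eidolon server for the real paths and fields.

```python
"""Sketch: wrap an Eidolon agent call as a custom tool for another framework.
The URL path shape and the `body`/`data` field names are assumptions."""
import json
import urllib.request

EIDOLON_BASE = "http://localhost:8080"  # assumed default; adjust for your server


def action_url(agent: str, process_id: str, action: str) -> str:
    # Assumed path shape for Eidolon's process-oriented API; verify against
    # your server's generated OpenAPI documentation.
    return f"{EIDOLON_BASE}/processes/{process_id}/agent/{agent}/actions/{action}"


def ask_agent(agent: str, process_id: str, action: str, body: str) -> str:
    """The custom-tool entry point: POST to the agent, return plain text."""
    req = urllib.request.Request(
        action_url(agent, process_id, action),
        data=json.dumps({"body": body}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Assumes the agent's result lives under `data`; fall back to raw payload.
    return str(payload.get("data", payload))


if __name__ == "__main__":
    # Requires a running Eidolon server and an existing process id.
    print(ask_agent("SimpleAgent", "abc123", "converse", "hello"))
```

The framework-specific part (registering `ask_agent` as a tool) is left out because it differs per framework; the point is that the tool owns the process id, so the stateless framework never has to.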
So we can understand better, what framework(s) are you thinking about integrating with?
(adding @flynntsang to this conversation since she will likely be interested in understanding the features / capabilities you are looking for)
Not considering any frameworks much at the moment. Mainly considering getting a simple proof working.
The voice part can be handled by OpenAI or by an OpenAI-compatible endpoint. It might not need to be tied to a chat model directly, if an endpoint can just serve it up. I understand you may not have that granular style of functionality yet.
The speech part can be handled by OpenAI or an OpenAI-compatible endpoint. Same as above. Flexibility is top of mind there, along with running everything in-house without OpenAI.
Serving an OpenAI-compatible endpoint for a given process id… it would be cool to access it with any chatbot or client. MemGPT has an endpoint they also serve like that. Of course neither can be exactly like OpenAI; that's why I figured a process id could host an endpoint, essentially.
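To make the "a process id hosts an endpoint" idea concrete, here is a sketch of the translation layer such a shim would need: mapping an OpenAI-style `/v1/chat/completions` request onto a stateful agent call, and wrapping the agent's reply back into a chat-completion-shaped response. The OpenAI field names are real; the Eidolon-side body shape (`{"body": ...}`) is an assumption.

```python
"""Sketch: translate between a stateless OpenAI chat request and a stateful
Eidolon-style action call keyed by process id."""
import json


def openai_to_eidolon(chat_request: dict) -> dict:
    """Map a stateless chat request onto a stateful agent call.

    The process id carries the conversation history, so only the *last* user
    message is forwarded; earlier turns are already in the agent's state.
    """
    messages = chat_request.get("messages", [])
    last_user = next(
        (m["content"] for m in reversed(messages) if m.get("role") == "user"), ""
    )
    return {"body": last_user}  # assumed Eidolon action body shape


def eidolon_to_openai(agent_reply: str, model: str = "eidolon-agent") -> dict:
    """Wrap an agent's text reply in a minimal chat-completion-shaped response."""
    return {
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": agent_reply},
                "finish_reason": "stop",
            }
        ],
    }


if __name__ == "__main__":
    req = {
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": "be brief"},
            {"role": "user", "content": "hi"},
            {"role": "assistant", "content": "hello"},
            {"role": "user", "content": "what's Eidolon?"},
        ],
    }
    print(json.dumps(openai_to_eidolon(req)))
```

A real shim would also need streaming, usage accounting, and a way to mint/resolve process ids per client, which is where the "can't be exactly like OpenAI" friction shows up.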
Here’s another project that has something similar. They took the approach of providing a generic endpoint supplemented with extra params specific to Open WebUI. They have a good term for it: Unified LLM Provider. Basically, minimal vendor lock-in and extensibility. Thank you! https://docs.openwebui.com/api/
That makes sense as a goal, especially for UI-centric projects, but I don't know how we could do something like that with Eidolon since it maintains your project's state internally.
On a related note, we have been talking with some folks who are putting together a foundation to put out an open standard for agents. We definitely don't get any benefit from having a custom API, and would love to hop on the industry standard when one forms.
I'm going to swap this issue over to @flynntsang; she is our head of product and I think she will be pretty interested in this conversation.
How do I integrate this as a text-to-speech server? This repo is great at serving the OpenAI speech endpoint independently: https://github.com/matatonic/openedai-speech
I have this running locally and it is really good.
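For anyone trying the same setup, here is a small stdlib-only sketch of pointing a standard OpenAI-style speech call at a locally running openedai-speech server. The `/v1/audio/speech` path and the `model`/`input`/`voice` parameters mirror the OpenAI TTS API that openedai-speech emulates; the port (8000) and the fact that the API key is ignored are assumptions about a default local deployment, so adjust for yours.

```python
"""Sketch: call a local openedai-speech server via the OpenAI-compatible
/v1/audio/speech endpoint. Port and auth behavior are assumed defaults."""
import json
import urllib.request


def speech_request(base_url: str, text: str,
                   model: str = "tts-1", voice: str = "alloy") -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /v1/audio/speech endpoint."""
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps({"model": model, "input": text, "voice": voice}).encode(),
        headers={
            "Content-Type": "application/json",
            # Local server is assumed to ignore the key; send a placeholder.
            "Authorization": "Bearer sk-ignored",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Requires openedai-speech running locally; writes the returned audio out.
    req = speech_request("http://localhost:8000", "Hello from Eidolon")
    with urllib.request.urlopen(req) as resp, open("out.mp3", "wb") as f:
        f.write(resp.read())
```

Because it speaks the OpenAI surface, the same request works against api.openai.com with a real key, which is exactly the "swap the base URL, keep everything else" flexibility discussed above.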