eidolon-ai / eidolon

The first AI Agent Server: Eidolon is a pluggable Agent SDK and an enterprise-ready deployment server for agentic applications
https://www.eidolonai.com/
Apache License 2.0

[Question / Support] How to integrate a speech/tts api like openedai speech? #848

Closed · atljoseph closed this issue 1 week ago

atljoseph commented 1 month ago

How do I integrate this as a text-to-speech server? This repo does a great job of serving the OpenAI speech endpoint independently: https://github.com/matatonic/openedai-speech

I have this running locally and it is really good.
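For reference, a minimal sketch of calling it through the standard OpenAI client, since openedai-speech exposes an OpenAI-compatible `/v1/audio/speech` endpoint. The port, API key placeholder, and model/voice names below are assumptions for a typical local setup:

```python
# Sketch: call a locally running openedai-speech instance via the OpenAI client.
# The base_url port and the model/voice names are assumptions for a local setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-not-needed")

# openedai-speech serves the OpenAI /v1/audio/speech endpoint, so the standard
# client works against it unchanged; it maps OpenAI voice names to local voices.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Hello from a locally hosted text-to-speech server.",
) as response:
    response.stream_to_file("hello.mp3")
```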

atljoseph commented 1 month ago

Along the same line of thought, another question:

LukeLalor commented 1 month ago

hey @atljoseph , sorry for missing this for a few days.

We do have an agent that handles STT and TTS. You can find its documentation here

It currently only supports OpenAI as a Speech Unit. We use this agent to power the 🎤 button on our chatbot, but since we build speech capabilities into the APU, the SimpleAgent will be able to parse and generate speech files, so depending on your use case it might be better to use the multimedia functionality directly. What is your goal?
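For illustration only, a hypothetical sketch of what wiring up such a speech-capable agent might look like as a resource. The `kind`, implementation name, and spec fields here are assumptions rather than the documented schema; the linked docs are the source of truth:

```yaml
# Hypothetical sketch only: the implementation name and spec fields below are
# assumptions, not the documented Eidolon schema. See the speech agent docs.
apiVersion: server.eidolonai.com/v1alpha1
kind: Agent
metadata:
  name: speech-agent
spec:
  implementation: SpeechAgent   # assumed implementation name
  text_to_speech:
    model: tts-1                # OpenAI TTS model
    voice: alloy
  speech_to_text:
    model: whisper-1            # OpenAI STT model
```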

We do not have an OpenAI completions-compatible endpoint. Since our agents have state (similar to the Assistants API), they can't plug in directly to tools that expect a stateless API.

Most chat bot frameworks have different ways of integrating external tools, but the most common pattern we expect is that the other framework will allow you to define a custom tool, and then that tool can call into an Eidolon agent and format the response.
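Very roughly, that pattern might look like the sketch below. The server URL, endpoint paths, agent name, and action name are illustrative assumptions rather than the documented API, so treat this as a shape of the idea, not a recipe:

```python
# Rough sketch of the "custom tool" pattern: a tool defined in another framework
# calls into an Eidolon agent over HTTP and formats the reply. The base URL,
# endpoint paths, agent name, and action name are illustrative assumptions.
import httpx

EIDOLON_URL = "http://localhost:8080"


def ask_eidolon_agent(question: str, process_id: str | None = None) -> str:
    with httpx.Client(base_url=EIDOLON_URL, timeout=60) as client:
        if process_id is None:
            # Agents are stateful, so start a process (a conversation) first.
            created = client.post(
                "/processes", json={"agent": "my-agent", "title": "tool call"}
            )
            created.raise_for_status()
            process_id = created.json()["process_id"]
        # Then invoke an action on that process and format the response.
        resp = client.post(
            f"/processes/{process_id}/agent/my-agent/actions/converse",
            json={"body": question},
        )
        resp.raise_for_status()
        return resp.json().get("data", "")
```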

So we can understand better, what framework(s) are you thinking about integrating with?

LukeLalor commented 1 month ago

(adding @flynntsang to this conversation since she will likely be interested in understanding the features / capabilities you are looking for)

atljoseph commented 1 month ago

Not considering any frameworks much at the moment. Mainly focused on getting a simple proof of concept working.

The voice part can be handled by OpenAI or by an OpenAI-compatible endpoint. It might not need to be tied to a chat model directly if an endpoint can just serve it up. I understand you may not have that granular a level of functionality yet.

The speech part can be handled by OpenAI or an OpenAI-compatible endpoint. Same as above. Flexibility is top of mind there, as is running everything in-house without OpenAI.

Serving an OpenAI-compatible endpoint for a given process id… it would be cool to access it with any chatbot or client. MemGPT also serves an endpoint like that. Of course neither can be exactly like OpenAI; that's why I figured a process id could essentially host an endpoint.
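To make that idea concrete, a toy sketch of such a shim might look like the code below: a thin route that accepts an OpenAI-style chat completions request and forwards the latest user message to a single Eidolon process. The Eidolon paths, agent name, and field names in it are assumptions for illustration, not an existing feature of either project:

```python
# Toy sketch: expose one Eidolon process behind an OpenAI-style chat completions
# route so generic chat clients can talk to it. Eidolon paths, the agent name,
# and request/response field names are assumptions for illustration only.
import time

import httpx
from fastapi import FastAPI

app = FastAPI()
EIDOLON_URL = "http://localhost:8080"
PROCESS_ID = "replace-with-a-real-process-id"  # one shim endpoint per process id


@app.post("/v1/chat/completions")
async def chat_completions(payload: dict):
    user_message = payload["messages"][-1]["content"]
    async with httpx.AsyncClient(base_url=EIDOLON_URL, timeout=60) as client:
        resp = await client.post(
            f"/processes/{PROCESS_ID}/agent/my-agent/actions/converse",
            json={"body": user_message},
        )
        resp.raise_for_status()
        answer = resp.json().get("data", "")
    # Minimal OpenAI-shaped response so existing chat clients can parse it.
    return {
        "id": f"chatcmpl-{PROCESS_ID}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": payload.get("model", "eidolon-agent"),
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": answer},
                "finish_reason": "stop",
            }
        ],
    }
```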


atljoseph commented 1 month ago

Here’s another project that does something similar. They took the approach of providing a generic endpoint supplemented with extra params specific to Open WebUI. They had a good term for it: Unified LLM Provider. Basically, minimal vendor lock-in and extensibility. Thank you! https://docs.openwebui.com/api/


LukeLalor commented 1 month ago

That makes sense as a goal, especially for UI-centric projects, but I don't know how we could do something like that with Eidolon since it maintains your project's state internally.

On a related note, we have been talking with some folks who are putting together a foundation to put out an open standard for agents. We definitely don't get any benefit from having a custom API, and would love to hop on the industry standard when one forms.

I'm going to swap this issue over to @flynntsang; she is our head of product and I think she will be pretty interested in this conversation.