TEN-framework / TEN-Agent

TEN Agent is a world-class multimodal AI agent integrated with the OpenAI Realtime API and RTC, featuring weather checks, web search, vision, and RAG.
https://agent.theten.ai/
Apache License 2.0
1.76k stars, 200 forks

Support OpenAI's TTS and STT APIs #324

Open zhanghx0905 opened 1 month ago

zhanghx0905 commented 1 month ago

I'm wondering if the project currently supports OpenAI's TTS and STT APIs, or if there are any plans to integrate them.

plutoless commented 1 month ago

The one the Realtime API uses, or a separate one?

zhanghx0905 commented 1 month ago

The separate TTS and STT APIs.

plutoless commented 2 weeks ago

@zhanghx0905 OpenAI's STT/TTS APIs are not stream-based; they can only process whole files, so they are not ideal for realtime use cases.
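A common way to use a file-based STT endpoint in a near-realtime loop is to cut the live audio into utterances with voice activity detection (VAD) and upload each utterance as a short file. The sketch below shows that idea with a deliberately simple energy-based VAD. It is an illustration under assumptions, not TEN Agent or OpenAI code: 16 kHz mono 16-bit PCM input, a hypothetical RMS `threshold`, and a hypothetical silence length that ends an utterance. Production systems usually use a trained VAD model instead.

```python
import array
import io
import math
import wave

SAMPLE_RATE = 16_000                            # assumed: 16 kHz, mono, 16-bit PCM
FRAME_MS = 30                                   # analysis frame length in ms
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 480 samples per frame


def frame_rms(frame) -> float:
    """Root-mean-square energy of one frame of int16 samples."""
    if len(frame) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_utterances(samples, threshold=500.0, max_silence_frames=10):
    """Yield utterances (array of int16) separated by silence.

    A frame counts as speech when its RMS exceeds `threshold` (hypothetical
    value). An utterance ends after `max_silence_frames` consecutive quiet
    frames, i.e. about 300 ms with 30 ms frames.
    """
    current = array.array("h")
    silent_run = 0
    for start in range(0, len(samples), FRAME_SAMPLES):
        frame = samples[start:start + FRAME_SAMPLES]
        if frame_rms(frame) > threshold:
            current.extend(frame)
            silent_run = 0
        elif current:
            current.extend(frame)  # keep short pauses inside the utterance
            silent_run += 1
            if silent_run >= max_silence_frames:
                yield current
                current = array.array("h")
                silent_run = 0
    if current:
        yield current  # flush whatever is left at end of stream


def to_wav_bytes(pcm: array.array) -> bytes:
    """Wrap raw PCM in a WAV container so it can be uploaded as a file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()
```

Each WAV payload could then be sent to a transcription endpoint as soon as its utterance closes. With the OpenAI Python SDK that would be something like `client.audio.transcriptions.create(model="whisper-1", file=("utterance.wav", wav_bytes))`. Latency is then bounded by the utterance length plus one request round trip, which is the trade-off the comment above points out.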

zhanghx0905 commented 1 week ago

You might take a look at the livekit-agent GitHub repository. I tried their OpenAI plugin and adapted it for Chinese, and found that it works just like a streaming service.
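On the TTS side, a non-streaming endpoint can feel like a streaming one if the LLM's text stream is cut into sentences and each sentence is synthesized as soon as it is complete, so playback starts before the full reply exists. The sketch below shows only that sentence-chunking step. It is an assumption about how such stream-like behavior can be achieved, not a reproduction of the livekit plugin's code. Because the comment mentions Chinese, it splits on full-width `。！？；` immediately and on ASCII `.!?;` only when followed by whitespace, so numbers like `3.5` stay intact.

```python
import re
from typing import Iterable, Iterator

# Full-width (Chinese) terminators end a sentence immediately; ASCII ones
# only when followed by whitespace, to avoid splitting decimals like "3.5".
_SENTENCE_END = re.compile(r"[。！？；]+|[.!?;]+(?=\s)")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Turn an incremental text stream (e.g. LLM tokens) into whole sentences."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete sentence currently sitting in the buffer.
        while (match := _SENTENCE_END.search(buffer)) is not None:
            sentence = buffer[:match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    tail = buffer.strip()
    if tail:
        yield tail  # last fragment, even without a terminator
```

In a pipeline, each yielded sentence would go to the TTS request and its audio would be queued for playback while the next sentence is still being generated. Shorter chunks lower the time to first audio but give the TTS model less context for prosody.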

By the way, I have deployed TTS (text-to-speech) and STT (speech-to-text) services locally. To integrate them with applications that are compatible with the OpenAI API, I wrapped them in the OpenAI API format. So I hope you will consider supporting these APIs as well.
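Wrapping local services in the OpenAI API format means a client only needs a different base URL to switch between OpenAI and a self-hosted backend. As a minimal sketch, assuming the local server mirrors OpenAI's `POST /v1/audio/speech` route with a JSON body of `model`, `input`, `voice`, and `response_format`, a standard-library client could look like this. The base URL `http://localhost:8000/v1` and the key `sk-local` are hypothetical, and a local server may ignore `model` and `voice`. The official `openai` Python SDK also accepts a `base_url` argument for the same purpose.

```python
import json
import urllib.request


def build_speech_request(base_url: str, api_key: str, text: str,
                         model: str = "tts-1", voice: str = "alloy",
                         response_format: str = "wav") -> urllib.request.Request:
    """Build an OpenAI-style text-to-speech request for any compatible server."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(base_url: str, api_key: str, text: str, **kwargs) -> bytes:
    """Send the request and return the raw audio bytes from the server."""
    request = build_speech_request(base_url, api_key, text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

Usage would be `audio = synthesize("http://localhost:8000/v1", "sk-local", "你好")`. Pointing the same call at `https://api.openai.com/v1` with a real key would target OpenAI instead, which is the interchangeability the comment is asking the project to take advantage of.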