Integrate GPT 4o without TTS/STT

clemlesne / call-center-ai

Send a phone call from AI agent, in an API call. Or, directly call the bot from the configured phone number!

Apache License 2.0


Integrate GPT 4o without TTS/STT #210

Open clemlesne opened 1 month ago

clemlesne commented 1 month ago

OpenAI's GPT 4o model supports text, image, and audio as both input and output. Its understanding is finer than with the usual STT > model > TTS approach, because the model has direct access to the user's behavior, emotions, etc. instead of only a transcript.
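
To make the difference concrete, here is a minimal sketch of the two pipelines. Everything in it is hypothetical: `AudioFrame`, `cascaded_turn`, `direct_turn`, and the `fake_*` stubs are illustration-only names, not part of call-center-ai, OpenAI, or Communication Services. The `tone` field simply stands in for whatever non-textual cues the audio carries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AudioFrame:
    """Hypothetical chunk of caller audio."""
    pcm: bytes  # raw audio payload (here: just the spoken words, as bytes)
    tone: str   # stand-in for cues such as emotion or hesitation


def cascaded_turn(
    frame: AudioFrame,
    stt: Callable[[AudioFrame], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """STT > model > TTS: the model only ever sees the transcript."""
    text = stt(frame)   # tone is dropped here, only words survive
    reply = llm(text)
    return tts(reply)


def direct_turn(
    frame: AudioFrame,
    audio_model: Callable[[AudioFrame], bytes],
) -> bytes:
    """Audio in, audio out: the model receives the full frame."""
    return audio_model(frame)


# Stubs so the sketch runs without any external service (all assumptions).
def fake_stt(frame: AudioFrame) -> str:
    return frame.pcm.decode()


def fake_text_llm(text: str) -> str:
    return f"reply to '{text}'"


def fake_tts(text: str) -> bytes:
    return text.encode()


def fake_audio_model(frame: AudioFrame) -> bytes:
    words = frame.pcm.decode()
    return f"reply to '{words}' (caller sounds {frame.tone})".encode()


if __name__ == "__main__":
    frame = AudioFrame(pcm=b"my claim was denied", tone="angry")
    print(cascaded_turn(frame, fake_stt, fake_text_llm, fake_tts))
    print(direct_turn(frame, fake_audio_model))
```

In the cascaded path the `tone` never reaches the model, which is the loss described above. The direct path would also save two hops, though this sketch says nothing about actual latency or cost.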

Is there a way to use Communication Services to receive the raw audio stream, bypassing the STT step?

Qwatro55 commented 1 month ago

I'm also interested in this question.

agentverket commented 1 month ago

What about response time? What about costs? Can you stream data?

clemlesne commented 3 weeks ago

I know, I know :) The OpenAI APIs for GPT 4o audio are not yet available:

https://community.openai.com/t/what-will-the-gpt-4o-audio-api-look-like/754242
https://community.openai.com/t/gpt-4o-chat-completion-with-audio-response/752149/6

Plus, the Communication Services APIs do not yet support working with a raw audio stream.

If you have ideas, don't hesitate!