danneu / telegram-chatgpt-bot

A Telegram ChatGPT bot that supports text prompts and two-way voice memos.

Rethink text-to-speech (voice memo) ordering #1

Closed: danneu closed this issue 1 year ago

danneu commented 1 year ago

Right now, when text-to-speech voice memos are enabled, the user has to wait until the ChatGPT text response has made a round trip through Azure's text-to-speech API before they can read any of the text.

This adds about 2-6 seconds to every response. Sometimes the Azure API even hangs inexplicably for 10+ seconds before it finally responds (TODO: race it against an 8-second timeout or something).
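The TODO could look roughly like the following minimal TypeScript sketch, which races the text-to-speech promise against a timer (the same behavior as `Promise.race` with a rejecting timer). `withTimeout`, `TimeoutError`, and `voiceMemoOrNull` are hypothetical names, not part of the bot or any Azure SDK; `synthesize` stands in for whatever function calls Azure, and the 8000 ms value simply mirrors the TODO.

```typescript
// Error used to distinguish "too slow" from a real API failure.
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Settles with `promise` if it settles within `ms`, otherwise rejects with
// TimeoutError. The timer is always cleared so it doesn't keep Node alive.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Hypothetical usage: give up on the voice memo (but not the text reply)
// if text-to-speech takes longer than 8 seconds.
async function voiceMemoOrNull(
  text: string,
  synthesize: (text: string) => Promise<Uint8Array>, // assumed Azure TTS wrapper
): Promise<Uint8Array | null> {
  try {
    return await withTimeout(synthesize(text), 8000);
  } catch (err) {
    if (err instanceof TimeoutError) return null;
    throw err; // real API errors still propagate
  }
}
```

Racing only stops *waiting*; the underlying HTTP request keeps running. Actually cancelling it would need something like an `AbortController`, if the client used for the Azure call accepts one.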

While I like the idea of replying with a voice memo that is followed by the text response, I don't want to stall the UX for 2-6 seconds.

Ideas:

danneu commented 1 year ago

I implemented an initial token streaming solution: https://github.com/danneu/telegram-chatgpt-bot/commit/aa6949865833dd7e9755979d26e82c2f550c3ca2

As the bot receives chat-completion tokens, it repeatedly edits the same message so the user can read the answer while it is being generated. This is a huge UX improvement over waiting for the entire answer before seeing anything.

The voice memo is now sent after the text message is finished.
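The ordering described above (stream text into one message, then send the voice memo) might be sketched in TypeScript as follows. This is not the code from the linked commit. `ChatApi` is a hypothetical wrapper rather than a real Telegram client API, the token stream is assumed to be an `AsyncIterable<string>`, and throttling edits to one per `intervalMs` is an added assumption (Telegram rate-limits bots, and rejects edits that don't change the text); the 1000 ms default is a guess, not a documented limit.

```typescript
// Hypothetical minimal bot surface; not the real Telegram client API.
interface ChatApi {
  sendMessage(text: string): Promise<number>; // resolves to the message id
  editMessage(messageId: number, text: string): Promise<void>;
  sendVoice(audio: Uint8Array): Promise<void>;
}

// Streams tokens into a single message, editing it at most once per
// `intervalMs`, then flushes the final text. Resolves to the full answer.
async function streamIntoMessage(
  api: ChatApi,
  tokens: AsyncIterable<string>,
  intervalMs = 1000,
): Promise<string> {
  let text = "";
  let messageId: number | undefined;
  let lastSent = "";
  let lastEditAt = 0;

  for await (const token of tokens) {
    text += token;
    if (text.trim() === "") continue; // Telegram rejects empty messages
    const now = Date.now();
    if (messageId === undefined) {
      // First visible text: create the message the user watches grow.
      messageId = await api.sendMessage(text);
      lastSent = text;
      lastEditAt = now;
    } else if (now - lastEditAt >= intervalMs && text !== lastSent) {
      await api.editMessage(messageId, text);
      lastSent = text;
      lastEditAt = now;
    }
  }
  // Flush tokens that arrived after the last throttled edit.
  if (messageId !== undefined && text !== lastSent) {
    await api.editMessage(messageId, text);
  }
  return text;
}

// Text first; text-to-speech latency is only paid once the text is complete.
async function respond(
  api: ChatApi,
  tokens: AsyncIterable<string>,
  synthesize: (text: string) => Promise<Uint8Array>, // assumed Azure TTS wrapper
  intervalMs = 1000,
): Promise<void> {
  const answer = await streamIntoMessage(api, tokens, intervalMs);
  if (answer.trim() !== "") {
    await api.sendVoice(await synthesize(answer));
  }
}
```

With this shape, a slow or hanging text-to-speech call can only delay the voice memo; the user has already read the full text by the time `synthesize` is called.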