Closed — danneu closed this issue 1 year ago
I implemented an initial token streaming solution: https://github.com/danneu/telegram-chatgpt-bot/commit/aa6949865833dd7e9755979d26e82c2f550c3ca2
As the bot receives chat completion tokens, it repeatedly updates the same message so the user can read the answer as it is being created. This is a huge UX improvement over waiting for the whole answer to finish before seeing anything.
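The edit-in-place loop could be sketched roughly like this (a minimal sketch, not the commit's exact code; the `Api` type, `streamToMessage`, and the 1-second throttle are illustrative stand-ins for whatever Telegram client wrapper and rate limit the bot actually uses):

```typescript
// Hypothetical thin wrapper over the two Bot API methods involved.
type Api = {
  sendMessage(chatId: number, text: string): Promise<{ message_id: number }>
  editMessageText(chatId: number, messageId: number, text: string): Promise<void>
}

// Stream completion tokens into a single Telegram message, sending it once
// and then editing it in place. Edits are throttled because Telegram
// rate-limits message edits.
async function streamToMessage(
  api: Api,
  chatId: number,
  tokens: AsyncIterable<string>,
  intervalMs = 1000, // assumed throttle window, not from the source
): Promise<string> {
  let buffer = ''
  let messageId: number | null = null
  let lastEdit = 0
  for await (const token of tokens) {
    buffer += token
    const now = Date.now()
    if (messageId === null) {
      messageId = (await api.sendMessage(chatId, buffer)).message_id
      lastEdit = now
    } else if (now - lastEdit >= intervalMs) {
      await api.editMessageText(chatId, messageId, buffer)
      lastEdit = now
    }
  }
  // Final edit so the message always ends with the complete answer.
  if (messageId !== null) {
    await api.editMessageText(chatId, messageId, buffer)
  }
  return buffer
}
```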
The voice memo is now sent after the text message is finished.
Right now, when text-to-speech voice memos are enabled, the user has to wait for the ChatGPT text response to round-trip through Azure's text-to-speech API before they see any text at all.
This adds about 2-6 seconds per response. Sometimes the Azure API even inexplicably hangs for 10+ seconds before finally responding (TODO: race it against an 8-second timeout or something).
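The timeout race from the TODO could look something like this (a sketch; `withTimeout` and the commented-out `synthesizeSpeech` call are hypothetical names, not code from the repo):

```typescript
// Race a slow promise against a timeout so a hung Azure TTS call
// can't stall the response indefinitely. Resolves to null on timeout.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T | null> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), ms)
  })
  try {
    return await Promise.race([promise, timeout])
  } finally {
    clearTimeout(timer)
  }
}

// Usage sketch: skip the voice memo if Azure doesn't answer in 8 seconds.
// const audio = await withTimeout(synthesizeSpeech(text), 8000)
// if (audio === null) { /* send the text response without a voice memo */ }
```

One caveat with this approach: `Promise.race` only abandons the slow promise, it does not cancel the underlying HTTP request, so a true cancellation would need an `AbortController` wired into the Azure client.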
While I like the idea of responding with a voice memo that chains into the text response, I don't want to hang the UX for 2-6 seconds.
Ideas:
- A text message (`sendMessage`) and a voice message (`sendVoice`) are two different concepts in Telegram.