Allow streaming audio responses

Is your feature request related to a problem? Please describe. Whenever I want to use text-to-speech for a chatbot response, I have to wait for the full response from the LLM and then send it to a TTS service for synthesis, which adds a lot of latency and makes for a poor user experience.

Describe the solution you'd like: It would be really cool if cl.Audio/cl.Message allowed streaming of audio tokens, so that we can stream the TTS response as we get it instead of waiting for synthesis to complete.
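
To make the request concrete, here is a rough sketch of the flow I have in mind, written in plain asyncio with no Chainlit calls at all. Everything in it is an assumption for illustration: `fake_llm_tokens` and `fake_tts` are hypothetical stand-ins for a streaming LLM and a TTS service, and `stream_audio` is a made-up helper name. It buffers LLM tokens, synthesizes each sentence as soon as it is complete, and yields the audio bytes right away. The missing piece is a way to hand those chunks to cl.Audio/cl.Message as they arrive.

```python
import asyncio
import re
from typing import AsyncIterator, List

# Split after sentence-ending punctuation that is followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


async def fake_llm_tokens() -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    for token in ["Hello", " there.", " How", " are", " you?", " Bye"]:
        await asyncio.sleep(0)  # simulate waiting on the network
        yield token


async def fake_tts(text: str) -> bytes:
    """Hypothetical stand-in for a TTS call; returns placeholder 'audio' bytes."""
    await asyncio.sleep(0)
    return text.encode("utf-8")


async def stream_audio(tokens: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Yield one audio chunk per completed sentence instead of one at the end."""
    buffer = ""
    async for token in tokens:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            yield await fake_tts(sentence)
        buffer = parts[-1]
    # Flush whatever is left once the LLM stream ends.
    if buffer.strip():
        yield await fake_tts(buffer.strip())


async def collect() -> List[bytes]:
    """Gather all chunks, only so the behavior is easy to inspect."""
    return [chunk async for chunk in stream_audio(fake_llm_tokens())]


if __name__ == "__main__":
    for chunk in asyncio.run(collect()):
        print(chunk)
```

With this shape, the first sentence can start playing while the LLM is still generating the rest, which is where the latency win would come from.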

Describe alternatives you've considered: None at the moment.

Allow streaming audio responses #1340