Add support for streaming TTS response from the /audio/speech endpoint

64bit / async-openai

Rust library for OpenAI

https://docs.rs/async-openai

MIT License

1.11k stars 165 forks source link

Add support for streaming TTS response from the /audio/speech endpoint #179

Closed bqm closed 8 months ago

bqm commented 8 months ago

Problem

OpenAI supports streaming for its /audio/speech endpoint but this functionality is not available in async-openai.

Solution

Add support for response streaming following the existing coding patterns from the codebase.

A couple of decisions were taken:

Add a post_raw_stream functionality in the client as using the existing SSE pattern would not work (the EventSource library decodes its data into a String and there is no byte mode from what I could tell).
Do not introduce new specific types for the input and output structures, reusing the equivalent non stream type versions. This follows the existing convention from what I could tell.
Introduce a simple example streaming to disk the audio output. Another example could have been to stream to the audio output but this would have needed the introduction of a playback library as well as a streaming audio decoder, which feels out of scope of this library.

Related to #177.

bqm commented 8 months ago

@64bit I have updated my code to reflect your comment about errors. I have also used that opportunity to use a retry setup similar to the existing. The code can be further abstracted if that is desirable. I tested both error (no API key) and success scenario on my example and both are working well.

I have reused the map_err code from execute_raw, I can duplicate it or abstract the code even further - not sure what the usual setup is for this repository.

64bit commented 8 months ago

Thanks for update.

Given that previous change was not well tested and required major refactor, that calls for testing all existing examples. Are they all still working?

Also based on new information the assumption no longer holds, official document do not match what you have observed.

Perhaps its best to wait until it officially makes it into OpenAPI spec.

bqm commented 8 months ago

Thank you for reviewing @64bit.

Perhaps its best to wait until it officially makes it into OpenAPI spec.

Ok let me know what stance you would like to have. I don't think the OpenAI spec will change soon but I am happy to report the discrepancy to them wherever is appropriate and wait if that is what you prefer.

I have a couple of potential exclusive suggestions: 1) Do not merge this change if you think it is best to wait until the OpenAI spec is updated. 2) Change this PR to minimize code changes, so essentially do not reuse the code, just duplicate the useful portions for now. This can give you piece of mind that if something is broken, it would just be the new code, not anything else. 3) Keep the PR as is and Do a lot of manual tests, both error path and happy path.

Let me know what is your preference - I am ok either way, happy to go with 1), 2) or 3). I do think streaming TTS is useful - people looking to use Rust are usually looking for performance imo and so this would make sense to have that feature at some point.

bqm commented 8 months ago

@64bit Do you have any preferences between 1), 2) or 3)?

64bit commented 8 months ago

Receiving a non-working PR / lack of testing has dropped my confidence in it.

I dont have a preference at the moment or bandwidth to review/test this thoroughly at the moment

bqm commented 8 months ago

Hello, I am sorry to hear that. I am not sure how I can contribute at this point - I followed your initial requests, responded to your feedback and I am proposing various ways to move forward. Maybe let's drop this PR for now then, I will close it.

64bit commented 8 months ago

Following up after your comment in the linked issue:

Left one comment, please let me know outcome of (3) and this is good to go.