Closed: bqm closed this issue 8 months ago
OpenAI doesn't seem to have native streaming APIs for Audio https://platform.openai.com/docs/api-reference/audio
Can you elaborate more on what you mean by streaming functionality for TTS & STT?
From what I can tell, this might be a documentation oversight? https://help.openai.com/en/articles/8555505-tts-api mentions:
> Is it possible to stream audio? Yes! By setting `stream=True`, you can chunk the returned audio file.
And people report that it is indeed working for them: https://github.com/openai/openai-python/issues/864. It also matches the iOS app's conversational behavior, if you have tried that.
I did a couple of Postman requests, which gave me a "Transfer-Encoding: chunked" header in the response, so streaming might work out of the box without setting any specific "stream" key to true. If that is the case, the Rust library would need to expose it.
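For illustration only (this is a sketch, not async-openai's actual API; `consume_chunks` is a hypothetical helper): because the response arrives with `Transfer-Encoding: chunked`, a client can hand each chunk to the caller as soon as it is read, instead of buffering the whole file. The in-memory reader below stands in for the HTTP body:

```rust
use std::io::Read;

// Drain a reader in fixed-size chunks, handing each chunk to a callback
// as soon as it is available. This mimics how a chunked /audio/speech
// body could be consumed incrementally; the names are illustrative.
fn consume_chunks<R: Read>(
    mut body: R,
    mut on_chunk: impl FnMut(&[u8]),
) -> std::io::Result<usize> {
    let mut total = 0;
    let mut buf = [0u8; 16];
    loop {
        let n = body.read(&mut buf)?;
        if n == 0 {
            break; // end of stream
        }
        // Each chunk could be appended to a file or fed to an audio
        // player here, without waiting for the full response.
        on_chunk(&buf[..n]);
        total += n;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // Stand-in for the network body; a real client would read from the socket.
    let fake_audio: &[u8] = b"not-really-mp3-bytes, streamed in order";
    let mut received = Vec::new();
    let total = consume_chunks(fake_audio, |chunk| received.extend_from_slice(chunk))?;
    assert_eq!(&received[..], fake_audio);
    println!("streamed {total} bytes");
    Ok(())
}
```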
I have not tried the transcriptions endpoint so cannot comment on that yet.
I can do some further research and share back unless you are tackling it.
Thank you for sharing the additional information. The help article suggests that they have officially made it public, so I assume it's safe to consider it not an internal feature flag in their API.
Of course I'm not tackling it; I only learned about it from your comment.
Given that, you're welcome to send a PR! A `stream` suffix in the method name sounds reasonable, as it's consistent with the other streaming APIs.
In addition, having a working example for this would be very helpful for me to test and other folks to use.
The upstream spec was updated after I released v0.18.0, so it may or may not include this, but it's worth a look.
Ok great, I should be able to push a PR tomorrow, I think (for the `speech_stream` endpoint initially; I can do a second PR for the transcribe endpoint after that?).
Sounds like a good plan to me, thank you for offering to contribute!
Just added a PR. I had a quick look at the STT use case, and I don't think streaming is actually supported for the transcription endpoint, judging from the OpenAI documentation and the openai-python code.
I will drop it for now and reduce the scope to TTS streaming.
Thanks for adding this, looking forward to using this :)
Do you know by any chance how to set `stream` to `true` when using the OpenAI API from TypeScript (either with the `openai-node` package or otherwise)? I don't see a `stream` param here:
https://github.com/openai/openai-openapi/blob/f4a2833d00e92c4b1cb531d437da88a03de997d8/openapi.yaml#L6860-L6894
or here:
https://github.com/openai/openai-node/blob/d67c11b40deee82110d8bef18931ebafbe58bf8a/src/resources/audio/speech.ts#L17-L47
@Boscop I am not familiar with `openai-node`, but what I saw in the files you linked is consistent with what I observed: there is no `stream` parameter at all; the /audio/speech response is always streamed from OpenAI no matter what.
There is an example of that behavior in the `openai-node` examples folder: https://github.com/openai/openai-node/blob/master/examples/audio.ts#L19C16-L19C33 (I found it via https://github.com/openai/openai-node/issues/487).
It is unclear how to move forward at this point, as the feedback on the pull request cannot be actioned. Marking this as won't-fix for now; happy to restart the thread if conditions change.
Thank you for your contributions.
I'll update the contribution guidelines with minimum expectations (testing, documentation, etc.) for basic hygiene; that should close the communication gap in the project.
It's easy to adopt an "if it compiles, it works" philosophy in Rust, but as we found in the PR for this issue, that's not always the case.
I'm sorry that you had a poor experience here, and I agree my last comment on the PR was not actionable; I apologize for that.
If you wish to continue and get your work shipped, you're very welcome to: I gave the PR another review and left a comment. Of the options you listed, I think (3) is the most appropriate.
I hope you continue, and I'd be happy to see your work get shipped. Thank you again for your contributions!
Updated guidelines: https://github.com/64bit/async-openai#contributing
This issue falls outside the official docs (API Reference and OpenAPI spec), but since you already worked on it before the guidelines were in place, you're welcome to get it shipped.
Please feel free to reach out if you have any concerns.
Problem
The current `speech(...)` and `transcribe(...)` functions in the `Audio` implementation do not support a streaming mode. Streaming is particularly useful for real-time applications that need to feel interactive.
Proposal
Implement `speech_stream` and `transcribe_stream` methods mimicking the `create_stream` functionality.

Is someone already working on this? If not, I can give it a go.
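As a rough illustration of the proposed shape (all names and types below are local mocks, not async-openai's real API): a `speech_stream`-style call would yield audio chunks one at a time, so the caller can start writing or playing audio before the full response has arrived.

```rust
// Local mock standing in for a streamed /audio/speech response.
// The real method would presumably yield bytes from a chunked HTTP body.
struct MockSpeechStream {
    chunks: std::vec::IntoIter<Vec<u8>>,
}

impl Iterator for MockSpeechStream {
    type Item = Vec<u8>;
    fn next(&mut self) -> Option<Vec<u8>> {
        self.chunks.next()
    }
}

// Hypothetical stand-in for a speech_stream(...) call.
fn speech_stream_mock() -> MockSpeechStream {
    MockSpeechStream {
        chunks: vec![b"chunk-1".to_vec(), b"chunk-2".to_vec()].into_iter(),
    }
}

fn main() {
    let mut audio = Vec::new();
    // Consumer loop: each chunk is handled as soon as the server sends it,
    // so playback could begin before the whole file is downloaded.
    for chunk in speech_stream_mock() {
        audio.extend_from_slice(&chunk);
    }
    assert_eq!(&audio[..], &b"chunk-1chunk-2"[..]);
    println!("received {} bytes", audio.len());
}
```

In the real crate this would more likely be an async stream of bytes (matching the `create_stream` style) rather than a blocking iterator; the blocking shape here is only to keep the sketch self-contained.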