Closed: bqm closed this issue 8 months ago
OpenAI doesn't seem to have native streaming APIs for Audio https://platform.openai.com/docs/api-reference/audio
Can you elaborate more on what you mean by streaming functionality for TTS & STT?
From what I can tell, this might be a documentation oversight? https://help.openai.com/en/articles/8555505-tts-api mentions:
> Is it possible to stream audio? Yes! By setting `stream=True`, you can chunk the returned audio file.
And people report that it is indeed working for them: https://github.com/openai/openai-python/issues/864. It also matches the iOS app's conversational behavior, if you have tried that.
I did a couple of Postman requests, which gave me a "Transfer-Encoding: chunked" header in the response, so streaming might work out of the box without setting any specific "stream" key to true. If that is the case, the Rust library would need to expose it.
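For illustration only (this is a sketch, not async-openai's actual API; `consume_chunks` is a hypothetical helper): because the response arrives with `Transfer-Encoding: chunked`, a client can hand each chunk to the caller as soon as it is read, instead of buffering the whole file. The in-memory reader below stands in for the HTTP body:

```rust
use std::io::Read;

// Drain a reader in fixed-size chunks, handing each chunk to a callback
// as soon as it is available. This mimics how a chunked /audio/speech
// body could be consumed incrementally; the names are illustrative.
fn consume_chunks<R: Read>(
    mut body: R,
    mut on_chunk: impl FnMut(&[u8]),
) -> std::io::Result<usize> {
    let mut total = 0;
    let mut buf = [0u8; 16];
    loop {
        let n = body.read(&mut buf)?;
        if n == 0 {
            break; // end of stream
        }
        // Each chunk could be appended to a file or fed to an audio
        // player here, without waiting for the full response.
        on_chunk(&buf[..n]);
        total += n;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // Stand-in for the network body; a real client would read from the socket.
    let fake_audio: &[u8] = b"not-really-mp3-bytes, streamed in order";
    let mut received = Vec::new();
    let total = consume_chunks(fake_audio, |chunk| received.extend_from_slice(chunk))?;
    assert_eq!(&received[..], fake_audio);
    println!("streamed {total} bytes");
    Ok(())
}
```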
I have not tried the transcriptions endpoint so cannot comment on that yet.
I can do some further research and share back unless you are tackling it.
Thank you for sharing the additional information. The help article suggests that they have officially made it public, so I assume it's safe to consider it not an internal feature flag in their API.
Of course I'm not tackling it; I only learned about it from your comment.
Given that, you're welcome to send a PR! A `stream` suffix in the method name sounds reasonable, as it's consistent with the other streaming APIs.
In addition, having a working example for this would be very helpful for me to test and other folks to use.
The upstream spec was updated after I released v0.18.0, so it may or may not include this, but it's worth a look.
Ok great, I should be able to push a PR tomorrow, I think (for the `speech_stream` endpoint initially; I can do a second PR for the transcribe endpoint after that?).
Sounds like a good plan to me, thank you for offering to contribute!
Just added a PR. I had a quick look at the STT use case, and I don't think streaming is actually supported for the transcription endpoint, judging from the OpenAI documentation and the openai-python code.
I will drop it for now and reduce the scope to TTS streaming.
Thanks for adding this, looking forward to using this :)
Do you know by any chance how to set `stream` to `true` when using the OpenAI API from TypeScript (either with the `openai-node` package or otherwise)? I don't see a `stream` param here:
https://github.com/openai/openai-openapi/blob/f4a2833d00e92c4b1cb531d437da88a03de997d8/openapi.yaml#L6860-L6894
or here:
https://github.com/openai/openai-node/blob/d67c11b40deee82110d8bef18931ebafbe58bf8a/src/resources/audio/speech.ts#L17-L47
@Boscop I am not familiar with `openai-node`, but what I saw in the files you linked is consistent with what I observed: there is no `stream` parameter at all; the /audio/speech response is always streamed from OpenAI no matter what.
There is an example of that behavior in the `openai-node` examples folder: https://github.com/openai/openai-node/blob/master/examples/audio.ts#L19C16-L19C33 (I found it via https://github.com/openai/openai-node/issues/487).
It is unclear how to move forward at this point, as the feedback on the pull request cannot be actioned. Marking this as won't-fix for now; happy to restart the thread if conditions change.
Thank you for your contributions.
I'll update the contribution guidelines with minimum expectations (testing, documentation, etc.) for basic hygiene; that should close the communication gap in the project.
It's easy to adopt an "if it compiles, it works" philosophy in Rust, but as we found in the PR for this issue, that's not always the case.
I'm sorry that you had a poor experience here, and I agree my last comment on the PR was not actionable; I apologize for that.
If you wish to continue and get your work shipped, you're very welcome to: I gave the PR another review and left a comment. Of the options you listed, I think (3) is the most appropriate.
I hope you continue, and I'd be happy to see your work get shipped. Thank you again for your contributions!
Updated guidelines: https://github.com/64bit/async-openai#contributing
This issue falls outside the official docs (API Reference and OpenAPI spec), but since you already worked on it before the guidelines were in place, you're welcome to get it shipped.
Please feel free to reach out if you have any concerns.
Problem
The current `speech(...)` and `transcribe(...)` functions in the `Audio` implementation do not support a streaming mode. Streaming is particularly useful for real-time applications that need to feel interactive.
Proposal
Implement `speech_stream` and `transcribe_stream` methods mimicking the `create_stream` functionality.

Is someone already working on this? If not, I can give it a go.
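As a rough illustration of the proposed shape (all names and types below are local mocks, not async-openai's real API): a `speech_stream`-style call would yield audio chunks one at a time, so the caller can start writing or playing audio before the full response has arrived.

```rust
// Local mock standing in for a streamed /audio/speech response.
// The real method would presumably yield bytes from a chunked HTTP body.
struct MockSpeechStream {
    chunks: std::vec::IntoIter<Vec<u8>>,
}

impl Iterator for MockSpeechStream {
    type Item = Vec<u8>;
    fn next(&mut self) -> Option<Vec<u8>> {
        self.chunks.next()
    }
}

// Hypothetical stand-in for a speech_stream(...) call.
fn speech_stream_mock() -> MockSpeechStream {
    MockSpeechStream {
        chunks: vec![b"chunk-1".to_vec(), b"chunk-2".to_vec()].into_iter(),
    }
}

fn main() {
    let mut audio = Vec::new();
    // Consumer loop: each chunk is handled as soon as the server sends it,
    // so playback could begin before the whole file is downloaded.
    for chunk in speech_stream_mock() {
        audio.extend_from_slice(&chunk);
    }
    assert_eq!(&audio[..], &b"chunk-1chunk-2"[..]);
    println!("received {} bytes", audio.len());
}
```

In the real crate this would more likely be an async stream of bytes (matching the `create_stream` style) rather than a blocking iterator; the blocking shape here is only to keep the sketch self-contained.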