Amazon Transcribe (Streaming Speech To Text)

aws-amplify / amplify-swift

A declarative library for application development using cloud services.

Apache License 2.0


Amazon Transcribe (Streaming Speech To Text) #3180

Open msidd opened 1 year ago

msidd commented 1 year ago

Is your feature request related to a problem? Please describe.

I am frustrated with the Amplify documentation located here: https://docs.amplify.aws/lib/predictions/transcribe/q/platform/ios/

It does not explain whether streaming is available in Amplify Swift 2.0.

Describe the solution you'd like

Clear documentation on how to use streaming with Amplify Swift 2.0 if the feature is available; if it is not, documentation of what the options are.

Describe alternatives you've considered

I have not considered any alternatives.

Is the feature request related to any of the existing Amplify categories?

Predictions

Additional context

No response

atierian commented 1 year ago

Thanks for opening this @msidd

The documentation you linked is for Amplify Swift v2. The Predictions category was made available in 2.11.6, so you can use any version after that. We recommend using the most recent version (currently 2.15.4).

Does that answer your question? Can you elaborate on what you'd like to see in the documentation?
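
For reference, here is a minimal sketch of a Swift Package Manager manifest that pins Amplify Swift at or above the 2.11.6 release mentioned above. The package name `MyApp` is hypothetical, and the product names and minimum iOS version are assumptions that should be checked against the amplify-swift package manifest.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    platforms: [.iOS(.v13)], // assumed minimum platform; verify against the library's requirements
    dependencies: [
        // Resolves to the newest 2.x release at or above 2.11.6.
        .package(url: "https://github.com/aws-amplify/amplify-swift", from: "2.11.6")
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // Product names are assumed, not confirmed in this thread.
                .product(name: "Amplify", package: "amplify-swift"),
                .product(name: "AWSPredictionsPlugin", package: "amplify-swift")
            ]
        )
    ]
)
```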

msidd commented 1 year ago

As I understand it, the documentation has an example for batch transcription (from the URL of a file) but does not have an example of how to use the https://docs.aws.amazon.com/transcribe/latest/APIReference/API_streaming_StartMedicalStreamTranscription.html API.

atierian commented 1 year ago

Note: Amplify doesn't support StartMedicalStreamTranscription, the API linked in your previous comment. The following response assumes you meant StartStreamTranscription.

Yes, that's correct.

speechToText(url: URL) uses the WebSocket-based StartStreamTranscription API under the hood. This returns the transcription from Amazon Transcribe in chunks as they're processed. However, Amplify doesn't currently support streaming the input from the client side. Doing so would require adding a new API to Amplify, something like speechToText(stream: AsyncStream<Data>).
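
To make the shape of such an input concrete, here is a minimal sketch of how a caller might produce an AsyncStream<Data> of audio chunks. The names `makeAudioStream`, `chunkSize`, and `collectChunkSizes` are hypothetical, and the consumer only stands in for an API like speechToText(stream:), which does not exist in Amplify today.

```swift
import Foundation

/// Splits a buffer of audio bytes into fixed-size chunks and exposes them
/// as an AsyncStream<Data>. `makeAudioStream` and `chunkSize` are
/// hypothetical names used only for illustration.
func makeAudioStream(from audio: Data, chunkSize: Int) -> AsyncStream<Data> {
    precondition(chunkSize > 0, "chunkSize must be positive")
    return AsyncStream { continuation in
        var offset = audio.startIndex
        while offset < audio.endIndex {
            let end = min(offset + chunkSize, audio.endIndex)
            // Each yielded element is one chunk the consumer would send upstream.
            continuation.yield(audio.subdata(in: offset..<end))
            offset = end
        }
        // Signal end of input so the consumer's loop terminates.
        continuation.finish()
    }
}

/// Hypothetical consumer standing in for a streaming speech-to-text API.
/// It only records the size of each chunk it receives.
func collectChunkSizes(_ stream: AsyncStream<Data>) async -> [Int] {
    var sizes: [Int] = []
    for await chunk in stream {
        sizes.append(chunk.count)
    }
    return sizes
}
```

In a real app the chunks would more likely be yielded from a live microphone capture callback than from a preloaded buffer; the stream type the consumer sees would be the same.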

Can you provide details about your use case? That will help us with this feature request. Thanks!

github-actions[bot] commented 1 year ago

This has been identified as a feature request. If this feature is important to you, we strongly encourage you to give a 👍 reaction on the request. This helps us prioritize new features most important to you. Thank you!

msidd commented 1 year ago

The use case is that we need to do inline dictation, but we cannot use Siri on the device due to HIPAA regulations. Latency is therefore critical, and we found that server-side ASR is too slow. Instead, a direct connection from the device to the ASR service works best.

Further, when doing client-side transcription, the network may be flaky as the user moves around, so we need the SDK to give us feedback on what portion of the audio was transcribed, so we can resend / retry the parts that were not.
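
One way the resend / retry bookkeeping described above could look on the client is sketched below. It assumes the SDK would surface, for each final result, the end time of the audio it covers, and that the caller can map that time back onto its own recording timeline. `AudioChunk` and `RetryBuffer` are hypothetical names, not Amplify or AWS SDK types.

```swift
import Foundation

/// One chunk of captured audio with its position on the recording timeline.
struct AudioChunk {
    let startTime: TimeInterval
    let endTime: TimeInterval
    let payload: Data
}

/// Keeps every sent chunk until a final transcript result covers it.
/// Hypothetical helper for illustration only.
struct RetryBuffer {
    private(set) var pending: [AudioChunk] = []

    /// Record a chunk that has been sent but not yet confirmed as transcribed.
    mutating func didSend(_ chunk: AudioChunk) {
        pending.append(chunk)
    }

    /// Drop chunks that end at or before the point a final result has reached.
    mutating func acknowledge(transcribedUpTo: TimeInterval) {
        pending.removeAll { $0.endTime <= transcribedUpTo }
    }

    /// Chunks to resend after a dropped connection, oldest first.
    func chunksToResend() -> [AudioChunk] {
        pending.sorted { $0.startTime < $1.startTime }
    }
}
```

After a reconnect, the caller would replay `chunksToResend()` before sending newly captured audio, so that no unconfirmed portion of the dictation is lost.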