awslabs / amazon-transcribe-streaming-sdk

The Amazon Transcribe Streaming SDK is an async Python SDK for converting audio into text via Amazon Transcribe.
Apache License 2.0

wave Audio file format limitation #10

Open amgautam opened 3 years ago

amgautam commented 3 years ago

While testing the code with my own WAV audio file, I realized that transcription fails for any audio file comprising 2 channels.

For example:

Failing file format: sampling rate: 16000 Hz, length: 64000 samples, channels: 2, sample width: 2 bytes, 256000

The same file succeeds with one channel at the same sample rate: sampling rate: 16000 Hz, length: 64000 samples, channels: 1, sample width: 2 bytes, 128000
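For anyone wanting to check a file before streaming it, here is a minimal sketch that reads the channel count and related parameters from a WAV header using only Python's standard `wave` module. `describe_wav` is a hypothetical helper name, not part of this SDK, and the sketch assumes the trailing number in each listing is the size of the raw audio data in bytes.

```python
import wave


def describe_wav(path):
    """Return basic PCM parameters read from a WAV file header."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        frames = wav.getnframes()          # samples per channel
        return {
            "sample_rate_hz": wav.getframerate(),
            "channels": channels,
            "sample_width_bytes": sample_width,
            "frames": frames,
            # Size of the raw audio data, excluding the WAV header.
            "data_bytes": frames * channels * sample_width,
        }
```

Under that assumption the numbers are consistent: 64000 × 2 channels × 2 bytes = 256000 for the failing file, and 64000 × 1 channel × 2 bytes = 128000 for the mono one.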

I would like to understand why this happens. Will it be a limitation on which audio files can be transcribed successfully?

nateprewitt commented 3 years ago

Hi @amgautam,

Amazon Transcribe just released support for multi-channel audio streaming this month. We haven't integrated those API changes into this library yet, though. If you need immediate support, the AWS Ruby and Go SDKs should already have this functionality released. We'll look at getting the recent API change into a future release of this SDK to add support.

Thanks for the question!

amgautam commented 3 years ago

@nateprewitt Thanks for the response. I have a follow-up question here. When I tested the same audio file with 2 channels for static transcription, it worked just fine. Does the normal (non-streaming, or static) transcription API convert the audio file from multi-channel to single channel internally before doing the transcription?

E.g., the API call:

    import boto3

    transcribe = boto3.client('transcribe')
    response = transcribe.start_transcription_job(
        TranscriptionJobName='demo',
        LanguageCode="en-US",
        MediaFormat="wav",
        Media={"MediaFileUri": mediaUri},  # S3 file path
    )

nateprewitt commented 3 years ago

Hi @amgautam,

It's likely that the Transcribe service is doing the channel split behind the scenes. Looking at the documentation for the static API: "By default, you can transcribe audio files with two channels." I believe that's why you're seeing success with boto3.

We've just released a new version of this SDK (0.2.0) which should now have full support for multi-channel audio files. Whenever you have a chance, feel free to try it out and see if this brings our behavior in line with what you'd expect.

joguSD commented 3 years ago

@amgautam We've released an update to the library that pulls in some of the new API parameters. It seems like there's an option for the number of channels, but it requires explicit configuration in the API call.
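As a rough sketch of what that explicit configuration might look like, the helper below builds the keyword arguments for a streaming request from the audio's sample rate and channel count. `stream_kwargs` is a hypothetical helper, and the names `number_of_channels` and `enable_channel_identification` are assumed to match the parameters the updated SDK accepts; please confirm them against the release you have installed.

```python
def stream_kwargs(sample_rate_hz, channels, language_code="en-US"):
    """Build keyword arguments for a streaming transcription request."""
    if channels not in (1, 2):
        raise ValueError("this sketch assumes mono or two-channel audio")
    kwargs = {
        "language_code": language_code,
        "media_sample_rate_hz": sample_rate_hz,
        "media_encoding": "pcm",
    }
    if channels == 2:
        # Assumed parameter names; confirm against the SDK release in use.
        kwargs["number_of_channels"] = 2
        kwargs["enable_channel_identification"] = True
    return kwargs
```

The result would then be unpacked into the call, e.g. `client.start_stream_transcription(**stream_kwargs(16000, 2))`, assuming the installed version accepts those arguments.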

amgautam commented 3 years ago

Thanks @nateprewitt and @joguSD. For my experiment, I converted the file from multi-channel to mono audio using Python. But the new API is definitely worth looking into.
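For reference, one way such a conversion could be done with only the standard library is sketched below. This is not necessarily the approach used above: it assumes 16-bit stereo PCM input and simply averages the left and right samples, and `stereo_to_mono` is a hypothetical helper name.

```python
import array
import sys
import wave


def stereo_to_mono(src_path, dst_path):
    """Write a mono copy of a 16-bit stereo WAV by averaging both channels."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Stereo frames are interleaved: L0, R0, L1, R1, ...
    left, right = samples[0::2], samples[1::2]
    mono = array.array("h", ((l + r) // 2 for l, r in zip(left, right)))
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

Averaging keeps the result within the 16-bit range; keeping only one channel would be an alternative when the two channels carry different speakers.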