awslabs / amazon-transcribe-streaming-sdk

The Amazon Transcribe Streaming SDK is an async Python SDK for converting audio into text via Amazon Transcribe.
Apache License 2.0

over 5min simple_file #26

Open kazuhitogo opened 3 years ago

kazuhitogo commented 3 years ago

I can't figure out how to maintain a session for more than 5 minutes using HTTP/2. The documentation describes how to specify an expiration time for WebSocket, but not for HTTP/2. Is there any way to keep a session connected for more than 5 minutes with this SDK?

nateprewitt commented 3 years ago

Hi @kazuhitogo,

Could you clarify what you mean by "maintaining a session"? If the signatures are becoming invalid after 5 minutes I believe you're hitting the issue noted in the quickstart section:

 # NOTE: For pre-recorded files longer than 5 minutes, the sent audio
 # chunks should be rate limited to match the realtime bitrate of the
 # audio stream to avoid signing issues.

The Transcribe Streaming API is meant for realtime audio and processes it at that rate. If you're streaming prerecorded audio that's longer than 5 minutes, it needs to be rate limited so that it is sent at close to real time. Otherwise, the payloads are signed in the client but can't be processed by the service until after their signatures expire.
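
As a rough illustration of the rate limiting described above, here is a minimal sketch that works out how many bytes correspond to a given slice of audio and then yields chunks on a realtime schedule. It assumes raw 16-bit PCM already held in memory; `bytes_per_chunk` and `paced_chunks` are hypothetical helper names, not part of the SDK.

```python
import asyncio


def bytes_per_chunk(sample_rate_hz, chunk_seconds, bytes_per_sample=2, channels=1):
    """Bytes of PCM audio covering `chunk_seconds` (whole frames only)."""
    frames = int(sample_rate_hz * chunk_seconds)
    return frames * bytes_per_sample * channels


async def paced_chunks(pcm, sample_rate_hz, chunk_seconds=0.25):
    """Yield slices of `pcm` no faster than the audio would play back.

    Hypothetical helper: deadlines are tracked against the event loop clock,
    so time spent by the consumer (e.g. sending a chunk) is not added on top
    of the playback time.
    """
    size = bytes_per_chunk(sample_rate_hz, chunk_seconds)
    loop = asyncio.get_running_loop()
    next_deadline = loop.time()
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
        next_deadline += chunk_seconds
        # Sleep only for whatever remains of this chunk's playback time.
        await asyncio.sleep(max(0.0, next_deadline - loop.time()))
```

With these assumptions, 8 kHz mono 16-bit audio works out to 4000 bytes per quarter second and 16 kHz audio to 8000 bytes, so the chunk size has to match the sample rate of the file actually being sent.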

bjnord commented 3 years ago

There's some code in one of the unit tests that shows how to rate-limit. I got it going as shown below (it pulls the whole WAV into memory which is only good for relatively short files). The only trouble is, the transcript results seem to get "stuck" every 15 seconds or so; they stop coming, and then resume after a bit, but they are missing a string of words in the meantime.

--- simple_file.py.orig 2021-04-28 09:45:32.000000000 -0500
+++ simple_file.py      2021-04-28 10:03:11.000000000 -0500
@@ -35,14 +35,18 @@
     )

     async def write_chunks():
-        # An example file can be found at tests/integration/assets/test.wav
-        # NOTE: For pre-recorded files longer than 5 minutes, the sent audio
-        # chunks should be rate limited to match the realtime bitrate of the
-        # audio stream to avoid signing issues.
-        async with aiofile.AIOFile('tests/integration/assets/test.wav', 'rb') as afp:
-            reader = aiofile.Reader(afp, chunk_size=1024 * 16)
-            async for chunk in reader:
-                await stream.input_stream.send_audio_event(audio_chunk=chunk)
+        with open('tests/integration/assets/5min-test.wav', 'rb') as f:
+            raw_bytes = f.read()
+        # This simulates reading bytes from some asynchronous source
+        # This could be coming from an async file, microphone, etc
+        async def byte_generator():
+            # 4000 bytes = 1/4 second of 8kHz, mono, 16-bit PCM
+            chunk_size = 4000
+            for i in range(0, len(raw_bytes), chunk_size):
+                yield raw_bytes[i : i + chunk_size]
+                await asyncio.sleep(0.25)
+        async for chunk in byte_generator():
+            await stream.input_stream.send_audio_event(audio_chunk=chunk)
         await stream.input_stream.end_stream()

     # Instantiate our handler and start processing events
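
For files too long to hold in memory, one possible variation is to read the WAV incrementally with the standard library `wave` module, which also takes the sample rate from the file header instead of hard-coding it. This is only a sketch under that assumption: `iter_wav_chunks` is a hypothetical helper, and its reads are blocking, so it suits a simple script more than a busy event loop.

```python
import wave


def iter_wav_chunks(path, chunk_seconds=0.25):
    """Yield (pcm_bytes, duration_seconds) pairs read incrementally from a WAV file.

    Hypothetical helper: only the audio frames are returned (no WAV header),
    and the duration of each chunk is derived from the header's frame rate.
    """
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        frame_size = wav.getsampwidth() * wav.getnchannels()
        frames_per_chunk = max(1, int(frame_rate * chunk_seconds))
        while True:
            pcm = wav.readframes(frames_per_chunk)
            if not pcm:
                break
            yield pcm, (len(pcm) // frame_size) / frame_rate


# Possible use inside write_chunks() from the example (needs `import asyncio`):
#     for pcm, seconds in iter_wav_chunks('tests/integration/assets/5min-test.wav'):
#         await stream.input_stream.send_audio_event(audio_chunk=pcm)
#         await asyncio.sleep(seconds)
#     await stream.input_stream.end_stream()
```

Sleeping for each chunk's own duration keeps the send rate at or below real time even when the final chunk is shorter than the rest.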

kazuhitogo commented 3 years ago

@nateprewitt

Thanks for the response, and sorry for the delay in confirming. Yes, I had seen that note and was wondering how to handle it. The documentation describes how to set the signature expiration for WebSocket with the X-Amz-Expires parameter; I wanted to know how to do the same with this Python SDK. https://docs.aws.amazon.com/transcribe/latest/dg/websocket.html

Am I correct in assuming that this is not possible?

@bjnord

Thank you. Is this code meant to keep the signatures valid by reducing the chunk size and sleeping after each chunk?