awslabs / amazon-transcribe-streaming-sdk

The Amazon Transcribe Streaming SDK is an async Python SDK for converting audio into text via Amazon Transcribe.
Apache License 2.0

TranscribeStreamingClient creates a new AwsCrtHttpSessionManager on each call to start_stream_transcription #77

Closed gscalise closed 2 years ago

gscalise commented 2 years ago

Each time a call to start_stream_transcription is made on a TranscribeStreamingClient, a new AwsCrtHttpSessionManager is created:

https://github.com/awslabs/amazon-transcribe-streaming-sdk/blob/95349afd317b83b06b0c3dadc3a51720595bf876/amazon_transcribe/client.py#L173

This session manager then opens a new HTTP/2 connection even if other streaming transcription requests are already in flight. Combined with issue #76, where connections are never evicted from AwsCrtHttpSessionManager's cache, this can lead to file descriptor exhaustion in long-running processes (i.e., service workers), because the socket FDs are never released.

I would also note that HTTP/2 connections are meant to be multiplexed, and opening multiple connections to the same endpoint defeats the purpose of using HTTP/2 in the first place.
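
To illustrate the difference, here is a minimal, self-contained sketch. It does not use the SDK: `StubSessionManager`, `PerCallClient`, `SharedManagerClient`, `count_connections` and the endpoint string are all hypothetical stand-ins, and the shared-manager variant is only one possible shape of a fix, not necessarily what the library does.

```python
import asyncio


class StubSessionManager:
    """Hypothetical stand-in for an HTTP/2 session manager.

    This is NOT the real AwsCrtHttpSessionManager; it only models a
    per-endpoint connection cache that never evicts entries.
    """

    def __init__(self):
        self._connections = {}  # endpoint -> cached connection object

    async def get_connection(self, endpoint):
        # Open a connection only if this manager has none cached for the endpoint.
        if endpoint not in self._connections:
            self._connections[endpoint] = object()  # stands in for a socket
        return self._connections[endpoint]


class PerCallClient:
    """Models the reported behaviour: a fresh manager on every call."""

    def __init__(self):
        self.managers = []  # kept alive, like connections that are never released

    async def start_stream(self, endpoint):
        manager = StubSessionManager()
        self.managers.append(manager)
        return await manager.get_connection(endpoint)


class SharedManagerClient:
    """Models one possible fix: one manager owned by the client and reused."""

    def __init__(self):
        self._manager = StubSessionManager()

    async def start_stream(self, endpoint):
        return await self._manager.get_connection(endpoint)


async def count_connections(client, calls=3):
    """Start `calls` streams and return how many distinct connections were used."""
    conns = [await client.start_stream("transcribe.example") for _ in range(calls)]
    return len({id(conn) for conn in conns})


if __name__ == "__main__":
    print(asyncio.run(count_connections(PerCallClient())))        # one connection per call
    print(asyncio.run(count_connections(SharedManagerClient())))  # a single shared connection
```

With the per-call client, every stream gets its own connection (and, in the real case, its own socket FD). With the shared manager, all streams to the same endpoint go through one connection, which is what HTTP/2 multiplexing is designed for.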

nateprewitt commented 2 years ago

Resolved with #80.