awslabs / amazon-transcribe-streaming-sdk

The Amazon Transcribe Streaming SDK is an async Python SDK for converting audio into text via Amazon Transcribe.
Apache License 2.0
151 stars 38 forks source link

AwsCrtHttpSessionManager does not validate cached connections before returning them, and keeps them indefinitely #76

Closed gscalise closed 2 years ago

gscalise commented 2 years ago

AwsCrtHttpSessionManager maintains a cache of sessions in the self._connections field:

https://github.com/awslabs/amazon-transcribe-streaming-sdk/blob/95349afd317b83b06b0c3dadc3a51720595bf876/amazon_transcribe/httpsession.py#L116-L122

Using elements of the parsed url as a key when calling get_connection:

https://github.com/awslabs/amazon-transcribe-streaming-sdk/blob/95349afd317b83b06b0c3dadc3a51720595bf876/amazon_transcribe/httpsession.py#L158-L162

But it does not validate the connection before returning it:

https://github.com/awslabs/amazon-transcribe-streaming-sdk/blob/95349afd317b83b06b0c3dadc3a51720595bf876/amazon_transcribe/httpsession.py#L163-L164

Which can lead to issues if the connection is dropped or closed, since it will never be evicted from the cache and will continue being returned whenever the same URL is used.

Since the AWS CRT does have an is_open() method for awscrt.http.HttpConnectionBase objects (that awscrt.http.HttpClientConnection extends), it would make sense to validate the connection before returning it, and proceed to evict it if it's not alive.

There's also a shutdown_future property in awscrt.http.HttpConnectionBase, which is a future that completes when the connection shuts down. To avoid retaining closed sessions in the cache, it would be good to also hook a done callback into this future and to eagerly evict the stale connection from the cache instead of keeping it there indefinitely.

nateprewitt commented 2 years ago

Resolved with #79.