deepgram / deepgram-python-sdk

Official Python SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

httpcore.ReadTimeout: The read operation timed out #423

Closed (osamabinsaleem closed 4 days ago)

osamabinsaleem commented 4 days ago

Hi. I'm getting frequent read timeout errors. This is my code:

import httpx
from deepgram import FileSource, PrerecordedOptions

transcription_model = "nova-2-general"
# sometimes we also use "whisper-medium"

# download the video from S3
# convert the video to MP3 audio

# read the converted audio into memory
with open(audio_efs_path, "rb") as file:
    buffer_data = file.read()

payload: FileSource = {
    "buffer": buffer_data,
}

options = PrerecordedOptions(
    model=transcription_model,
    smart_format=True,
    detect_language=True,
)

# `deepgram` is a client object created elsewhere (not shown)
response = deepgram.listen.prerecorded.v("1").transcribe_file(
    payload, options, timeout=httpx.Timeout(300.0, connect=10.0)
)

My video files are stored in S3 buckets. I first download the file, convert it to MP3, and then use the code above to transcribe it. I still occasionally get timeout issues. They also happen with nova-2-general, so changing the model doesn't help.

What do you recommend here: 1- Retrying with exponential backoff, or 2- Uploading the audio to an S3 bucket and using the presigned URL to get the transcript?

Thanks!

Here are the error logs:


[INFO] 2024-06-24T13:12:34.175Z 619e77ea-ea1b-4947-9b55-248653553619 Traceback (most recent call last):
  File "/var/lang/lib/python3.10/site-packages/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
    yield
  File "/var/lang/lib/python3.10/site-packages/httpx/_transports/default.py", line 233, in handle_request
    resp = self._pool.handle_request(req)
  File "/var/lang/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 216, in handle_request
    raise exc from None
  File "/var/lang/lib/python3.10/site-packages/httpcore/_sync/connection_pool.py", line 196, in handle_request
    response = connection.handle_request(
  File "/var/lang/lib/python3.10/site-packages/httpcore/_sync/connection.py", line 101, in handle_request
    return self._connection.handle_request(request)
  File "/var/lang/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 143, in handle_request
    raise exc
  File "/var/lang/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 113, in handle_request
    ) = self._receive_response_headers(**kwargs)
  File "/var/lang/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 186, in _receive_response_headers
    event = self._receive_event(timeout=timeout)
  File "/var/lang/lib/python3.10/site-packages/httpcore/_sync/http11.py", line 224, in _receive_event
    data = self._network_stream.read(
  File "/var/lang/lib/python3.10/site-packages/httpcore/_backends/sync.py", line 124, in read
    with map_exceptions(exc_map):
  File "/var/lang/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/var/lang/lib/python3.10/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.ReadTimeout: The read operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
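For option 1 in the question (retrying with exponential backoff), here is a minimal sketch using only the standard library. The helper name `retry_with_backoff` and all of its parameters are hypothetical and not part of the Deepgram SDK. It retries a zero-argument callable on a configurable set of exception types, doubling the delay cap on each attempt and sleeping a random amount up to that cap (jitter).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a retryable error, wait and try again.

    The delay cap grows as base_delay * 2**attempt, limited to max_delay.
    The actual sleep is a random value between 0 and that cap.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")  # loop always returns or raises
```

With the code from the question, the callable would be something like `lambda: deepgram.listen.prerecorded.v("1").transcribe_file(payload, options, timeout=...)`. The log above is cut off before the final exception, so which exception type the SDK ultimately raises is an assumption to verify; `httpx.ReadTimeout` is a likely candidate for `retry_on`. Note that each retry uploads the whole buffer again.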
 
dvonthenen commented 4 days ago

hi @osamabinsaleem

If you need to increase the timeout because the file is larger, you can follow the docs here: https://developers.deepgram.com/docs/python-sdk-pre-recorded-transcription#increasing-the-timeout-for-processing-larger-files

osamabinsaleem commented 4 days ago

I'm doing the same thing. Can I increase it to more than 300 as well? @dvonthenen

E.g., like this for 10 minutes:

response = deepgram.listen.prerecorded.v("1").transcribe_file(
    payload, options, timeout=httpx.Timeout(600.0, connect=10.0)
)

dvonthenen commented 4 days ago

Of course! That's the point of the timeout field. We have a default (you don't need to specify the timeout parameter at all), but if you need something other than the default, you pass the timeout parameter with whatever values you want.