deepgram / deepgram-python-sdk

Official Python SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

Async transcription issue #446

Closed swimleftproducts closed 3 months ago

swimleftproducts commented 3 months ago

What is the current behavior?

I am trying to do an async transcription.

I get an error that says Task failed with exception: Attempted to send an sync request with an AsyncClient instance.

Steps to reproduce

I have the following code in my app.

deepgram = DeepgramClient(settings.deepgram.api_key)

def async a_function():
    result = await deepgram.listen.asyncrest.v("1").transcribe_file(payload, options)
    return result

Expected behavior

I would expect this to return the result object.

Please tell us about your environment

Python 3.10, MacBook Pro.

Other information

The docs seem to have a dead link: https://github.com/dvonthenen/deepgram-python-sdk/blob/main/deepgram/clients/live/v1/async_client.py which I was sent to from here: https://developers.deepgram.com/docs/threaded-and-async-io-task-support

davidvonthenen commented 3 months ago

Hi @swimleftproducts

Thanks for pointing out the broken link(s). Those have been fixed.

As for the issue itself, the small snippet you posted has a bug: def async a_function(): should be async def a_function():

I don't know if that was just a copy-and-paste error, but if you could post some code for me to look at, that would be really helpful in tracking down your problem.
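For anyone following along, here is a minimal sketch of the corrected declaration, assuming nothing beyond the standard library. The asyncio.sleep(0) call is only a placeholder for an awaited request such as the Deepgram call, and the return value is illustrative.

```python
import asyncio


async def a_function() -> str:
    # "async" comes before "def"; "def async" is a syntax error.
    # Placeholder for an awaited call (for example, a transcription request).
    await asyncio.sleep(0)
    return "done"


# A coroutine function has to be awaited or driven by an event loop.
result = asyncio.run(a_function())
print(result)
```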

swimleftproducts commented 3 months ago

Yeah, that is a typing error. I was just typing out that the deepgram call was in an async function.

The actual code is:

async def GetTranscript(audio: AudioSegment) -> str:
    buffer = io.BytesIO()
    buffer.name = "a_file_name.wav"
    audio_file = audio.export(buffer, format="wav")
    payload = {"buffer": audio_file}
    result = await deepgram.listen.asyncrest.v("1").transcribe_file(payload, options)
    return result.results.channels[0].alternatives[0].transcript

davidvonthenen commented 3 months ago

Do you have more of the code, such as where the transcription options are set? Is the AudioSegment from the pydub library?

This is a working example of using async to compare against:

# Copyright 2023-2024 Deepgram SDK contributors. All Rights Reserved.
# Use of this source code is governed by a MIT license that can be found in the LICENSE file.
# SPDX-License-Identifier: MIT

import asyncio
import aiofiles
from dotenv import load_dotenv
import logging
from deepgram.utils import verboselogs
from datetime import datetime
import httpx

from deepgram import (
    DeepgramClient,
    DeepgramClientOptions,
    PrerecordedOptions,
    FileSource,
)

load_dotenv()

AUDIO_FILE = "preamble.wav"

async def main():
    try:
        # STEP 1 Create a Deepgram client using the API key in the environment variables
        config: DeepgramClientOptions = DeepgramClientOptions(
            verbose=verboselogs.SPAM,
        )
        deepgram: DeepgramClient = DeepgramClient("", config)
        # OR use defaults
        # deepgram: DeepgramClient = DeepgramClient()

        # STEP 2 Call the transcribe_file method on the rest class
        async with aiofiles.open(AUDIO_FILE, "rb") as file:
            buffer_data = await file.read()

        payload: FileSource = {
            "buffer": buffer_data,
        }

        options: PrerecordedOptions = PrerecordedOptions(
            model="nova-2",
            smart_format=True,
            utterances=True,
            punctuate=True,
            diarize=True,
        )

        before = datetime.now()
        response = await deepgram.listen.asyncrest.v("1").transcribe_file(
            payload, options, timeout=httpx.Timeout(300.0, connect=10.0)
        )
        after = datetime.now()

        print(response.to_json(indent=4))
        print("")
        difference = after - before
        print(f"time: {difference.seconds}")

    except Exception as e:
        print(f"Exception: {e}")

if __name__ == "__main__":
    asyncio.run(main())

swimleftproducts commented 3 months ago

Ah, my bad for trying to assume what is important. The extent of my deepgram code is below. Yes, the audio segment is a pydub audio segment.

from deepgram import DeepgramClient, FileSource, PrerecordedOptions

deepgram = DeepgramClient(settings.deepgram.api_key)
options = PrerecordedOptions(model="nova-2", smart_format=True)

async def GetTranscript(audio: AudioSegment) -> str:
    buffer = io.BytesIO()
    buffer.name = "a_file_name.wav"
    audio_file = audio.export(buffer, format="wav")
    payload = {"buffer": audio_file}
    result = await deepgram.listen.asyncrest.v("1").transcribe_file(payload, options)
    return result.results.channels[0].alternatives[0].transcript

swimleftproducts commented 3 months ago

I can run your example in a notebook. So the issue is with me. 😢 . I will keep debugging. Thanks for the help.

davidvonthenen commented 3 months ago

@swimleftproducts If you want to chat in Discord to help debug, drop me a line. I will need to see more code, but you can DM for some privacy in case that's the issue.

swimleftproducts commented 3 months ago

As a follow-up for anyone who ends up here:

The core of the issue is how the buffer used in the payload is created. Originally I was just sending the result of calling pydub's AudioSegment.export with a BytesIO object. This worked as the payload for the OpenAI SDK, but it was not working here.

Instead, in the following working code, I create a bytes object and pass that to the deepgram call.

buffer = io.BytesIO()
buffer.name = "a_file_name.wav"
audio.export(buffer, format="wav")
buffer.seek(0)  # Ensure buffer is at the beginning
payload = {"buffer": buffer.read()}
# ... use payload as normal
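
The difference between the two payloads can be sketched without pydub or Deepgram installed. In the sketch below, FakeSegment is a hypothetical stand-in for pydub's AudioSegment, and segment_to_wav_bytes is an illustrative helper name, not part of either library. It assumes only that export() writes into the file-like object it is given and returns that same object, which is what the original code was passing as the payload. A plausible explanation for the error message, not confirmed in this thread, is that the HTTP layer treats a file-like object as a synchronous stream, whereas plain bytes are accepted by both sync and async clients.

```python
import io
import wave


class FakeSegment:
    """Hypothetical stand-in for pydub's AudioSegment (illustration only)."""

    def export(self, out_f, format="wav"):
        # Write 0.1 s of 16-bit mono silence at 8 kHz into the file-like object.
        with wave.open(out_f, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(8000)
            wav.writeframes(b"\x00\x00" * 800)
        out_f.seek(0)
        return out_f  # the file-like object itself, not bytes


def segment_to_wav_bytes(segment) -> bytes:
    """Export a segment to WAV and return the raw bytes."""
    buffer = io.BytesIO()
    segment.export(buffer, format="wav")
    # getvalue() returns the full contents regardless of the read position.
    return buffer.getvalue()


exported = FakeSegment().export(io.BytesIO(), format="wav")
wav_bytes = segment_to_wav_bytes(FakeSegment())

print(type(exported).__name__)   # the kind of object the original payload held
print(type(wav_bytes).__name__)  # the kind of object the working payload holds

payload = {"buffer": wav_bytes}
```

Using getvalue() instead of seek(0) followed by read() is a small design choice: it does not depend on where the buffer position was left after the export.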

davidvonthenen commented 3 months ago

Glad you figured it out!