deepgram / deepgram-python-sdk

Official Python SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

UtteranceEnd never triggers #385

Closed Krupskis closed 1 month ago

Krupskis commented 1 month ago

What is the current behavior?

The UtteranceEnd event does not come 1000ms after the last spoken word.

Steps to reproduce

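import base64

from flask import Flask
from flask_sock import Sock
from deepgram import (
    DeepgramClient,
    DeepgramClientOptions,
    LiveOptions,
    LiveTranscriptionEvents,
)

app = Flask(__name__)
sock = Sock(app)

# Finalized transcript segments buffered until speech_final arrives
is_finals = []

options: LiveOptions = LiveOptions(
    model="nova-2",
    language="en",
    # Apply smart formatting to the output
    smart_format=True,
    # Raw audio format details
    encoding="mulaw",
    channels=1,
    sample_rate=16000,
    # To get UtteranceEnd, the following must be set:
    interim_results=True,
    utterance_end_ms="1000",
    vad_events=True,
    # Time in milliseconds of silence to wait for before finalizing speech
    endpointing=300,
)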

@sock.route('/echo')
def echo(ws):
    try:
        # STEP 1: Create a Deepgram client using the API key
        config = DeepgramClientOptions(
            options={"keepalive": "true"} # Comment this out to see the effect of not using keepalive
        )
        deepgram = DeepgramClient("", config)

        # STEP 2: Create a websocket connection to Deepgram
        dg_connection = deepgram.listen.live.v("1")

        # STEP 3: Define the event handlers for the connection
        def on_message(self, result, **kwargs):
            global is_finals
            print(result.type)

            if result.is_final:
                sentence = result.channel.alternatives[0].transcript
                is_finals.append(sentence)
                if result.speech_final:
                    utterance = ' '.join(is_finals)
                    print(f"Speech final: {utterance}")
                    is_finals = []

        def on_metadata(self, metadata, **kwargs):
            print(f"\n\n{metadata}\n\n")

        def on_error(self, error, **kwargs):
            print(f"\n\n{error}\n\n")

        dg_connection.on(LiveTranscriptionEvents.Transcript, on_message)
        dg_connection.on(LiveTranscriptionEvents.Metadata, on_metadata)
        dg_connection.on(LiveTranscriptionEvents.Error, on_error)

        dg_connection.start(options)
        while True:
            data = ws.receive()
            if data:
                dg_connection.send(base64.b64decode(data))
            # ws.send(data)

    except Exception as e:
        print(f"Error: {e}")
        # dg_connection.stop()
        # ws.close()

Play music in the background and speak. print(result.type) only ever prints Results; the UtteranceEnd event never appears after I finish speaking. I have to stop the music for speech_final to be triggered.

Expected behavior

I would expect UtteranceEnd to trigger a second after my last word so I can finalize the sentence.

Please tell us about your environment

Local Flask server on an M2 Mac

Krupskis commented 1 month ago

Also tried utterance_end_ms=1000, same issue

dvonthenen commented 1 month ago

You need to create a function for the event to call:

        # Plain def, not async: the sample above uses the synchronous live client
        def on_utterance_end(self, utterance_end, **kwargs):
            print("Deepgram Utterance End")

then you need to actually hook the function up to the event:

dg_connection.on(LiveTranscriptionEvents.UtteranceEnd, on_utterance_end)

Both are missing in your code sample above.
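
For anyone landing here with the same goal (finalizing the sentence when speech_final never fires because of background noise), here is a minimal sketch of how the two handlers could share a buffer. It is not taken from the SDK: UtteranceBuffer, fake_result and the completed list are hypothetical names made up for illustration, and the result objects are stand-ins built with SimpleNamespace so the logic can be run without a Deepgram connection. It assumes the handlers are called with the client as the first argument and the payload as a keyword, as in the sample above.

```python
from types import SimpleNamespace


class UtteranceBuffer:
    """Hypothetical helper: collects finalized segments until the utterance ends."""

    def __init__(self):
        self.is_finals = []   # finalized segments of the current utterance
        self.completed = []   # full utterances, in order

    def _flush(self):
        # Join the buffered segments into one utterance; no-op if nothing is buffered.
        if self.is_finals:
            self.completed.append(" ".join(self.is_finals))
            self.is_finals = []

    def on_message(self, _client, result, **kwargs):
        sentence = result.channel.alternatives[0].transcript
        if result.is_final and sentence:
            self.is_finals.append(sentence)
            # Quiet audio: endpointing detected silence, so finalize right away.
            if result.speech_final:
                self._flush()

    def on_utterance_end(self, _client, utterance_end, **kwargs):
        # Noisy audio: speech_final may never arrive, so finalize here instead.
        self._flush()


def fake_result(text, is_final, speech_final):
    """Stand-in for a transcript result, with only the fields used above."""
    return SimpleNamespace(
        is_final=is_final,
        speech_final=speech_final,
        channel=SimpleNamespace(alternatives=[SimpleNamespace(transcript=text)]),
    )


# Simulate speech over background music: is_final segments, no speech_final.
buffer = UtteranceBuffer()
buffer.on_message(None, result=fake_result("hello", True, False))
buffer.on_message(None, result=fake_result("world", True, False))
buffer.on_utterance_end(None, utterance_end=SimpleNamespace())
print(buffer.completed)
```

With a real connection, the bound methods would be registered the same way as in the comment above, e.g. dg_connection.on(LiveTranscriptionEvents.Transcript, buffer.on_message) and dg_connection.on(LiveTranscriptionEvents.UtteranceEnd, buffer.on_utterance_end). Flushing from both handlers is safe because the second flush finds an empty buffer and does nothing.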