deepgram / deepgram-js-sdk

Official JavaScript SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

UtteranceEnd does not seem to be being called #241

Closed adambeer closed 7 months ago

adambeer commented 8 months ago

What is the current behavior?

The client is configured to request UtteranceEnd events but isn't receiving any.

Steps to reproduce

Use the latest JavaScript client (3.1.3) with the following config:

{
  model: "nova-2-general",
  version: 'latest',
  smart_format: false,
  interim_results: true,
  encoding: 'linear16',
  channels: 1,
  sample_rate: 16000,
  utterance_end_ms: 2000,
  vad_events: true,
  endpointing: true
}

Call the listen.live method. No message with a 'type' of "UtteranceEnd" is returned, even when there is more than enough silence after speech.

Expected behavior

A "UtteranceEnd" type event after 2 seconds of silence.

Other information

I don't think this is a configuration issue, as I have followed the docs on this topic: https://developers.deepgram.com/docs/understanding-end-of-speech-detection

jkroll-deepgram commented 8 months ago

Hi @adambeer, I understand you've been in touch through Discord too, and we've made some progress there. It doesn't appear to be an SDK or API issue on our side.

Confirming that you're using interim_results: true along with an utterance_end_ms value, which does enable UtteranceEnd messages. That all looks correct to me.

One additional note is that you're setting utterance_end_ms: 2000. That's 2000 ms, i.e. 2 full seconds of silence before an utterance end is triggered. Our default is 1000 ms, so 2000 ms is on the higher side. However, I see that's what you intend, and it should not be an issue as long as no background speech or other audio interrupts those 2 seconds of silence.

If you're still not receiving the UtteranceEnd messages, can you please share request ID(s) and we can troubleshoot further to see why your requests aren't working as expected?

adambeer commented 8 months ago

@jkroll-deepgram That must be another Adam; this is the only place I've raised this issue.

Here is the request ID from a request I just made: 7a23ede7-c462-4c54-9ed8-d0de89cb8383

Edit: I let it continue streaming for at least 5 seconds after I stopped speaking. I'm in a completely quiet environment.

adambeer commented 8 months ago

Have we uncovered anything with the provided request id?

jkroll-deepgram commented 8 months ago

Hi @adambeer, apologies for my confusion in thinking this was related to a Discord issue.

I just tried out our live-streaming example with your API parameters. See this script: https://github.com/deepgram/deepgram-js-sdk/blob/main/examples/node-live/index.js

And the top-level readme with a few instructions on how to get started with the examples: https://github.com/deepgram/deepgram-js-sdk/tree/main

These were the parameters I ran:

const connection = deepgram.listen.live({
  model: "nova-2-general",
  version: 'latest',
  smart_format: false,
  interim_results: true,
  // encoding: 'linear16',
  channels: 1,
  // sample_rate: 16000,
  utterance_end_ms: 1000,
  vad_events: true,
  endpointing: true
});

I commented out the encoding and sample rate, as our BBC radio example has different parameters. I also lowered the utterance_end_ms to 1000 ms, because a 2000 ms pause generally does not occur in radio.

Here's a sample of the transcript I'm getting:

{ type: 'SpeechStarted', channel: [ 0, 1 ], timestamp: 148.74 }
{
  alternatives: [
    {
      transcript: 'there is significant repression',
      confidence: 0.9902344,
      words: [Array]
    }
  ]
}
{ alternatives: [ { transcript: '', confidence: 0, words: [] } ] }
{ type: 'UtteranceEnd', channel: [ 0, 1 ], last_word_end: 147.98 }

Can you try out another audio source such as our BBC radio link, to confirm that it's not something about your audio that is causing the issue?

adambeer commented 8 months ago

I lowered utterance_end_ms to 1000 and am still getting no events. I don't even get the 'SpeechStarted' event. However, the transcriptions are working just fine, so it doesn't really make sense that it could be an audio issue.

Here's a new request ID from this latest 1000 ms test: 24ed5a13-d101-4c82-995e-220df4c27fdb

Are either of these request IDs helpful?

dvonthenen commented 7 months ago

Just an FYI, you can't use a value lower than 1000 according to the docs.
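
To summarize the two prerequisites mentioned so far in this thread (interim_results: true, and an utterance_end_ms value of at least 1000), here is a minimal sketch of a pre-flight check. The helper name checkUtteranceEndOptions is hypothetical and is not part of the Deepgram SDK; it also assumes utterance_end_ms is passed as a number.

```javascript
// Hypothetical helper (not part of the Deepgram SDK): lists the option
// problems that, per this thread, would prevent UtteranceEnd messages.
function checkUtteranceEndOptions(options) {
  const problems = [];
  if (options.interim_results !== true) {
    problems.push("interim_results must be true");
  }
  if (typeof options.utterance_end_ms !== "number") {
    // Assumption: the value is supplied as a number, as in the reported config.
    problems.push("utterance_end_ms must be set");
  } else if (options.utterance_end_ms < 1000) {
    problems.push("utterance_end_ms must be at least 1000");
  }
  return problems;
}

// The configuration from the original report passes both checks.
const reportedOptions = {
  model: "nova-2-general",
  interim_results: true,
  utterance_end_ms: 2000,
  vad_events: true,
};
console.log(checkUtteranceEndOptions(reportedOptions)); // prints []

// A configuration that violates both prerequisites yields two problems.
console.log(checkUtteranceEndOptions({ interim_results: false, utterance_end_ms: 500 }));
```

An empty result only means the options look right; as the rest of the thread shows, the events can still go unnoticed if nothing is listening for them.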

jkroll-deepgram commented 7 months ago

Hi @adambeer, I appreciate your patience here, and I want to get this resolved for you.

I looked at your latest request ID (24ed5a13-d101-4c82-995e-220df4c27fdb), and was able to re-run that request with your same parameters. I did get the SpeechStarted event at the beginning, the results for about three seconds of speech, and then the UtteranceEnd event.

Here is a truncated version of the output I see (different request ID as I am replicating a new request using your original setup):

{"type":"Results","channel_index":[0,1],"duration":1.0,"start":0.0,"is_final":false,"speech_final":false,"channel":{"alternatives":[{"transcript":"","confidence":0.0,"words":[]}]},"metadata":{"request_id":"bf615900-9345-4653-b81d-22b1cb618aaf","model_info":{"name":"2-general-nova","version":"2024-01-11.36317","arch":"nova-2"},"model_uuid":"1dbdfb4d-85b2-4659-9831-16b3c76229aa"}}
{"type":"SpeechStarted","channel":[0,1],"timestamp":1.73}
{"type":"Results","channel_index":[0,1],"duration":2.0,"start":0.0,"is_final":false,"speech_final":false,"channel":{"alternatives":[{"transcript":<TRANSCRIPT>}]},"metadata":{"request_id":"bf615900-9345-4653-b81d-22b1cb618aaf","model_info":{"name":"2-general-nova","version":"2024-01-11.36317","arch":"nova-2"},"model_uuid":"1dbdfb4d-85b2-4659-9831-16b3c76229aa"}}
{"type":"Results","channel_index":[0,1],"duration":3.0,"start":0.0,"is_final":false,"speech_final":false,"channel":{"alternatives":[{"transcript":<TRANSCRIPT>}]},"metadata":{"request_id":"bf615900-9345-4653-b81d-22b1cb618aaf","model_info":{"name":"2-general-nova","version":"2024-01-11.36317","arch":"nova-2"},"model_uuid":"1dbdfb4d-85b2-4659-9831-16b3c76229aa"}}
{"type":"Results","channel_index":[0,1],"duration":3.93,"start":0.0,"is_final":true,"speech_final":true,"channel":{"alternatives":[{"transcript":<TRANSCRIPT>}]},"metadata":{"request_id":"bf615900-9345-4653-b81d-22b1cb618aaf","model_info":{"name":"2-general-nova","version":"2024-01-11.36317","arch":"nova-2"},"model_uuid":"1dbdfb4d-85b2-4659-9831-16b3c76229aa"}}
{"type":"Results","channel_index":[0,1],"duration":1.0699999,"start":3.93,"is_final":false,"speech_final":false,"channel":{"alternatives":[{"transcript":"","confidence":0.0,"words":[]}]},"metadata":{"request_id":"bf615900-9345-4653-b81d-22b1cb618aaf","model_info":{"name":"2-general-nova","version":"2024-01-11.36317","arch":"nova-2"},"model_uuid":"1dbdfb4d-85b2-4659-9831-16b3c76229aa"}}
{"type":"UtteranceEnd","channel":[0,1],"last_word_end":3.46}

I also was able to run the prior example with our JavaScript SDK starter example, showing it's not the SDK either.

Is there any chance that you are filtering out which events you are printing/processing? The response events have different types, such as "type": "Results", "type": "SpeechStarted", "type": "UtteranceEnd". Could you be only examining the Results events and omitting the others? Are you able to provide your full reproducible code snippet?
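
As an illustration of the filtering question above, here is a minimal sketch that routes raw JSON messages by their "type" field and reports any type that has no handler. The createMessageRouter function is hypothetical and is not an SDK API; the two sample messages are copied from the output earlier in this comment.

```javascript
// Hypothetical dispatcher (not an SDK API): routes each raw JSON message to a
// handler keyed by its "type" field and reports whether one was found.
function createMessageRouter(handlers) {
  return function route(rawMessage) {
    const message = JSON.parse(rawMessage);
    if (Object.prototype.hasOwnProperty.call(handlers, message.type)) {
      handlers[message.type](message);
      return true;
    }
    // No handler registered: returning false makes silent drops visible.
    return false;
  };
}

const seen = [];
const route = createMessageRouter({
  Results: (m) => seen.push(`Results is_final=${m.is_final}`),
  SpeechStarted: (m) => seen.push(`SpeechStarted at ${m.timestamp}`),
  UtteranceEnd: (m) => seen.push(`UtteranceEnd last_word_end=${m.last_word_end}`),
});

route('{"type":"SpeechStarted","channel":[0,1],"timestamp":1.73}');
route('{"type":"UtteranceEnd","channel":[0,1],"last_word_end":3.46}');
console.log(seen);
```

If the handler table contained only a Results entry, the two messages above would be dropped, which matches the symptom described in this issue.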

adambeer commented 7 months ago

@jkroll-deepgram Thanks for the response. I just realized that there are new events I need to listen for that weren't in the previous version of the library. I had assumed that they would come through the Transcribe one. It's working now. Thanks for the help!

jkroll-deepgram commented 7 months ago

Glad to hear it! I hope that receiving these new events enhances your product :)

Sheldenshi commented 5 months ago

> adambeer wrote: @jkroll-deepgram Thanks for the response. I just realized that there are new events I need to listen for that weren't in the previous version of the library. I had assumed that they would come through the Transcribe one. It's working now. Thanks for the help!

Hi, I am running into this too. "UtteranceEnd" won't show up after I migrated to V3. Which event were you referring to?

jkroll-deepgram commented 5 months ago

Hi @Sheldenshi, here's an example of adding a listener for the UtteranceEnd event: https://github.com/deepgram/deepgram-js-sdk/blob/main/examples/node-live/index.js#L32
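
For readers who cannot open the link, here is a rough sketch of the pattern: each message type gets its own listener instead of everything arriving through the transcript handler. To keep it runnable without an API key, a plain Node.js EventEmitter stands in for the live connection. The event names below simply mirror the "type" values shown earlier in this thread; with the SDK you would most likely use the names from its exported live transcription events enum, which is an assumption here and should be checked against the linked example. The attachListeners helper is hypothetical.

```javascript
const { EventEmitter } = require("node:events");

// Assumption: event names match the message "type" values seen in this thread.
const EVENTS = {
  Transcript: "Results",
  SpeechStarted: "SpeechStarted",
  UtteranceEnd: "UtteranceEnd",
};

// Hypothetical helper: registers one listener per event type on any
// emitter-like connection and records what it receives in `log`.
function attachListeners(connection, log) {
  connection.on(EVENTS.Transcript, (data) => {
    log.push(["transcript", data.channel.alternatives[0].transcript]);
  });
  connection.on(EVENTS.SpeechStarted, (data) => {
    log.push(["speech_started", data.timestamp]);
  });
  connection.on(EVENTS.UtteranceEnd, (data) => {
    log.push(["utterance_end", data.last_word_end]);
  });
}

// Stand-in for the object returned by deepgram.listen.live(...).
const connection = new EventEmitter();
const log = [];
attachListeners(connection, log);

// Simulated messages, shaped like the samples earlier in this thread.
connection.emit("Results", {
  type: "Results",
  channel: { alternatives: [{ transcript: "there is significant repression" }] },
});
connection.emit("UtteranceEnd", { type: "UtteranceEnd", channel: [0, 1], last_word_end: 147.98 });
console.log(log);
```

Without the UtteranceEnd listener, the second message would be emitted and silently ignored, which is the behavior reported at the top of this issue.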

juandld commented 5 months ago

@jkroll-deepgram, just as a note: I was pulling my hair out because I needed to get something done, and the only reason I found out how to use UtteranceEnd was because a search AI found it for me. I looked through the documentation myself and read the whole utterance end page, but there was no mention of how to listen for that event, let alone in JS. I was only able to find it once I already knew the answer. I made a follow-up request on this issue and then deleted it when I found the answer afterwards. I have to say it was a frustrating experience. If you want to chalk it up to a skill issue, that's fine; I'm just trying to be helpful here.

Sheldenshi commented 5 months ago

Yeah, I agree. I think UtteranceEnd should be covered on this documentation page: https://developers.deepgram.com/docs/node-sdk-streaming-transcription

jkroll-deepgram commented 5 months ago

Thanks @juandld and @Sheldenshi for the feedback. I've passed it on to the team that manages our docs and SDKs. We are working on consolidating our SDK documentation directly in the open-source GitHub repos. When we create website doc pages for our SDKs, they are meant to be a starting point rather than comprehensive coverage, but that can be confusing because customers can't tell which additional features the docs omit.