deepgram / deepgram-python-sdk

Official Python SDK for Deepgram's automated speech recognition APIs.
https://developers.deepgram.com
MIT License

SpeechStarted gets triggered by background noise #403

Closed: AurumnPegasus closed this issue 1 month ago

AurumnPegasus commented 1 month ago

What is the current behavior?

When the vad_events flag is enabled in a Deepgram API call, the SpeechStarted event is often triggered by background noise. This happens especially right after transcription starts, but sometimes midway as well. Because the event fires unpredictably, it is not a reliable signal for speech-to-text transcription.

Expected behavior

The SpeechStarted event should be less sensitive to background noise. Alternatively, a feature could be added that lets users tune the sensitivity of VAD events manually, as their use case requires.

Please tell us about your environment

Python Version: 3.10.12
OS: Ubuntu 22.04 (also tried on macOS)
Making requests to: api.deepgram.com/v1/listen

Other information

dvonthenen commented 1 month ago

This has been a known problem since before the feature was released, and engineering is aware of it. Even the name of the event is misleading, but that is the name chosen in the API spec. The event really should be called Noise Detection.

My recommendation would be to use a client-side VAD anyway. Because this feature runs on the server, it is probably not a good fit for your use case: you incur a network round trip just to learn whether someone has started speaking into the microphone, which doesn't make much sense. If you still want to do this server side, setting interim_results to true and waiting for the first detected word gives you essentially the same result.
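As a rough illustration of the client-side approach, here is a minimal sketch of a simple energy-threshold VAD running on 16-bit mono PCM frames. It is a swapped-in technique chosen for brevity, not Deepgram's VAD and not an SDK feature. The names EnergyVAD, frame_rms, and synthetic_frame are hypothetical, and the threshold of 500 and the frame counts are placeholder values that would need tuning per microphone. Requiring several consecutive loud frames before declaring speech is one way to keep short bursts of background noise from causing a false start. It also gives the kind of tunable sensitivity the issue asks for.

```python
import math
import struct
from typing import Optional


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class EnergyVAD:
    """Hypothetical energy-threshold VAD with debouncing.

    threshold: RMS level above which a frame counts as "loud" (tune per microphone).
    min_speech_frames: consecutive loud frames needed before speech is declared.
    min_silence_frames: consecutive quiet frames needed before speech is declared over.
    """

    def __init__(self, threshold: float = 500.0, min_speech_frames: int = 10,
                 min_silence_frames: int = 25) -> None:
        self.threshold = threshold
        self.min_speech_frames = min_speech_frames
        self.min_silence_frames = min_silence_frames
        self.speaking = False
        self._loud_run = 0
        self._quiet_run = 0

    def process(self, frame: bytes) -> Optional[str]:
        """Feed one frame; return "speech_started"/"speech_ended" on a transition, else None."""
        if frame_rms(frame) >= self.threshold:
            self._loud_run += 1
            self._quiet_run = 0
        else:
            self._quiet_run += 1
            self._loud_run = 0

        if not self.speaking and self._loud_run >= self.min_speech_frames:
            self.speaking = True
            return "speech_started"
        if self.speaking and self._quiet_run >= self.min_silence_frames:
            self.speaking = False
            return "speech_ended"
        return None


def synthetic_frame(amplitude: float, n: int = 320, freq: float = 440.0,
                    rate: int = 16000) -> bytes:
    """Build a 20 ms (at 16 kHz) sine-wave test frame; amplitude 0 gives silence."""
    samples = [int(amplitude * math.sin(2 * math.pi * freq * i / rate)) for i in range(n)]
    return struct.pack(f"<{n}h", *samples)


if __name__ == "__main__":
    vad = EnergyVAD()
    # A short noise burst (3 loud frames) is ignored; sustained sound (10+ frames) is not.
    stream = [synthetic_frame(8000)] * 3 + [synthetic_frame(0)] * 5 + [synthetic_frame(8000)] * 12
    for index, frame in enumerate(stream):
        event = vad.process(frame)
        if event:
            print(index, event)
```

With 20 ms frames, min_speech_frames=10 means roughly 200 ms of sustained energy is needed before speech_started fires. A raw energy test still accepts loud non-speech noise, so a real deployment would more likely use a trained VAD (for example, the webrtcvad package) in place of frame_rms.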
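The server-side alternative is to enable interim results and treat the first recognized word as the start of speech. The following is a minimal sketch of that idea. It works on the raw JSON messages from the live endpoint rather than on SDK classes, and it assumes the message shape type == "Results" with the text at channel.alternatives[0].transcript. Those field names should be checked against Deepgram's streaming documentation. The helpers build_listen_url, first_word, and SpeechStartDetector are hypothetical and are not part of the SDK.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode

# Live-streaming endpoint from the issue; interim_results and vad_events are
# Deepgram query parameters, but verify any others against the API reference.
LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(**params: Any) -> str:
    """Hypothetical helper: encode options as a query string, with booleans as true/false."""
    encoded = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in params.items()}
    return f"{LISTEN_URL}?{urlencode(encoded)}"


def first_word(message: Dict[str, Any]) -> Optional[str]:
    """Return the first transcribed word of a Results message, or None if there is none.

    Assumes the shape {"type": "Results", "channel": {"alternatives": [{"transcript": ...}]}}.
    """
    if message.get("type") != "Results":
        return None  # e.g. SpeechStarted, which may be triggered by noise alone
    alternatives = message.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return None
    words = alternatives[0].get("transcript", "").split()
    return words[0] if words else None


class SpeechStartDetector:
    """Hypothetical helper: flags speech start on the first interim or final word."""

    def __init__(self) -> None:
        self.started = False

    def handle(self, raw_message: str) -> bool:
        """Feed one raw JSON message; return True only for the message that starts speech."""
        if self.started:
            return False
        if first_word(json.loads(raw_message)) is not None:
            self.started = True
            return True
        return False


if __name__ == "__main__":
    print(build_listen_url(interim_results=True, vad_events=True))
```

The detector skips SpeechStarted messages and does not count empty interim transcripts. As a result, noise that produces no recognized word never flips it. The trade-off is the extra latency of waiting for the recognizer to emit a word.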

Closing this issue, since this isn't a problem with the SDK but rather with the API's implementation.