SpeechStarted gets triggered by background noise

deepgram / deepgram-python-sdk

Official Python SDK for Deepgram's automated speech recognition APIs.

MIT License


What is the current behavior?

When using the vad_events flag in a Deepgram API call, the SpeechStarted event is often fired by background noise (especially right after transcription has started, but sometimes mid-stream as well). This makes speech-to-text transcription unreliable, because the event is set off unpredictably.

This has been a known problem since before the feature was released, and engineering is aware of it. Even the name of the event is misleading, but that is the name defined in the API spec. The event really should be called Noise Detection.

My recommendation would be to use a client-side VAD anyway. This feature is server side, so using it to detect the start of speech is probably not going to work well for your use case: you incur a network round trip just to be told that someone has started speaking into the microphone, which doesn't make much sense. If you still want to implement this server side despite these recommendations, set interim_results to true and wait for the first word detected; that gives you essentially the same result.
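A minimal sketch of the "wait for the first word" approach, in case it helps. It assumes each streaming message is a JSON string shaped like Deepgram's live transcription results (a "type" of "Results", the transcript under channel.alternatives[0].transcript, and a speech_final flag at the end of an utterance). The FirstWordGate class and its feed method are hypothetical names made up for illustration and are not part of the SDK.

```python
import json
from typing import Optional


class FirstWordGate:
    """Hypothetical helper: treats the first non-empty interim transcript
    as the start of speech, instead of relying on SpeechStarted."""

    def __init__(self) -> None:
        self.speaking = False

    def feed(self, raw_message: str) -> Optional[str]:
        """Return the first detected word once per utterance, else None."""
        msg = json.loads(raw_message)

        # Ignore everything that is not a transcript result
        # (SpeechStarted, Metadata, and so on).
        if msg.get("type") != "Results":
            return None

        # Assumed message shape: channel.alternatives[0].transcript
        alternatives = msg.get("channel", {}).get("alternatives", [])
        transcript = alternatives[0].get("transcript", "") if alternatives else ""
        words = transcript.split()

        first_word = None
        if words and not self.speaking:
            # First actual word of this utterance: speech has started.
            self.speaking = True
            first_word = words[0]

        # Assumption: speech_final marks the end of an utterance,
        # so the gate is re-armed for the next one.
        if msg.get("speech_final"):
            self.speaking = False

        return first_word
```

Feed every raw message from the websocket into feed() and treat a non-None return value as your "speech started" signal. Background noise that never produces a word yields empty transcripts, so it never trips the gate. The network round trip mentioned above still applies.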

Closing this issue since this isn't an issue with the SDK, but rather the implementation of the API.

SpeechStarted gets triggered by background noise #403
