Background Music on Browser Interferes with STT

Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK

MIT License


Hi there,

We are using the Speech SDK, based on this quickstart sample: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/quickstart/javascript/browser/from-microphone

In our app, background music plays in the browser while the user is talking. Without music, the speech-to-text (STT) result is almost perfect. With music playing, the results are incorrect or no result is produced at all.

Examples:

Without background music, STT text: "Testing 1-2 Three Check, check, check." (correct)

With background music, STT text: "It has been 123." (incorrect), or nothing is captured at all.

The sample code creates the audio input like this:

var audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();

So I assumed it captures audio only from the default microphone, but the browser's own audio seems to be picked up as well.

How can we capture audio from the microphone only, without the music?

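Hi, if the background music plays through the loudspeaker(s) of the device running the browser, the microphone picks it up together with the speech. fromDefaultMicrophoneInput() only selects which input device to record from; it cannot tell speech apart from other sounds that reach that microphone through the air. Even with a headset, some playback audio can leak into the microphone, depending on the playback volume and the headset quality.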
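To make the acoustic path concrete, here is a minimal toy model of what the microphone receives when music plays through the device's loudspeakers: the speech plus a delayed, attenuated copy of the playback. The function `simulateMicCapture` and its `leakGain` and `delaySamples` parameters are hypothetical illustration names, not part of the Speech SDK, and a real room also adds reverberation and distortion that this model ignores.

```javascript
// Toy model (hypothetical, not part of the Speech SDK): the microphone picks up
// the talker's speech plus a delayed, attenuated copy of the loudspeaker playback.
//   mic[n] = speech[n] + leakGain * playback[n - delaySamples]
function simulateMicCapture(speech, playback, leakGain, delaySamples) {
  return speech.map((s, n) => {
    const k = n - delaySamples;
    // Until the playback has travelled to the microphone, only speech is captured.
    const leaked = k >= 0 && k < playback.length ? leakGain * playback[k] : 0;
    return s + leaked;
  });
}

// Music reaching the microphone one sample later at half its level:
const mic = simulateMicCapture([0.25, 0.25, 0.25, 0.25], [1, 1, 1, 1], 0.5, 1);
// mic is [0.25, 0.75, 0.75, 0.75]
```

The recognizer only ever sees the summed signal, which is why selecting the "right" microphone device cannot by itself remove the music.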

This means the input to speech recognition is a mix of music and speech, so recognition quality is lower than with speech-only input. To hear what the recognizer actually receives, record a person speaking while the music is playing, save the recording to a file, and listen to it.
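One way to put a number on "a mix of music and speech" is the speech-to-interference ratio in decibels: the power of the speech divided by the power of the leaked music. The sketch below assumes the two components are available as separate sample arrays, which is only true in a simulation or a controlled test; in a real recording they are already mixed. The helper names `meanPower` and `snrDb` are hypothetical.

```javascript
// Hypothetical helpers for illustration; not part of the Speech SDK.
// Average power of a signal: the mean of the squared samples.
function meanPower(samples) {
  let sum = 0;
  for (const x of samples) sum += x * x;
  return sum / samples.length;
}

// Speech-to-interference ratio in dB: 10 * log10(P_speech / P_music).
// Lower values mean the music makes up more of the captured input.
function snrDb(speech, music) {
  return 10 * Math.log10(meanPower(speech) / meanPower(music));
}

// Music leaking in at 1/10 of the speech amplitude has 1/100 of its power: about 20 dB.
const ratio = snrDb([0.5, -0.5, 0.5, -0.5], [0.05, -0.05, 0.05, -0.05]);
```

As the music gets louder relative to the voice, this ratio drops, which fits the degraded results described in the question.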

The Speech SDK for JavaScript does not support echo cancellation (removing the playback audio from the recording), so your options are:

- Check whether the operating system's audio settings offer automatic echo cancellation for simultaneous playback and recording (this may or may not apply when using a browser). Of course, this won't help if your app runs on devices you don't control.
- Play the music at a volume low enough that it doesn't interfere much with speech recognition.
- Don't play anything during recognition.
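The last two options can be automated in the page itself. The sketch below pauses the background music (or, optionally, only turns it down) for the duration of one recognition call and restores it afterwards. The helper `withMusicReduced`, its `mode` and `duckVolume` options, and the `musicElement` variable in the usage comment are hypothetical; the helper relies only on the standard HTMLMediaElement members `paused`, `volume`, `pause()` and `play()`. Turning the music down still leaves some of it in the microphone signal, so pausing is the safer choice when accuracy matters.

```javascript
// Hypothetical helper: reduce background music while one recognition runs.
// `music` is anything with the HTMLMediaElement surface used here
// (paused, volume, pause(), play()); `recognize` returns a Promise.
async function withMusicReduced(music, recognize, { mode = "pause", duckVolume = 0.1 } = {}) {
  const wasPlaying = !music.paused;
  const previousVolume = music.volume;

  if (mode === "pause") {
    music.pause();
  } else {
    // "duck" mode: lower the volume (range 0.0 to 1.0) but keep playing.
    music.volume = Math.min(previousVolume, duckVolume);
  }

  try {
    return await recognize();
  } finally {
    // Restore the previous state even if recognition failed.
    if (mode === "pause") {
      // play() returns a Promise in current browsers; it may reject if
      // autoplay is blocked, in which case this helper rejects too.
      if (wasPlaying) await music.play();
    } else {
      music.volume = previousVolume;
    }
  }
}

// Example use with the quickstart's recognizer (assumes `recognizer` and
// `musicElement` already exist on the page):
// const result = await withMusicReduced(musicElement, () =>
//   new Promise((resolve, reject) => recognizer.recognizeOnceAsync(resolve, reject)));
```

Wrapping the callback-style recognizeOnceAsync in a Promise keeps the pause/resume logic in one place, and the `finally` block makes sure the music comes back even when recognition throws.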


Background Music on Browser Interferes with STT #2536