Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Background Music on Browser Interferes with STT #2536

Closed jillbourque closed 2 months ago

jillbourque commented 3 months ago

Hi there,

We are using this Speech SDK: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/quickstart/javascript/browser/from-microphone

In our app, background music plays in the browser while the person is talking. With no music playing, the STT result is almost perfect, but with music playing in the browser the results are incorrect or none are produced at all.

Examples:

Without background music, STT text: "Testing 1-2 Three Check, check, check." (CORRECT)

With background music, STT text: "It has been 123." (INCORRECT), or nothing is captured.

The sample code contains this line: var audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();

So I assumed it was capturing only from the default microphone, but somehow the browser audio seems to be picked up as well.

How can we isolate the audio just from my microphone?

pankopon commented 2 months ago

Hi, if the background music is playing from the loudspeaker(s) on the device where the browser is running, then the microphone can capture it together with speech. Even with a headset there can be some playback audio leakage depending on the playback volume and headset quality.

This means the input to speech recognition is a mix of music and speech, so the quality of the recognition results is not as good as with speech-only input. To check what the input sounds like, record a person speaking during simultaneous music playback to a file and give it a listen.
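One way to make such a test recording directly in the browser is sketched below. It assumes the standard getUserMedia and MediaRecorder browser APIs are available. The function name recordMicrophone and its env parameter are hypothetical helpers for illustration, not part of the Speech SDK or the quickstart sample.

```javascript
// Sketch: record a few seconds of microphone input so it can be played back
// and checked for leaked background music.
// `env` defaults to the browser globals; it is a parameter (an assumption of
// this sketch) only so the function can be exercised outside a browser.
async function recordMicrophone(durationMs, env = globalThis) {
  // Ask the browser for the default audio input device.
  const stream = await env.navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new env.MediaRecorder(stream);
  const chunks = [];

  // Collect encoded audio chunks as the recorder produces them.
  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
  };
  const stopped = new Promise((resolve) => {
    recorder.onstop = resolve;
  });

  recorder.start();
  await new Promise((resolve) => setTimeout(resolve, durationMs));
  recorder.stop();
  await stopped;

  // Release the microphone once recording is done.
  stream.getTracks().forEach((track) => track.stop());

  // Combine the chunks into a single Blob in the recorder's container format.
  return new env.Blob(chunks, { type: recorder.mimeType });
}
```

In a browser, the returned Blob can be passed to URL.createObjectURL and played through an audio element, or offered as a download. Start the background music first, call recordMicrophone(5000) while speaking, then listen to how much music ended up in the recording.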

There is no support for echo cancellation (= removal of the playback audio from the recording) in the Speech SDK for JavaScript, so your options would be: