Azure-Samples / Cognitive-Speech-TTS

Microsoft Text-to-Speech API sample code in several languages, part of Cognitive Services.
https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/

ASR incorrectly receives the TTS voice and keeps repeating ... #379

Open phoenixdna opened 1 month ago

phoenixdna commented 1 month ago

I am finding it quite frustrating to use TTS combined with Azure's ASR. For some reason, the TTS output is incorrectly picked up by the ASR even though I mute the microphone with PyAudio. Could someone please help?

The code snippet:

import time
import azure.cognitiveservices.speech as speechsdk

def text_to_speech(text):
    speech_config2 = speechsdk.SpeechConfig(subscription='xxx', region='eastasia')
    audio_config2 = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

    # Use the Chinese neural voice for the synthesized reply.
    speech_config2.speech_synthesis_voice_name = 'zh-CN-XiaoyiNeural'
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config2, audio_config=audio_config2)

    # Disable ASR handling and mute the microphone before playing the TTS audio
    # on the default speaker.
    global asr_active
    asr_active = False
    mute_microphone()
    #speech_recognizer.stop_continuous_recognition()
    time.sleep(1)

    print("《《《《《《《《《《《tts=>>>", text)
    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
    print("tts=>》》》》》》》》》》》》》》》》》>>")

    # Wait a moment, then re-enable ASR handling and unmute the microphone.
    time.sleep(1)
    asr_active = True
    unmute_microphone()

    #speech_recognizer.start_continuous_recognition()
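
For completeness, mute_microphone() and unmute_microphone() are not shown above. Below is only a minimal sketch of what such helpers might look like, assuming the microphone audio is read from a shared PyAudio input stream; mic_stream is a hypothetical name and the real helpers may work differently:

import pyaudio

# Hypothetical shared microphone stream, opened once at startup.
_pa = pyaudio.PyAudio()
mic_stream = _pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                      input=True, frames_per_buffer=1024)

def mute_microphone():
    # Stop the PyAudio input stream so no further frames are captured.
    if mic_stream.is_active():
        mic_stream.stop_stream()
    print("microphone muted")

def unmute_microphone():
    # Resume capturing frames from the microphone.
    if mic_stream.is_stopped():
        mic_stream.start_stream()
    print("microphone UNmuted")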

System output

final response recived model response: 很抱歉,我无法提供实时的日期信息。请您查看您的电子设备或询问您的语音助手来获取今天的日期。
microphone muted
《《《《《《《《《《《tts=>>> 很抱歉,我无法提供实时的日期信息。请您查看您的电子设备或询问您的语音助手来获取今天的日期。
tts=>》》》》》》》》》》》》》》》》》>>
Speech synthesized for text [很抱歉,我无法提供实时的日期信息。请您查看您的电子设备或询问您的语音助手来获取今天的日期。]
microphone UNmuted
RECOGNIZED: SpeechRecognitionEventArgs(session_id=cf14ed11f79a444faa988fab687687c0, result=SpeechRecognitionResult(result_id=96877e5119b74d2e9f7b394072741b87, text="很抱歉,我无法生气。", reason=ResultReason.RecognizedSpeech))
sent to model reached

Question:

As you can see in the code, even though I muted the microphone during speech_synthesizer.speak_text_async(text).get(), some of the words were still incorrectly picked up by speech_recognizer:

RECOGNIZED: SpeechRecognitionEventArgs(session_id=cf14ed11f79a444faa988fab687687c0, result=SpeechRecognitionResult(result_id=96877e5119b74d2e9f7b394072741b87, text="很抱歉,我无法生气。",

I don't know why the ASR can recognize the TTS voice even though I have muted the microphone. I also added the asr_active flag check in the callback for SpeechRecognizer.recognized, but it didn't help..., so please help. Thanks in advance.
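
For reference, the flag check in the recognized callback looks roughly like this (a minimal sketch; handle_recognized_text is a hypothetical placeholder for the code that sends the recognized text to the model):

def on_recognized(evt: speechsdk.SpeechRecognitionEventArgs):
    # Guard: drop recognition results that arrive while asr_active is False,
    # i.e. while TTS playback is supposed to be in progress.
    if not asr_active:
        return
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("RECOGNIZED:", evt)
        handle_recognized_text(evt.result.text)  # hypothetical: forward text to the model

speech_recognizer.recognized.connect(on_recognized)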