Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

AndrewMultilingualNeural (en-US) says the wrong word when synthesizing a single word. #2300

Open yangman92 opened 3 months ago

yangman92 commented 3 months ago

For example, with the word 'abandon', the voice sounds as if the leading letter 'a' has been cut off: it just says 'bandon'. The problem occurs when using AndrewMultilingualNeural with en-US and a single word as input. It started happening only recently; I'm sure the pronunciation was correct before.

Do you guys have the same problem? I am waiting for your answer. Thanks.

I tried both Speech Studio and the Java SDK. Speech Studio version: 1.1. The Java SDK dependency and the SSML I used are below.

        <dependency>
            <groupId>com.microsoft.cognitiveservices.speech</groupId>
            <artifactId>client-sdk</artifactId>
            <version>1.35.0</version>
        </dependency>

<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>
        <voice name='en-US-AndrewMultilingualNeural'>
                <lang xml:lang='en-US'>
                        <prosody rate='-22.0%'>
                                 abandon
                        </prosody>
                </lang>
        </voice>
</speak>
yangman92 commented 3 months ago

Does anyone know how to fix this issue, please?

yangman92 commented 3 months ago

I've made a video about the issue. https://tingxie100-1317670271.cos.ap-shanghai.myqcloud.com/test/abandon%E9%94%99%E8%AF%AF%E5%8F%91%E9%9F%B3%E8%A7%86%E9%A2%91%E5%BD%95%E5%88%B6.mp4

BrianMouncer commented 3 months ago

@yulin-li can you please take a look at this?

BrianMouncer commented 3 months ago

@yangman92

I will wait to see what our TTS team has to say about this issue. In the meantime, since you are already using SSML in your example, I suggest a possible workaround for the bad pronunciation of short utterances: you can use the SSML pronunciation tag to explicitly tell the TTS voice how you would like the word to be pronounced.

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-pronunciation

For example:

abandon
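
As a rough illustration of the pronunciation-tag approach, the sketch below builds an SSML document that wraps the word in a `<phoneme>` element. It is written in Python purely for brevity. The helper name `phoneme_ssml`, the IPA transcription `əˈbændən`, and the choice of the `ipa` alphabet are assumptions made for illustration and are not confirmed anywhere in this thread; whether a given voice honors the tag for single-word input would need to be checked by ear.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def phoneme_ssml(word, ipa, voice="en-US-AndrewMultilingualNeural", lang="en-US"):
    """Return an SSML document that pronounces `word` using an explicit IPA reading.

    `phoneme_ssml` is a hypothetical helper, not part of the Speech SDK.
    """
    return (
        f"<speak version='1.0' xmlns='{SSML_NS}' xml:lang='{lang}'>"
        f"<voice name='{voice}'>"
        # quoteattr adds the surrounding quotes and escapes the attribute value.
        f"<phoneme alphabet='ipa' ph={quoteattr(ipa)}>{escape(word)}</phoneme>"
        "</voice>"
        "</speak>"
    )


# Assumed IPA transcription for "abandon"; adjust it to the pronunciation you want.
print(phoneme_ssml("abandon", "əˈbændən"))
```

The resulting string can then be passed to whichever SSML synthesis call the SDK in use provides.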
yulin-li commented 3 months ago

@Kerry-LinZhang could you help to triage?

Kerry-LinZhang commented 3 months ago

Hi @yangman92, thanks for the feedback; it is well received. Let me follow up with the owner and keep you updated on the progress.

I can repro the problem as well.

Kerry-LinZhang commented 3 months ago

Hi @yangman92, here is a quick workaround for this problem: add a dot after the word, for example 'abandon.'

We will investigate the problem in the meantime.
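
A minimal sketch of how this dot workaround could be applied automatically before the text is sent for synthesis. The function name `pad_single_word` and the rule for what counts as a "single word" (letters only, optionally joined by an apostrophe or hyphen) are assumptions for illustration.

```python
import re

# One bare word: letters only, optionally joined by an apostrophe or hyphen.
_SINGLE_WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def pad_single_word(text):
    """Append a period when `text` is one bare word without closing punctuation.

    Hypothetical helper; multi-word or already punctuated input is only trimmed.
    """
    stripped = text.strip()
    if _SINGLE_WORD.fullmatch(stripped):
        return stripped + "."
    return stripped


print(pad_single_word("abandon"))   # single word, gets a trailing dot
print(pad_single_word("give up"))   # two words, left unchanged
```

Keeping the check narrow avoids adding a second period to input that already ends a sentence.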

yangman92 commented 3 months ago

Hi @Kerry-LinZhang, thanks for your reply. Adding a dot after the word doesn't work with other voices such as EmmaMultilingualNeural, at least for some words (activity. accord. accordance. act. affair. apparent.). That is also part of the problem I describe here: https://learn.microsoft.com/en-us/answers/questions/1621198/andrewmultilingualneural-en-us-says-the-wrong-word I'm waiting for your answer.

Kerry-LinZhang commented 2 months ago

Hi @yangman92, well noted. We are continuing to investigate this, and I will keep you updated on the progress.

Kerry-LinZhang commented 2 months ago

Hi @yangman92, we have found a solution to the problem, and the release of the fix is underway. Would you mind telling us which voices and regions you are using? Once the fix has been released there in the coming week, I will notify you here so you can try it.

yangman92 commented 2 months ago

Hi @Kerry-LinZhang, the regions I've been using are 'east asia', 'japan east', and 'korea central'.

The voices I've been using are 'AndrewMultilingualNeural' (it worked; I tried many words and it did a good job) and 'EmmaMultilingualNeural' (it did not work with some words: activity. accord. accordance. act. affair. apparent.).

I tested many voices; the ones that didn't work (single-word speech, with a dot added after the word) are 'EmmaMultilingualNeural', 'Emma', 'BrianMultilingualNeural', and 'Brian'.
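
For anyone who wants to reproduce this kind of test, here is a rough sketch that generates one SSML document per voice and word, with and without the trailing dot. The full voice names with the `en-US-` prefix are an assumption based on the naming used in the SSML at the top of this issue, and `build_ssml` and `repro_cases` are hypothetical helpers, not SDK functions.

```python
from itertools import product
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Assumed full voice names; the thread only gives the short forms.
VOICES = ["en-US-EmmaMultilingualNeural", "en-US-BrianMultilingualNeural"]
WORDS = ["activity", "accord", "accordance", "act", "affair", "apparent"]


def build_ssml(voice, word, trailing_dot=False):
    """Return SSML that speaks a single word, optionally followed by a dot."""
    text = escape(word) + ("." if trailing_dot else "")
    return (
        f"<speak version='1.0' xmlns='{SSML_NS}' xml:lang='en-US'>"
        f"<voice name='{voice}'>{text}</voice>"
        "</speak>"
    )


def repro_cases(voices=VOICES, words=WORDS):
    """Yield (voice, word, trailing_dot, ssml) for every combination."""
    for voice, word, dot in product(voices, words, (False, True)):
        yield voice, word, dot, build_ssml(voice, word, dot)


for voice, word, dot, ssml in repro_cases():
    print(voice, word, "with dot" if dot else "no dot")
```

Each generated document can be synthesized and checked by ear for the missing leading 'a'.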

Kerry-LinZhang commented 2 months ago

Well received, thanks @yangman92. I will keep you posted on the release progress as we work to fully address the issue.

Kerry-LinZhang commented 2 months ago

Hi @yangman92, the problem has now been fixed for AndrewMultilingualNeural; please give it a try. Our newly trained model fixes the problem for EmmaMultilingualNeural, but it needs a longer release time. I will continue tracking it; if you have any concerns, please feel free to let me know.

Kerry-LinZhang commented 2 months ago

Hi @yangman92, we will fix the missing 'a' problem with the new EmmaMultilingualNeural model; the target ETA is no later than next month (5/31).

yangman92 commented 2 months ago

@Kerry-LinZhang Thanks for your support. 😄

github-actions[bot] commented 1 month ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

Kerry-LinZhang commented 1 month ago

Hi @yangman92, we found that the new model has regressions in certain domains. Another new model is under evaluation, which will be finished before June 10th. If the quality is fine, we will release it by July. Would you mind waiting for it? Thanks again for your understanding!

yangman92 commented 2 weeks ago

@Kerry-LinZhang Thanks. Please let me know when the new version is released.