Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

en-US Andrew Multilingual Neural voice has changed and is much faster now #2557

Open amitoj-saini opened 3 weeks ago

amitoj-saini commented 3 weeks ago

IN ORDER TO ASSIST YOU, PLEASE PROVIDE THE FOLLOWING:

Describe the bug

A clear and concise description of what the bug is. If things are not working as you expect, describe exactly what you are getting and why that is not what you expect. For example, speech recognition "does not work" may mean you got a cancellation event with a particular error message, or you did not get any recognition events, or the recognition result you got contains text that does not match what was spoken.

To Reproduce

Steps to reproduce the behavior:

  1. ...
  2. ...

Expected behavior

A clear and concise description of what you expected to happen.

Version of the Cognitive Services Speech SDK

Which version of the SDK are you using.

Platform, Operating System, and Programming Language

Additional context

boyko11 commented 3 weeks ago

Similar issues with "en-US-AndrewNeural". For me, certain phrases are faster than others. Examples:

"Dublin is indeed the capital city of Ireland." - Okay speed
"It's known for its rich history, vibrant culture, and famous landmarks like Trinity College and the Guinness Storehouse." - Okay speed
"Congratulations" - Okay speed
"Your answer is correct" - Okay speed
"Let's go to the next question." - Super fast speed

It was not behaving this way yesterday.

poetry show azure-cognitiveservices-speech

name : azure-cognitiveservices-speech
version : 1.37.0
description : Microsoft Cognitive Services Speech SDK for Python

python: 3.12.2

import asyncio

import azure.cognitiveservices.speech as speechsdk

# Configure the service connection and the voice.
self.speech_config = speechsdk.SpeechConfig(
    subscription=<MY_SPEECH_KEY>, region=<MY_REGION>
)
self.speech_config.speech_synthesis_voice_name = "en-US-AndrewNeural"

self.speech_config.set_speech_synthesis_output_format(
    speechsdk.SpeechSynthesisOutputFormat.Riff48Khz16BitMonoPcm
)
# audio_config=None keeps the synthesized audio in memory instead of playing it.
synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=self.speech_config, audio_config=None
)

# Run the blocking synthesis call in a worker thread so the event loop stays free.
result = await asyncio.get_running_loop().run_in_executor(
    None,
    lambda _synthesizer=synthesizer, _text=text: _synthesizer.speak_text_async(
        _text
    ).get(),
)

boyko11 commented 2 weeks ago

This appears to be an issue with Speech Services, and it is specific to "en-US-AndrewNeural" and "en-US-AndrewMultilingualNeural". I can recreate the issue in the voice gallery on the Azure portal: https://speech.microsoft.com/portal/<some_id_which_i_am_not_sure_if_private>/voicegallery. So this is not an issue with the Python SDK, but rather with the specific Azure-hosted voice model. I opened an Azure support ticket for this issue. It'd be nice if more folks opened tickets. It's a darn shame; we really liked this voice.

lonely6ice commented 2 weeks ago

We upgraded Andrew/AndrewMultilingual recently, and our tests show that the new model has greatly improved quality compared to the previous version. However, the case reported here is indeed a bad case, and we will fix it in a future iteration.

boyko11 commented 2 weeks ago

We appreciate the hard work and effort in improving the model! It’s unfortunate that the latest changes made our users’ experience very different and unpleasant. We would appreciate a fix. It would also be very helpful if we could invoke specific versions of the model. That way we could simply go back to a previous version of the model if a new version no longer works for a specific use case. Thanks!
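
While waiting for a service-side fix, one possible stopgap for phrases that come out too fast is to wrap them in SSML with a `prosody` rate adjustment. Nobody in this thread proposed or tested this, and it may not help with this particular regression. The sketch below only builds the SSML string; the helper name `build_ssml` and the default rate of `-10%` are illustrative assumptions.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    voice: str = "en-US-AndrewNeural",
    rate: str = "-10%",  # assumed value; tune by ear
    lang: str = "en-US",
) -> str:
    """Wrap text in SSML that asks the voice to speak at an adjusted rate."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        # escape() protects &, < and > in the spoken text.
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )
```

With the synthesizer from the earlier snippet, the result would be passed to `synthesizer.speak_ssml_async(build_ssml(text)).get()` instead of `speak_text_async`.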

boyko11 commented 2 weeks ago

Another interesting artifact is that it struggles to pronounce "Goodbye" when it is combined with my name and an exclamation mark: "Goodbye Boyko!" is pronounced as "Goodeh Boyko". With a period instead ("Goodbye Boyko.") or with no punctuation ("Goodbye Boyko"), it works fine. It also works fine with "Andy". I also tried "Steve" and "Dean". Although it pronounces both correctly, the audio sounds very different when the phrase ends with "!" compared to ".": "Goodbye Steve!" sounds very different from "Goodbye Steve.", and "Goodbye Dean!" sounds much faster than "Goodbye Dean."
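
To turn "sounds much faster" into a number, one option is to compare audio durations for the two punctuation variants. The sketch below derives the duration from the byte length of the synthesized audio, using the Riff48Khz16BitMonoPcm format from the earlier snippet (48 kHz, 16-bit, mono). It assumes the payload starts with a canonical 44-byte WAV header; the helper names are illustrative and not part of the SDK.

```python
SAMPLE_RATE_HZ = 48_000   # from Riff48Khz16BitMonoPcm
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono
WAV_HEADER_BYTES = 44     # assumption: canonical RIFF/WAV header size


def pcm_duration_seconds(audio_data: bytes) -> float:
    """Estimate playback duration from the raw size of the WAV payload."""
    payload_bytes = max(len(audio_data) - WAV_HEADER_BYTES, 0)
    return payload_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def chars_per_second(text: str, audio_data: bytes) -> float:
    """Crude speaking-rate proxy: characters of input per second of audio."""
    duration = pcm_duration_seconds(audio_data)
    if duration == 0:
        raise ValueError("audio payload is empty")
    return len(text) / duration
```

With the synthesizer from the earlier snippet, `result.audio_data` would be the bytes to pass in. Comparing `chars_per_second` for "Goodbye Dean!" and "Goodbye Dean." gives a rough measure of how much the two renderings differ in speed.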

heitechoy commented 2 weeks ago

The August update was terrible with a significant drop in voice quality. Some of my customers reported that the multilingual voices, especially the Andrew Multilingual voice, were changed so much that the reading quality was much worse than the old voices. Hopefully Azure can roll back the old version or fix it so that these voices can be used. I can honestly say that the update was terrible and unusable.

boyko11 commented 1 week ago

I just tried "en-US-AndrewNeural" and it sounds REALLY good! @lonely6ice Thank you for the fix! It would still be great to have the ability to refer to specific versions of the model, though! Thanks again!

fatihyildizhan commented 3 days ago

Hello @lonely6ice, the en-US-AvaMultilingualNeural voice has changed.

It still works with the non-streaming SDK.

When I try with websocket/v2, I get a different voice: wss://eastus.tts.speech.microsoft.com/cognitiveservices/websocket/v2

Thank you