en-US Andrew Multilingual Neural voice has changed and is much faster now

amitoj-saini commented 3 weeks ago

boyko11 commented 3 weeks ago

Similar issues with "en-US-AndrewNeural". For me, certain phrases are faster than others. Examples:

"Dublin is indeed the capital city of Ireland." - Okay speed
"It's known for its rich history, vibrant culture, and famous landmarks like Trinity College and the Guinness Storehouse." - Okay speed
"Congratulations" - Okay speed
"Your answer is correct" - Okay speed
"Let's go to the next question." - Super fast speed

It was not behaving this way yesterday.

poetry show azure-cognitiveservices-speech

name : azure-cognitiveservices-speech
version : 1.37.0
description : Microsoft Cognitive Services Speech SDK for Python

python: 3.12.2

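import asyncio
import azure.cognitiveservices.speech as speechsdk

# Excerpt from an async method; `text` is the string to synthesize.
self.speech_config = speechsdk.SpeechConfig(
    subscription="<MY_SPEECH_KEY>", region="<MY_REGION>"
)
self.speech_config.speech_synthesis_voice_name = "en-US-AndrewNeural"

self.speech_config.set_speech_synthesis_output_format(
    speechsdk.SpeechSynthesisOutputFormat.Riff48Khz16BitMonoPcm
)
# audio_config=None keeps the synthesized audio in the result object.
synthesizer = speechsdk.SpeechSynthesizer(
    speech_config=self.speech_config, audio_config=None
)

# speak_text_async(...).get() blocks, so run it in the default executor.
result = await asyncio.get_running_loop().run_in_executor(
    None,
    lambda _synthesizer=synthesizer, _text=text: _synthesizer.speak_text_async(
        _text
    ).get(),
)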
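
To put a number on "okay speed" versus "super fast", one option is to compare the length of the returned audio with the length of the text. The sketch below is not from the original report. It uses only the standard library, and the helper names (`wav_duration_seconds`, `chars_per_second`, `make_silent_wav`) are made up for illustration. It assumes the audio is a RIFF/WAV byte string whose header carries correct chunk sizes, which may not hold for every output mode.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback length of a RIFF/WAV byte string in seconds.

    Assumption: the WAV header reports the real number of frames.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def chars_per_second(text: str, wav_bytes: bytes) -> float:
    """Rough speaking-rate proxy: characters of text per second of audio."""
    duration = wav_duration_seconds(wav_bytes)
    if duration <= 0:
        raise ValueError("audio has no frames")
    return len(text) / duration


def make_silent_wav(seconds: float, rate: int = 48000) -> bytes:
    """Build a silent 16-bit mono WAV, used here only as stand-in audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in audio; real audio would come from the synthesis result.
    sample = make_silent_wav(2.0)
    print(chars_per_second("Let's go to the next question.", sample))
```

With the synthesis call above, `chars_per_second(text, result.audio_data)` could be logged per phrase, so that an unusually high value flags the phrases that come out too fast.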

boyko11 commented 2 weeks ago

This appears to be an issue with Speech Services, and it is specific to "en-US-AndrewNeural" and "en-US-AndrewMultilingualNeural". I can recreate the issue in the voice gallery on the Azure portal: https://speech.microsoft.com/portal/<some_id_which_i_am_not_sure_if_private>/voicegallery. So this is not an issue with the Python SDK, but rather with the specific Azure-hosted voice model. I opened an Azure support ticket for this issue. It'd be nice if more folks opened tickets. It's a darn shame, we really liked this voice.

lonely6ice commented 2 weeks ago

We upgraded Andrew/AndrewMultilingual recently, and our tests show that the new model has greatly improved quality compared to the previous version. However, the case reported here is indeed a bad case, and we will fix this problem in a future iteration.

boyko11 commented 2 weeks ago

We appreciate the hard work and effort in improving the model! It’s unfortunate that the latest changes made our users’ experience very different and unpleasant. We would appreciate a fix. It would also be very helpful if we could invoke specific versions of the model. That way we could go back to a previous version of the model if a new version no longer works for a specific use case. Thanks!

boyko11 commented 2 weeks ago

Another interesting artifact is that it struggles to pronounce "Goodbye" when it is together with my name and an exclamation mark: "Goodbye Boyko!" is pronounced as "Goodeh Boyko". With a period instead ("Goodbye Boyko.") or no punctuation ("Goodbye Boyko"), it works fine. It also works fine with "Andy". I also tried "Steve" and "Dean". Although it pronounces both names correctly, a phrase ending with "!" sounds very different from one ending with ".": "Goodbye Steve!" sounds very different from "Goodbye Steve." and "Goodbye Dean!" sounds much faster than "Goodbye Dean."
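
A possible stopgap for the speed differences described above, until the model itself is fixed, is to send SSML with an explicit `<prosody rate="...">` instead of plain text. This is only a sketch: the helper name `build_rate_ssml` and the default rate of `-10%` are assumptions for illustration, and nobody in this thread has confirmed that the affected voices respond well to a rate adjustment on these phrases.

```python
from xml.sax.saxutils import escape, quoteattr


def build_rate_ssml(
    text: str,
    voice: str = "en-US-AndrewNeural",
    rate: str = "-10%",  # assumed value; tune by ear
    lang: str = "en-US",
) -> str:
    """Wrap text in SSML that asks the voice to speak at a relative rate."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        # escape() protects characters such as & and < in the spoken text.
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )


if __name__ == "__main__":
    print(build_rate_ssml("Goodbye Dean!"))
```

The resulting string would be passed to `synthesizer.speak_ssml_async(ssml).get()` in place of `speak_text_async`. A per-phrase rate is a blunt tool, so it may be worth applying only to the phrases that measurably come out too fast.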

heitechoy commented 2 weeks ago

The August update was terrible with a significant drop in voice quality. Some of my customers reported that the multilingual voices, especially the Andrew Multilingual voice, were changed so much that the reading quality was much worse than the old voices. Hopefully Azure can roll back the old version or fix it so that these voices can be used. I can honestly say that the update was terrible and unusable.

boyko11 commented 1 week ago

I just tried "en-US-AndrewNeural" and it sounds REALLY good! @lonely6ice Thank you for the fix! It would still be great to have the ability to refer to specific versions of the model, though! Thanks again!

fatihyildizhan commented 3 days ago

Hello @lonely6ice, the en-US-AvaMultilingualNeural voice has changed.

It still works with the non-streaming SDK.

When I try the websocket/v2 endpoint, I get a different voice: wss://eastus.tts.speech.microsoft.com/cognitiveservices/websocket/v2

Thank you

Azure-Samples / cognitive-services-speech-sdk

en-US Andrew Multilingual Neural voice has changed and is much faster now #2557