Discrepancy between tts-text-stream sample output and Speech Studio Preview

dmingke commented 2 months ago

I'm trying to use the voice 'en-US-AnaNeural' in the tts-text-stream sample, with one small modification: I added PushAudioOutputStreamSampleCallback to capture the audio chunks:

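import azure.cognitiveservices.speech as speechsdk

# Text-streaming (websocket v2) endpoint, as in the tts-text-stream sample; server_config holds region and key
speech_config = speechsdk.SpeechConfig(
    endpoint=f"wss://{server_config.speech_region}.tts.speech.microsoft.com/cognitiveservices/websocket/v2",
    subscription=server_config.speech_key)
speech_config.speech_synthesis_voice_name = "en-US-AnaNeural"
# PushAudioOutputStreamSampleCallback is the callback class from the SDK samples; it receives each audio chunk
stream_callback = PushAudioOutputStreamSampleCallback()
push_stream = speechsdk.audio.PushAudioOutputStream(stream_callback)
audio_config = speechsdk.audio.AudioOutputConfig(stream=push_stream)  # synthesis needs an output config
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)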

en-US-AnaNeural is a child voice listed in Speech Studio, but the output is a male voice instead. I tried other voices, such as en-GB-MaisieNeural, and most of them work, but a few (en-US-AnaNeural, en-US-AshleyNeural, and en-US-AmberNeural) always produce the same male voice. Requesting the same voices through SSML works without issue, and no errors are raised at runtime. What could cause this, and how can I resolve the inconsistency?
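Since the SSML path reportedly produces the correct voice, a side-by-side check can pin the voice explicitly inside an SSML document and feed it to the same synthesizer. The sketch below is illustrative only: `build_ssml` is a hypothetical helper (not part of the SDK) that escapes the text and builds a minimal SSML document for a given voice name. The commented `speak_ssml_async` call shows where it would plug into the synthesizer from the snippet above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(voice_name: str, text: str, lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document that names the voice explicitly."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


ssml = build_ssml("en-US-AnaNeural", "Hello! Is this Ana's voice?")
# With the SDK, the same synthesizer can then be driven through SSML instead of a text stream:
# result = speech_synthesizer.speak_ssml_async(ssml).get()
print(ssml)
```

If the SSML request returns Ana while the text-stream request with the same `speech_synthesis_voice_name` does not, that points to the voice selection on the text-streaming path rather than to the audio callback.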

To Reproduce

Steps to reproduce the behavior:

1. Set speech_synthesis_voice_name to en-US-AnaNeural.
2. Capture the audio chunks with PushAudioOutputStreamSampleCallback. (I haven't tested with the default speaker output, so I'm not sure whether this step is relevant.)
3. Keep all other settings the same as in the tts-text-stream sample.
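Because the repro captures audio through a push-stream callback rather than the speaker, one way to confirm which voice actually came back is to save the captured chunks and listen to them. A minimal sketch, assuming the callback accumulates raw 16 kHz, 16-bit mono PCM. Adjust `sample_rate` to match the output format actually in use; if the stream already carries a RIFF header, write the bytes to disk as-is instead. `pcm_chunks_to_wav` is a hypothetical helper, not an SDK function.

```python
import io
import wave


def pcm_chunks_to_wav(chunks, sample_rate=16000, channels=1, sample_width=2):
    """Wrap raw PCM chunks (e.g. collected by a push-stream callback) in a WAV container.

    sample_rate, channels and sample_width are assumptions: they must match the
    synthesizer's actual output format, or playback will sound distorted.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Callbacks may hand over memoryview objects; normalize to bytes
            wav.writeframes(bytes(chunk))
    return buffer.getvalue()


# Two dummy 10 ms chunks of near-silent PCM stand in for real synthesized audio
wav_bytes = pcm_chunks_to_wav([b"\x00\x00" * 160, b"\x01\x00" * 160])
```

Writing `wav_bytes` to a file such as `ana_check.wav` and playing it back makes it easy to tell whether the returned audio is Ana or the unexpected male voice.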

Expected behavior

The output should match the en-US-AnaNeural voice previewed in Speech Studio (a child voice), not a male voice.

Version of the Cognitive Services Speech SDK

azure-cognitiveservices-speech==1.40.0
Programming language: Python

github-actions[bot] commented 1 month ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

dmingke commented 1 month ago

Can anyone take a look at this problem? It is quite urgent. If I need to provide any more information, I will reply promptly.

github-actions[bot] commented 2 weeks ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

Repository: Azure-Samples/cognitive-services-speech-sdk, issue #2604