Unintended Pauses Between Words When Using text_stream_sample for Chinese TTS

dmingke commented 1 month ago

Bug Description

This issue occurs when using text_stream_sample with the zh-CN-YunxiaNeural voice. Unintended pauses appear between words and disrupt the natural flow of the speech. For instance, when synthesizing the sentence "今天的天气真好" ("The weather is really nice today"), there is a noticeable, unnatural pause between the characters "今" and "天", even though together they form the single word 今天 ("today"). This behavior might be related to how the OpenAI-generated text is fed to the synthesizer in small chunks, which could introduce these pauses during real-time synthesis. I am looking for a way to prevent this.
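If the chunking hypothesis is right, the pauses come from text deltas that split a word (here 今 | 天) before they reach the synthesizer. The following is a minimal sketch of one possible mitigation, not confirmed by the maintainers: buffer the streamed deltas and forward text only once it ends at a punctuation boundary, or once it exceeds a length cap. The function name `coalesce_chunks`, the boundary character set, and the 200-character cap are hypothetical choices for illustration; none of them come from the SDK or from text_stream_sample.

```python
import re
from typing import Iterable, Iterator

# Hypothetical boundary set: common Chinese and ASCII clause/sentence punctuation.
# This is an illustrative assumption, not something defined by the Speech SDK.
_BOUNDARY = re.compile(r"[。！？；，、,.!?;:：\n]")


def coalesce_chunks(chunks: Iterable[str], max_chars: int = 200) -> Iterator[str]:
    """Merge small streamed text deltas into pieces that end at punctuation.

    Hypothetical helper. Whenever the buffer contains a boundary character,
    everything up to and including the last boundary is yielded. If no
    boundary appears before the buffer reaches max_chars, the whole buffer is
    yielded anyway so latency stays bounded. Leftover text is flushed at the end.
    """
    buffer = ""
    for chunk in chunks:
        if not chunk:
            continue
        buffer += chunk
        matches = list(_BOUNDARY.finditer(buffer))
        if matches:
            cut = matches[-1].end()  # position just after the last boundary
            yield buffer[:cut]
            buffer = buffer[cut:]
        elif len(buffer) >= max_chars:
            yield buffer
            buffer = ""
    if buffer:
        yield buffer


if __name__ == "__main__":
    deltas = ["今", "天的天", "气真好", "。明天", "呢？"]
    print(list(coalesce_chunks(deltas)))  # ['今天的天气真好。', '明天呢？']
```

In the sample, each yielded piece would be written to the synthesis text input stream in place of the raw OpenAI delta. The trade-off is latency: audio for a clause cannot start until that clause is complete, so a smaller cap may be preferable for long replies.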

Steps to Reproduce

Start from the text_stream_sample in this repository.
Set the speech_synthesis_voice_name to zh-CN-YunxiaNeural.
Send the synthesized audio chunks (obtained with audio_buffer.tobytes()) through the WebSocket and play the PCM audio data on the client side.
Observe that the longer the response, the more often unnatural pauses occur between words.
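To check whether the pauses really become more frequent as responses get longer, it can help to measure them in the audio the client receives rather than judge by ear. The sketch below is a hypothetical diagnostic (`find_pauses` is not part of the SDK or the sample) that scans the concatenated PCM bytes for near-silent runs. It assumes 16-bit signed little-endian mono PCM at 16 kHz, which matches an output format such as Raw16Khz16BitMonoPcm but may not be what the sample actually configures; the amplitude threshold and minimum pause length are arbitrary starting values to tune.

```python
import struct
from typing import List, Tuple


def find_pauses(pcm: bytes, sample_rate: int = 16000, threshold: int = 500,
                min_pause_ms: int = 150) -> List[Tuple[float, float]]:
    """Return (start_seconds, duration_seconds) for each near-silent run.

    Hypothetical helper. Assumes 16-bit signed little-endian mono PCM.
    A sample counts as silent when its absolute amplitude is below threshold;
    only runs lasting at least min_pause_ms are reported. Leading and
    trailing silence are included in the result.
    """
    n = len(pcm) // 2  # number of complete 16-bit samples
    samples = struct.unpack("<%dh" % n, pcm[: n * 2])
    min_len = sample_rate * min_pause_ms // 1000  # minimum run length in samples

    pauses: List[Tuple[float, float]] = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i  # a silent run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                pauses.append((run_start / sample_rate, (i - run_start) / sample_rate))
            run_start = None
    # Close a silent run that reaches the end of the buffer.
    if run_start is not None and n - run_start >= min_len:
        pauses.append((run_start / sample_rate, (n - run_start) / sample_rate))
    return pauses
```

Comparing the number of reported pauses per second for short and long responses would turn "the longer the response, the more frequent the pauses" into a measurable figure. Because leading and trailing silence is reported too, the first and last entries may need to be ignored.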

Expected Behavior

The synthesized speech should flow smoothly, with no unintended pauses between words unless indicated by appropriate punctuation. Each sentence should be delivered naturally and continuously.

Version of the Cognitive Services Speech SDK

SDK Version: azure-cognitiveservices-speech==1.40.0
Programming Language: Python 3.x

Additional Context

No SSML was used; the input was plain text only.
The issue is most prominent with Chinese voices. I have tested both zh-CN-YunxiaNeural and zh-CN-YunxiNeural, and the behavior is consistent across them.
The issue does not seem to occur with English voices such as en-US-BrianMultilingualNeural.

yulin-li commented 1 month ago

Thanks for reporting this issue.

@niuzheng168 could you check?

github-actions[bot] commented 1 month ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

dmingke commented 1 month ago

Hi all, could anyone take a look at this problem?

github-actions[bot] commented 1 week ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

Azure-Samples / cognitive-services-speech-sdk