Open dmingke opened 1 month ago
Thanks for reporting this issue.
@niuzheng168 could you check?
This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.
Hi guys, can anyone take a look at this problem?
Bug Description
When using the text_stream_sample with the zh-CN-YunxiaNeural voice, the synthesized speech contains unintended pauses that disrupt its natural flow. For instance, when synthesizing the sentence "今天的天气真好" ("The weather is really nice today"), there is a noticeable, unnatural pause between the characters "今" and "天", which together form the single word "今天" ("today"). This might be related to how the OpenAI-generated text is passed to the synthesizer in chunks during real-time synthesis. I am looking for ways to prevent this.
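One possible mitigation, sketched here under assumptions and not confirmed to remove the pauses, is to buffer the streamed text and forward it to the synthesizer only at punctuation boundaries, so that a word such as "今天" is never split across two writes. The helper name coalesce_chunks, the BOUNDARIES set, and the example chunking are all hypothetical and are not part of the sample or the SDK.

```python
from typing import Iterable, Iterator

# Assumed boundary characters (Chinese and ASCII punctuation); adjust as needed.
BOUNDARIES = "。！？；，、.!?;,\n"


def coalesce_chunks(chunks: Iterable[str], boundaries: str = BOUNDARIES) -> Iterator[str]:
    """Buffer streamed text and yield it only up to the last boundary seen so far."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Index of the last boundary character in the buffer, or -1 if none.
        cut = max(pending.rfind(ch) for ch in boundaries)
        if cut >= 0:
            yield pending[:cut + 1]
            pending = pending[cut + 1:]
    if pending:
        # Flush whatever is left when the stream ends.
        yield pending


if __name__ == "__main__":
    # Hypothetical chunking that splits "今天" across two chunks.
    stream = ["今", "天的天", "气真好", "。", "我们去", "散步吧", "！"]
    for segment in coalesce_chunks(stream):
        print(segment)
```

Each yielded segment would then be written to the synthesizer's text stream in place of the raw chunk. The trade-off is extra latency, since audio for a clause cannot start until its closing punctuation arrives. Treating the ASCII "." as a boundary can also split decimal numbers, so the boundary set may need tuning.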
Steps to Reproduce
1. Run the text_stream_sample in the repo.
2. Set speech_synthesis_voice_name to zh-CN-YunxiaNeural.
3. Send the synthesized audio (audio_buffer.tobytes()) through the WebSocket and play the PCM audio data on the client side.
Expected Behavior
The synthesized speech should flow smoothly, with no unintended pauses between words unless indicated by appropriate punctuation. Each sentence should be delivered naturally and continuously.
Version of the Cognitive Services Speech SDK
azure-cognitiveservices-speech==1.40.0
Additional Context
I tested with zh-CN-YunxiaNeural and zh-CN-YunxiNeural, and the issue seems consistent across them. [...] en-US-BrianMultilingualNeural.