Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Unintended Pauses Between Words When Using text_stream_sample for Chinese TTS #2596

Open · dmingke opened this issue 1 month ago

dmingke commented 1 month ago

Bug Description

This issue occurs when using text_stream_sample with the zh-CN-YunxiaNeural voice. The synthesized speech contains unintended pauses between words, and even between the characters of a single word, which breaks the natural flow of the speech. For example, when synthesizing the sentence "今天的天气真好" ("The weather is really nice today"), there is a noticeable, unnatural pause between "今" and "天", even though together they form the single word "今天" ("today"). This may be related to how the text generated by OpenAI is streamed to the synthesizer in chunks, which would cause these pauses during real-time synthesis. I am looking for a way to prevent this.
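
If the chunking hypothesis is right, one workaround to try is to buffer the OpenAI stream and forward text to the synthesizer only at punctuation boundaries, so that a word such as 今天 is never split across two writes. Below is a minimal pure-Python sketch. `coalesce_chunks`, its punctuation set and the `min_chars` threshold are hypothetical choices, not part of text_stream_sample or the Speech SDK; each yielded piece would be written to the synthesizer's text input in place of each raw token.

```python
from typing import Iterable, Iterator

# Hypothetical boundary set: common Chinese full-width punctuation plus ASCII equivalents.
BOUNDARIES = set("。！？；，、.!?;,\n")


def coalesce_chunks(chunks: Iterable[str], min_chars: int = 8) -> Iterator[str]:
    """Regroup streamed LLM text chunks into punctuation-terminated pieces.

    Text is held back until the buffer contains a boundary character and the
    piece up to the last boundary is at least `min_chars` long. Apart from the
    final flush, a piece therefore always ends at punctuation rather than in the
    middle of a word such as 今天.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Position of the last boundary character currently in the buffer (-1 if none).
        cut = max((i for i, ch in enumerate(buffer) if ch in BOUNDARIES), default=-1)
        if cut >= 0 and cut + 1 >= min_chars:
            yield buffer[: cut + 1]
            buffer = buffer[cut + 1:]
    # Flush whatever is left when the upstream text stream ends.
    if buffer:
        yield buffer
```

For example, the token stream `["今", "天的天气", "真好。", "我们", "去公园吧"]` is regrouped into `"今天的天气真好。"` and `"我们去公园吧"`. The trade-off is latency: the first audio can only start once enough text and a punctuation mark have arrived.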

Steps to Reproduce

  1. Start from the text_stream_sample in this repository.
  2. Set speech_synthesis_voice_name to zh-CN-YunxiaNeural.
  3. Send the synthesized audio chunks (as audio_buffer.tobytes()) to the client over a WebSocket and play the PCM audio data on the client side.
  4. Observe that the longer the response, the more often unnatural pauses between words occur.
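
Two details in step 3 are worth ruling out before blaming the synthesizer. Both are assumptions, not confirmed causes. First, if the server fills a fixed-size buffer and sends the whole buffer (for example via audio_buffer.tobytes()) rather than only the bytes actually filled, every partially filled read appends stale or zero bytes that play back as short silences, and these would become more frequent as the response gets longer. Second, if the client converts each WebSocket payload to 16-bit samples independently, a payload with an odd byte count shifts every later sample. The sketch below addresses both with hypothetical helpers (`iter_filled_chunks`, `Pcm16Aligner`). `read_into` stands in for whatever reader the server uses and is assumed to return the number of bytes written, with 0 meaning end of stream.

```python
from typing import Callable, Iterator


def iter_filled_chunks(read_into: Callable[[bytearray], int],
                       chunk_size: int = 4096) -> Iterator[bytes]:
    """Server side: yield only the bytes each read actually filled.

    `read_into` is a hypothetical stand-in for the server's audio reader; it is
    assumed to fill the buffer it is given and return the number of bytes
    written, with 0 meaning end of stream.
    """
    buffer = bytearray(chunk_size)
    while True:
        filled = read_into(buffer)
        if filled == 0:
            break
        # Copy only the valid prefix; the tail may still hold data from an earlier read.
        yield bytes(buffer[:filled])


class Pcm16Aligner:
    """Client side: re-slice WebSocket payloads into whole 16-bit samples."""

    def __init__(self, sample_width: int = 2) -> None:
        self.sample_width = sample_width
        self._remainder = b""

    def push(self, payload: bytes) -> bytes:
        """Return the longest sample-aligned prefix and keep any leftover byte(s)."""
        data = self._remainder + payload
        usable = len(data) - len(data) % self.sample_width
        self._remainder = data[usable:]
        return data[:usable]
```

On the client, each incoming WebSocket message would pass through `aligner.push(...)` before being converted to samples and queued for playback.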

Expected Behavior

The synthesized speech should flow smoothly, with no unintended pauses between words unless indicated by appropriate punctuation. Each sentence should be delivered naturally and continuously.
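
To make "no unintended pauses" measurable, the received PCM can be scanned for runs of near-silent samples. This is a rough sketch that assumes 16-bit little-endian mono PCM at 16 kHz. `find_pauses`, the amplitude `threshold` and `min_pause_ms` are hypothetical values to tune. A plain amplitude threshold also flags legitimate pauses at punctuation, so the reported positions still need to be compared against the text.

```python
import sys
from array import array
from typing import List, Optional, Tuple


def find_pauses(pcm: bytes, sample_rate: int = 16000, threshold: int = 500,
                min_pause_ms: int = 150) -> List[Tuple[float, float]]:
    """Return (start_seconds, duration_seconds) for each near-silent run.

    Assumes 16-bit little-endian mono PCM. A run counts as a pause when every
    sample's magnitude stays below `threshold` for at least `min_pause_ms`.
    """
    samples = array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native byte order; the PCM is little-endian
    min_run = sample_rate * min_pause_ms // 1000
    pauses: List[Tuple[float, float]] = []
    run_start: Optional[int] = None

    # Iterate one index past the end so a trailing silent run is also closed.
    for i in range(len(samples) + 1):
        silent = i < len(samples) and abs(samples[i]) < threshold
        if silent and run_start is None:
            run_start = i
        elif not silent and run_start is not None:
            if i - run_start >= min_run:
                pauses.append((run_start / sample_rate, (i - run_start) / sample_rate))
            run_start = None
    return pauses
```

Running this over the audio for "今天的天气真好" would show whether a gap really falls between "今" and "天" rather than only at sentence ends.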

Version of the Cognitive Services Speech SDK

Additional Context

yulin-li commented 1 month ago

Thanks for reporting this issue.

@niuzheng168 could you check?

github-actions[bot] commented 1 month ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

dmingke commented 1 month ago

Hi all, could anyone take a look at this problem?

github-actions[bot] commented 1 week ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.